Patents by Inventor Joseph H. Hassoun

Joseph H. Hassoun has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

METHOD FOR SPARSIFICATION OF FEATURE MAPS IN SELF-ATTENTION MECHANISMS

Publication number: 20230028226

Abstract: A method is disclosed to reduce computation in a self-attention deep-learning model. A feature-map regularization term is added to a loss function while training the self-attention model. At least one low-magnitude feature is removed from at least one feature map of the self-attention model during inference. Weights of the self-attention model are quantized after the self-attention model has been trained. Adding the feature-map regularization term reduces activation values of feature maps, and removing the at least one low-magnitude feature from at least one feature map may be performed by setting the low-magnitude feature to be equal to zero based on the low-magnitude feature having a value that is less than a predetermined threshold. Feature maps of the self-attention model are quantized and compressed.
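The two steps described in this abstract (a regularization term during training, thresholding during inference) can be illustrated with a minimal sketch. This is not the patented implementation: the L1 form of the penalty, the comparison on absolute value, and the names `feature_map_l1_penalty` and `sparsify` are assumptions made for illustration.

```python
def feature_map_l1_penalty(feature_map, strength):
    """Assumed regularization term: strength * sum of |activation|.

    Adding this to the training loss pushes activations toward zero.
    """
    return strength * sum(abs(v) for row in feature_map for v in row)


def sparsify(feature_map, threshold):
    """Set every feature whose magnitude is below the threshold to zero."""
    return [[0.0 if abs(v) < threshold else v for v in row]
            for row in feature_map]


# A tiny 2x3 feature map with a few low-magnitude entries.
fmap = [[0.5, -0.25, 2.0],
        [-0.125, 1.5, 0.0]]

penalty = feature_map_l1_penalty(fmap, strength=0.5)
pruned = sparsify(fmap, threshold=0.3)
```

The zeros produced by `sparsify` are what a later compression step could exploit; the threshold of 0.3 is arbitrary.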

Type: Application

Filed: September 14, 2021

Publication date: January 26, 2023

Inventors: David Philip Lloyd THORSLEY, Joseph H. HASSOUN, Jun FANG, Chengyao SHEN
MIXED-PRECISION NEURAL NETWORK ACCELERATOR TILE WITH LATTICE FUSION

Publication number: 20220405559

Abstract: A neural network accelerator is disclosed that includes a multiplication unit, an adder-tree unit and an accumulator unit. The multiplication unit and the adder-tree unit are configured to perform lattice-multiplication operations. The accumulator unit is coupled to an output of the adder-tree unit to form dot-product values from the lattice-multiplication operations performed by the multiplication unit and the adder-tree unit. The multiplication unit includes n multiplier units that perform lattice-multiplication-based operations and output product values. Each multiplier unit includes a plurality of multipliers. Each multiplier unit receives first and second multiplicands that each include a most significant nibble (MSN) and a least significant nibble (LSN). The multipliers in each multiplier unit receive different combinations of the MSNs and the LSNs of the multiplicands. The multiplication unit and the adder-tree unit can provide mixed-precision dot-product computations.
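The nibble-based multiplication described above can be sketched in software. This is a minimal illustration that assumes unsigned 8-bit multiplicands and four 4-bit-by-4-bit multipliers per multiplier unit; the function names are hypothetical and the actual hardware organization may differ.

```python
def split_nibbles(x):
    """Return (MSN, LSN) of an unsigned 8-bit value."""
    return (x >> 4) & 0xF, x & 0xF


def lattice_multiply_8bit(a, b):
    """Multiply two unsigned 8-bit values using four 4x4-bit products."""
    a_msn, a_lsn = split_nibbles(a)
    b_msn, b_lsn = split_nibbles(b)
    # Each small multiplier sees a different MSN/LSN combination.
    high = a_msn * b_msn
    cross = a_msn * b_lsn + a_lsn * b_msn
    low = a_lsn * b_lsn
    # Shift-and-add recombines the partial products.
    return (high << 8) + (cross << 4) + low


def dot_product(xs, ys):
    """Accumulate lattice products into a dot-product value."""
    acc = 0
    for a, b in zip(xs, ys):
        acc += lattice_multiply_8bit(a, b)
    return acc


result = dot_product([200, 17, 255], [3, 250, 255])
```

Because every partial product involves only 4-bit operands, the same small multipliers could in principle serve 4-bit and 8-bit operands, which is one plausible reading of "mixed-precision" here.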

Type: Application

Filed: August 31, 2021

Publication date: December 22, 2022

Inventors: Hamzah Ahmed Ali ABDELAZIZ, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
SRAM-SHARING FOR RECONFIGURABLE NEURAL PROCESSING UNITS

Publication number: 20220405557

Abstract: A system and a method are disclosed for processing input feature map (IFM) data of a current layer of a neural network model using an array of reconfigurable neural processing units (NPUs) and storing output feature map (OFM) data of the next layer of the neural network model at a location that does not involve a data transfer between memories of the NPUs according to the subject matter disclosed herein. The reconfigurable NPUs may be used to improve NPU utilization of NPUs of a neural processing system.

Type: Application

Filed: August 11, 2021

Publication date: December 22, 2022

Inventors: Jong Hoon SHIN, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
DEPTHWISE-CONVOLUTION IMPLEMENTATION ON A NEURAL PROCESSING CORE

Publication number: 20220405558

Abstract: A core of neural processing units is configured to efficiently process a depthwise convolution by maximizing spatial feature-map locality using adder trees. Data paths of activations and weights are inverted, and 2-to-1 multiplexers are placed every 2/9 multipliers along a row of multipliers. During a depthwise convolution operation, the core is operated using an RS×HW dataflow to maximize the locality of feature maps. For a normal convolution operation, the data paths of activations and weights may be configured for a normal convolution configuration in which the multiplexers are idle.

Type: Application

Filed: August 12, 2021

Publication date: December 22, 2022

Inventors: Jong Hoon SHIN, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
WEIGHT-SPARSE NEURAL PROCESSING UNIT WITH MULTI-DIMENSIONAL ROUTING OF NON-ZERO VALUES

Publication number: 20220156569

Abstract: A general matrix-matrix (GEMM) accelerator core includes first and second buffers, and a processing element (PE). The first buffer receives a elements of a matrix A of activation values. The second buffer receives b elements of a matrix B of weight values. The matrix B is preprocessed with a nonzero-valued b element replacing a zero-valued b element in a first row of the second buffer based on the zero-valued b element being in the first row of the second buffer. Metadata is generated that includes movement information of the nonzero-valued b element to replace the zero-valued b element. The PE receives b elements from a first row of the second buffer and a elements from the first buffer from locations in the first buffer that correspond to locations in the second buffer from where the b elements have been received by the PE as indicated by the metadata.

Type: Application

Filed: November 8, 2021

Publication date: May 19, 2022

Inventors: Jong Hoon SHIN, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
DUAL-SPARSE NEURAL PROCESSING UNIT WITH MULTI-DIMENSIONAL ROUTING OF NON-ZERO VALUES

Publication number: 20220156568

Abstract: A general matrix-matrix (GEMM) accelerator core includes first and second buffers, a control logic circuit, and a first processing element (PE). The first buffer receives a elements of a first matrix A of activation values. The second buffer receives b elements of a second matrix B of weight values. The control logic circuit replaces a zero-valued a element in a first column of the first buffer with a nonzero-valued a element that is within a maximum borrowing distance of a location of the zero-valued a element in the first column of the first buffer. The PE receives a elements from the first column of the first buffer including the nonzero-valued element a selected to replace the zero-valued a element and receives b elements from locations in the second buffer that correspond to locations in the first buffer from where the a elements have been received by the PE.
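The borrowing idea in this abstract can be illustrated with a one-dimensional toy model. It is only a sketch under simplifying assumptions: a single column of activations, look-ahead in one direction only, and hypothetical names (`borrow_nonzeros`, `sparse_dot`). The multi-dimensional routing named in the title is not modeled.

```python
def borrow_nonzeros(a_column, max_distance):
    """Fill zero-valued slots with a nearby nonzero a element.

    Returns one (value, source_index) pair per slot. The source index
    records where the value came from so the matching b element can be
    fetched later.
    """
    values = list(a_column)
    slots = []
    for i in range(len(values)):
        src = i
        if values[i] == 0:
            # Look ahead, but no further than the maximum borrowing distance.
            last = min(i + max_distance, len(values) - 1)
            for j in range(i + 1, last + 1):
                if values[j] != 0:
                    src = j
                    break
        slots.append((values[src], src))
        values[src] = 0  # mark as consumed so it is not used twice
    return slots


def sparse_dot(a_column, b_column, max_distance):
    """Multiply each slot by the b element at the slot's source index."""
    slots = borrow_nonzeros(a_column, max_distance)
    return sum(v * b_column[src] for v, src in slots)


a = [0, 3, 0, 0, 5, 2]
b = [7, 1, 4, 6, 2, 3]
slots = borrow_nonzeros(a, max_distance=2)
result = sparse_dot(a, b, max_distance=2)
```

In this toy run the nonzero values migrate toward earlier slots, leaving trailing slots that hold only zeros; hardware could skip such slots, although how the patent does so is not stated in the abstract.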

Type: Application

Filed: November 8, 2021

Publication date: May 19, 2022

Inventors: Jong Hoon SHIN, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
PROCESSOR FOR FINE-GRAIN SPARSE INTEGER AND FLOATING-POINT OPERATIONS

Publication number: 20220147313

Abstract: A processor for fine-grain sparse integer and floating-point operations and method of operation thereof. In some embodiments, the method includes forming a first set of products, and forming a second set of products. The forming of the first set of products may include: multiplying, in a first multiplier, a second multiplier, and a third multiplier, the first activation value by a first least significant sub-word, a second least significant sub-word, and a most significant sub-word; and adding a first resulting partial product and a second resulting partial product. The forming of the second set of products may include forming a first floating point product, the forming of the first floating point product including multiplying, in the first multiplier, a first sub-word of a mantissa of an activation value by a first sub-word of a mantissa of a weight, to form a third partial product.

Type: Application

Filed: December 23, 2020

Publication date: May 12, 2022

Inventors: Ali Shafiee Ardestani, Joseph H. Hassoun
HIERARCHICAL WEIGHT PREPROCESSING FOR NEURAL NETWORK ACCELERATOR

Publication number: 20210357748

Abstract: A system and method for weight preprocessing. In some embodiments, the method includes performing intra-tile preprocessing of a first weight tensor to form a first pre-processed weight tensor, and performing inter-tile preprocessing of the first pre-processed weight tensor, to form a second pre-processed weight tensor. The intra-tile preprocessing may include moving a first element of a first weight tile of the first weight tensor by one position, within the first weight tile, in a lookahead direction or in a lookaside direction. The inter-tile preprocessing may include moving a first row of a weight tile of the first pre-processed weight tensor by one position in a lookahead direction or by one position in a lookaside direction.

Type: Application

Filed: August 11, 2020

Publication date: November 18, 2021

Inventors: Jong Hoon Shin, Ali Shafiee Ardestani, Hamzah Ahmed Ali Abdelaziz, Joseph H. Hassoun
SUPPORTING FLOATING POINT 16 (FP16) IN DOT PRODUCT ARCHITECTURE

Publication number: 20210319079

Abstract: A dot-product architecture and method are disclosed for calculating floating-point dot-products of two vectors. The architecture includes an array of multiplier units that each include an integer logic that multiplies integer values of corresponding elements of the two vectors; an exponent logic that adds exponent values of the corresponding elements of the two vectors to form an unbiased exponent value; and a local shifter that forms a first shifted value by shifting a product-integer value by a number of bits in a predetermined direction based on a difference value between an unbiased exponent value corresponding to the product-integer value and a maximum unbiased exponent value for the array of multiplier units. An adder tree adds shifted values output from local shifters of the array of multiplier units to form an output, and an accumulator accumulates the output of the adder tree.
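The align-then-add scheme in this abstract can be mimicked in software. The following is a minimal sketch, not the disclosed circuit: it assumes inputs that are exactly representable in FP16, uses an 11-bit significand (`MANT_BITS`, an assumed constant), shifts right toward the maximum exponent, and uses hypothetical function names.

```python
import math

MANT_BITS = 11  # assumed: FP16 significand width including the hidden bit


def decompose(x):
    """Split x into (integer significand, unbiased exponent)."""
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    return int(round(m * (1 << MANT_BITS))), e - MANT_BITS


def fp_dot(xs, ys):
    """Dot product via integer multiplies, exponent adds, and local shifts."""
    products = []
    for x, y in zip(xs, ys):
        mx, ex = decompose(x)
        my, ey = decompose(y)
        products.append((mx * my, ex + ey))  # integer logic, exponent logic

    # Maximum unbiased exponent across the array (zero products ignored).
    max_exp = max((e for p, e in products if p != 0), default=0)

    total = 0
    for p, e in products:
        if p == 0:
            continue
        shift = max_exp - e  # local shifter: align to the maximum exponent
        magnitude = abs(p) >> shift  # low bits are dropped, as in hardware
        total += magnitude if p > 0 else -magnitude  # adder tree
    return math.ldexp(total, max_exp)


result = fp_dot([1.5, 0.25, -2.0], [2.0, 4.0, 0.5])
```

Aligning every product to one shared exponent lets a plain integer adder tree do the summation; the price is that bits shifted out of small products are lost.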

Type: Application

Filed: January 20, 2021

Publication date: October 14, 2021

Inventors: Hamzah Ahmed Ali ABDELAZIZ, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
PIECEWISE QUANTIZATION FOR NEURAL NETWORKS

Publication number: 20210133278

Abstract: A method of quantizing an artificial neural network may include dividing a quantization range for a tensor of the artificial neural network into a first region and a second region, and quantizing values of the tensor in the first region separately from values of the tensor in the second region. Linear or nonlinear quantization may be applied to values of the tensor in the first region and the second region. The method may include locating a breakpoint between the first region and the second region by substantially minimizing an expected quantization error over at least a portion of the quantization range. The expected quantization error may be minimized by solving analytically and/or searching numerically.
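A small numerical sketch may help make the two-region idea concrete. It assumes a range symmetric about zero, uniform (linear) quantization inside each region, and a plain grid search over candidate breakpoints that minimizes mean squared error on sample values. All function names and the sample data are hypothetical.

```python
def quantize_uniform(x, lo, hi, levels):
    """Snap x to the nearest of `levels` evenly spaced points in [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    clipped = min(max(x, lo), hi)
    return lo + round((clipped - lo) / step) * step


def piecewise_quantize(x, breakpoint, max_abs, levels):
    """Quantize |x| in the inner or outer region, then restore the sign."""
    mag = abs(x)
    if mag <= breakpoint:
        q = quantize_uniform(mag, 0.0, breakpoint, levels)
    else:
        q = quantize_uniform(mag, breakpoint, max_abs, levels)
    return q if x >= 0 else -q


def mean_squared_error(values, breakpoint, max_abs, levels):
    errors = [(v - piecewise_quantize(v, breakpoint, max_abs, levels)) ** 2
              for v in values]
    return sum(errors) / len(errors)


def search_breakpoint(values, levels, candidates=64):
    """Numerically search for the breakpoint with the lowest error."""
    max_abs = max(abs(v) for v in values)
    grid = [max_abs * k / candidates for k in range(1, candidates)]
    return min(grid,
               key=lambda p: mean_squared_error(values, p, max_abs, levels))


# Mostly small values plus a few outliers, as is common for weight tensors.
values = [0.01 * k for k in range(-50, 51)] + [3.0, -2.5, 4.0]
best = search_breakpoint(values, levels=8)
```

With data like this, a breakpoint close to the dense cluster gives the many small values a fine step size while the outliers share a coarser one.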

Type: Application

Filed: March 11, 2020

Publication date: May 6, 2021

Inventors: Jun FANG, Joseph H. HASSOUN, Ali SHAFIEE ARDESTANI, Hamzah Ahmed Ali ABDELAZIZ, Georgios GEORGIADIS, Hui CHEN, David Philip Lloyd THORSLEY
MIXED-PRECISION NPU TILE WITH DEPTH-WISE CONVOLUTION

Publication number: 20200349420

Abstract: A processor to perform inference on deep learning neural network models. In some embodiments, the processor includes: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile including: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the first tile being configured: to receive a tensor including a plurality of two-dimensional arrays, each representing one color component of an image; and to perform a convolution of a kernel with one of the two-dimensional arrays.

Type: Application

Filed: April 3, 2020

Publication date: November 5, 2020

Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Hamzah Ahmed Ali Abdelaziz, Joseph H. Hassoun
HARDWARE CHANNEL-PARALLEL DATA COMPRESSION/DECOMPRESSION

Publication number: 20200336272

Abstract: A multichannel data packer includes a plurality of two-input multiplexers and a controller. The plurality of two-input multiplexers is arranged in 2N rows and N columns in which N is an integer greater than 1. Each input of a multiplexer in a first column receives a respective bit stream of 2N channels of bit streams. Each respective bit stream includes a bit-stream length based on data in the bit stream. The multiplexers in a last column output 2N channels of packed bit streams each having a same bit-stream length. The controller controls the plurality of multiplexers so that the multiplexers in the last column output the 2N channels of bit streams that each has the same bit-stream length.

Type: Application

Filed: April 7, 2020

Publication date: October 22, 2020

Inventors: Ilia OVSIANNIKOV, Lei WANG, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
NEURAL PROCESSOR

Publication number: 20200026978

Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.

Type: Application

Filed: August 27, 2019

Publication date: January 23, 2020

Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
NEURAL PROCESSOR

Publication number: 20200026979

Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.

Type: Application

Filed: August 27, 2019

Publication date: January 23, 2020

Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
NEURAL PROCESSOR

Publication number: 20200026980

Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.

Type: Application

Filed: August 27, 2019

Publication date: January 23, 2020

Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
NEURAL PROCESSOR

Publication number: 20190392287

Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.

Type: Application

Filed: June 19, 2019

Publication date: December 26, 2019

Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
Double data rate flip-flop

Patent number: 7317773

Abstract: Method and apparatus for doubling the throughput rate of data transmission on a logic path comprising providing two latches that alternately receive successive bits of the data stream to be transmitted and a multiplexer having data transmission paths that are alternately clocked by two separate clocks, which clocks are substantially 180 degrees out of phase.
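The behavior described in this abstract can be modeled at a very coarse level in software. This is a toy timing model rather than a circuit description: the clock labels `clk` and `clk_180` and the function name are assumptions, and setup and hold timing is ignored.

```python
def ddr_transmit(bits):
    """Return (clock_cycle, active_clock, bit) for each transmitted bit.

    Latch A takes even-numbered bits and latch B takes odd-numbered bits.
    The multiplexer path for A is clocked by clk and the path for B by a
    clock 180 degrees out of phase, so two bits leave per clock cycle.
    """
    timeline = []
    for i in range(0, len(bits), 2):
        cycle = i // 2
        latch_a = bits[i]
        timeline.append((cycle, "clk", latch_a))
        if i + 1 < len(bits):
            latch_b = bits[i + 1]
            timeline.append((cycle, "clk_180", latch_b))
    return timeline


timeline = ddr_transmit([1, 0, 1, 1])
cycles_used = timeline[-1][0] + 1
```

Four bits occupy two clock cycles in this model, which is the doubling of throughput that the abstract refers to.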

Type: Grant

Filed: July 9, 2004

Date of Patent: January 8, 2008

Assignee: Xilinx, Inc.

Inventors: Steven P. Young, Suresh M. Menon, Ketan Sodha, Richard A. Carberry, Joseph H. Hassoun
Double data rate flip-flop

Publication number: 20040239365

Abstract: Method and apparatus for doubling the throughput rate of data transmission on a logic path comprising providing two latches that alternately receive successive bits of the data stream to be transmitted and a multiplexer having data transmission paths that are alternately clocked by two separate clocks, which clocks are substantially 180 degrees out of phase.

Type: Application

Filed: July 9, 2004

Publication date: December 2, 2004

Applicant: Xilinx, Inc.

Inventors: Steven P. Young, Suresh M. Menon, Ketan Sodha, Richard A. Carberry, Joseph H. Hassoun
Forming linked lists using content addressable memory

Patent number: 6820086

Abstract: A linked list structure in a computing system includes a first entry and additional entries. Each additional entry includes a link reference to a prior entry in the linked list. The link references for the additional entries are all stored within a content addressable memory. Each additional entry is accessible by performing a content search using the link reference to the prior entry. The linked list is traversed by accessing the first entry in the linked list. A second entry in the linked list is accessed by searching the content addressable memory with an index of the first entry. A third entry in the linked list is accessed by searching the content addressable memory with an index of the second entry.
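The traversal described above can be sketched by emulating the content addressable memory with a dictionary: the stored content is the index of the prior entry, and a successful search returns the location of the entry that holds it. The class and method names are hypothetical, and a dictionary lookup only approximates the parallel compare that real content addressable memory performs.

```python
class CamLinkedList:
    """Linked list whose links point backward and live in an emulated CAM."""

    def __init__(self, first_index, first_payload):
        self.first = first_index
        self.payloads = {first_index: first_payload}
        # Emulated CAM: content (prior entry's index) -> location (entry index).
        self.cam = {}

    def append(self, prior_index, index, payload):
        """Add an entry whose link reference names the prior entry."""
        self.payloads[index] = payload
        self.cam[prior_index] = index

    def traverse(self):
        """Walk the list by searching the CAM with the current index."""
        visited = []
        index = self.first
        while index is not None:
            visited.append(self.payloads[index])
            index = self.cam.get(index)  # a miss means the end of the list
        return visited


# Entries sit at arbitrary indices; order comes only from the CAM contents.
chain = CamLinkedList(first_index=7, first_payload="first")
chain.append(prior_index=7, index=2, payload="second")
chain.append(prior_index=2, index=9, payload="third")
order = chain.traverse()
```

Note that no entry stores a pointer to its successor; the forward walk works only because the memory can be searched by content.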

Type: Grant

Filed: June 18, 1999

Date of Patent: November 16, 2004

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Sorin Iacobovici, William R. Bryg, Joseph H. Hassoun
Double data rate flip-flop

Patent number: 6777980

Abstract: Method and apparatus for doubling the throughput rate of data transmission on a logic path comprising providing two latches that alternately receive successive bits of the data stream to be transmitted and a multiplexer having data transmission paths that are alternately clocked by two separate clocks, which clocks are substantially 180 degrees out of phase.

Type: Grant

Filed: January 15, 2003

Date of Patent: August 17, 2004

Assignee: Xilinx, Inc.

Inventors: Steven P. Young, Suresh M. Menon, Ketan Sodha, Richard A. Carberry, Joseph H. Hassoun
