Patents by Inventor Ali Shafiee

Ali Shafiee has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SUPPORTING FLOATING POINT 16 (FP16) IN DOT PRODUCT ARCHITECTURE

Publication number: 20210319079

Abstract: A dot-product architecture and method are disclosed for calculating floating-point dot-products of two vectors. The architecture includes an array of multiplier units that each include an integer logic that multiplies integer values of corresponding elements of the two vectors; an exponent logic that adds exponent values of the corresponding elements of the two vectors to form unbiased exponent values; and a local shifter that forms a first shifted value by shifting a product-integer value by a number of bits in a predetermined direction based on a difference value between an unbiased exponent value corresponding to the product-integer value and a maximum unbiased exponent value for the array of multiplier units. An adder tree adds shifted values output from local shifters of the array of multiplier units to form an output, and an accumulator accumulates the output of the adder tree.
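
The data flow described in the abstract above can be illustrated in software. The following is a minimal sketch, not the patented circuit: it assumes each operand is split into a signed integer significand and an exponent, that every product is right-shifted by its distance from the largest product exponent, and that truncation on shifting is acceptable. The names `decompose` and `aligned_dot` are hypothetical.

```python
import math

MANTISSA_BITS = 11  # FP16 significand width (10 stored bits plus the hidden bit)


def decompose(x: float) -> tuple[int, int]:
    """Return (m, e) with x == m * 2**e, where m is a signed integer significand.

    Exact for values representable in FP16; other inputs are truncated.
    """
    if x == 0.0:
        return 0, 0
    frac, exp = math.frexp(x)  # x == frac * 2**exp with 0.5 <= |frac| < 1
    return int(frac * (1 << MANTISSA_BITS)), exp - MANTISSA_BITS


def aligned_dot(a: list[float], b: list[float]) -> float:
    """Dot product computed with integer multiplies and exponent alignment."""
    products = []
    for x, y in zip(a, b):
        mx, ex = decompose(x)
        my, ey = decompose(y)
        if mx and my:  # zero products carry no exponent information
            # Integer logic multiplies significands; exponent logic adds exponents.
            products.append((mx * my, ex + ey))
    if not products:
        return 0.0
    max_exp = max(e for _, e in products)  # maximum exponent across the array
    total = 0
    for p, e in products:
        # Local shifter: align each product to max_exp (low bits are discarded).
        total += p >> (max_exp - e)
    # The integer sum stands in for the adder tree; scale back to a float.
    return math.ldexp(total, max_exp)
```

For example, `aligned_dot([1.5, -2.0, 0.25], [2.0, 0.5, 4.0])` aligns three products to a common exponent before a single integer sum. Because low-order bits are dropped during alignment, results for widely spread exponents are approximate.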

Type: Application

Filed: January 20, 2021

Publication date: October 14, 2021

Inventors: Hamzah Ahmed Ali ABDELAZIZ, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN

MIXED-PRECISION NEURAL PROCESSING UNIT (NPU) USING SPATIAL FUSION WITH LOAD BALANCING

Publication number: 20210312325

Abstract: According to one general aspect, an apparatus may include a machine learning system. The machine learning system may include a precision determination circuit configured to: determine a precision level of data, and divide the data into a data subdivision. The machine learning system may exploit sparsity during the computation of each subdivision. The machine learning system may include a load balancing circuit configured to select a load balancing technique, wherein the load balancing technique includes alternately loading the computation circuit with at least a first data/weight subdivision combination and a second data/weight subdivision combination. The load balancing circuit may be configured to load a computation circuit with a selected data subdivision and a selected weight subdivision based, at least in part, upon the load balancing technique.

Type: Application

Filed: June 10, 2020

Publication date: October 7, 2021

Inventors: Hamzah ABDELAZIZ, Joseph HASSOUN, Ali SHAFIEE ARDESTANI

LOW OVERHEAD IMPLEMENTATION OF WINOGRAD FOR CNN WITH 3x3, 1x3 AND 3x1 FILTERS ON WEIGHT STATION DOT-PRODUCT BASED CNN ACCELERATORS

Publication number: 20210294873

Abstract: A system and a method are disclosed for forming an output feature map (OFM). Activation values in an input feature map (IFM) are selected and transformed on-the-fly into the Winograd domain. Elements in a Winograd filter are selected that respectively correspond to the transformed activation values. A transformed activation value is multiplied by a corresponding element of the Winograd filter to form a corresponding product value in the Winograd domain. Activation values are repeatedly selected, transformed and multiplied by a corresponding element in the Winograd filter to form corresponding product values in the Winograd domain until all activation values in the IFM have been transformed and multiplied by the corresponding element. The product values are summed in the Winograd domain to form elements of a feature map in the Winograd domain. The elements of the feature map in the Winograd domain are inverse-Winograd transformed on-the-fly to form the OFM.
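
The transform, multiply, and inverse-transform sequence in the abstract above can be shown on the smallest common case. The sketch below uses the standard one-dimensional Winograd F(2,3) algorithm (two outputs of a 3-tap filter from four multiplications) as an assumed stand-in; it does not reproduce the accelerator mapping of the application, and the function names are hypothetical.

```python
def winograd_f23(d: list[float], g: list[float]) -> list[float]:
    """Two outputs of a 3-tap correlation over a 4-element tile via F(2,3)."""
    # Filter transform into the Winograd domain (can be precomputed once).
    u = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    # Activation transform into the Winograd domain.
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Element-wise products in the Winograd domain: 4 multiplies instead of 6.
    m = [ui * vi for ui, vi in zip(u, v)]
    # Inverse transform back to the output feature map.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


def direct_f23(d: list[float], g: list[float]) -> list[float]:
    """Reference result computed directly, for comparison."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

Calling `winograd_f23` and `direct_f23` on the same tile and filter should agree up to floating-point rounding; a 3x3 filter applies the same idea along both dimensions.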

Type: Application

Filed: June 10, 2020

Publication date: September 23, 2021

Inventors: Ali SHAFIEE ARDESTANI, Joseph HASSOUN

Processing in-memory architectures for performing logical operations

Patent number: 11126549

Abstract: In an example, a method includes identifying, using at least one processor, data portions of a plurality of distinct data objects stored in at least one memory which are to be processed using the same logical operation. The method may further include identifying a representation of an operand stored in at least one memory, the operand being to provide the logical operation and providing a logical engine with the operand. The data portions may be stored in a plurality of input data buffers, wherein each of the input data buffers comprises a data portion of a different data object. The logical operation may be carried out on each of the data portions using the logical engine, and the outputs for each data portion may be stored in a plurality of output data buffers, wherein each of the outputs comprises data derived from a different data object.

Type: Grant

Filed: March 31, 2016

Date of Patent: September 21, 2021

Assignee: Hewlett Packard Enterprise Development LP

Inventors: Naveen Muralimanohar, Ali Shafiee Ardestani

ACCELERATING 2D CONVOLUTIONAL LAYER MAPPING ON A DOT PRODUCT ARCHITECTURE

Publication number: 20210182025

Abstract: A method for performing a convolution operation includes storing a convolution kernel in a first storage device, the convolution kernel having dimensions x by y; storing, in a second storage device, a first subset of element values of an input feature map having dimensions n by m; performing a first simultaneous multiplication of each value of the first subset of element values of the input feature map with a first element value from among the x*y elements of the convolution kernel; for each remaining value of the x*y elements of the convolution kernel, performing a simultaneous multiplication of the remaining value with a corresponding subset of element values of the input feature map; for each simultaneous multiplication, storing a result of the simultaneous multiplication in an accumulator; and outputting the values of the accumulator as a first row of an output feature map.
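
The loop structure in the abstract above (one kernel element at a time, multiplied against a whole subset of input values, with results accumulated) can be sketched as follows. This is an illustration under assumptions: the inner loop over `k` models the "simultaneous" multiplication sequentially, there is no padding or stride, and `conv_output_row` is a hypothetical name.

```python
def conv_output_row(ifm: list[list[float]], kernel: list[list[float]],
                    row: int = 0) -> list[float]:
    """Compute one output-feature-map row by broadcasting kernel elements."""
    x, y = len(kernel), len(kernel[0])   # kernel dimensions x by y
    out_width = len(ifm[0]) - y + 1      # valid convolution, stride 1
    acc = [0.0] * out_width              # one accumulator per output column
    for i in range(x):
        for j in range(y):
            weight = kernel[i][j]        # a single kernel element
            # Subset of input values that this kernel element touches.
            subset = ifm[row + i][j:j + out_width]
            # In hardware these multiplies would happen in parallel.
            for k, value in enumerate(subset):
                acc[k] += weight * value
    return acc
```

With a 3 by 4 input and a 2 by 2 kernel of ones, `conv_output_row(ifm, kernel, 0)` returns the three window sums of the first two input rows.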

Type: Application

Filed: June 12, 2020

Publication date: June 17, 2021

Inventors: Ali Shafiee Ardestani, Joseph Hassoun

SIGNED MULTIPLICATION USING UNSIGNED MULTIPLIER WITH DYNAMIC FINE-GRAINED OPERAND ISOLATION

Publication number: 20210141603

Abstract: An N×N multiplier may include a N/2×N first multiplier, a N/2×N/2 second multiplier, and a N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If the operands are both less than 2^(N/2), the second or the third multiplier is used to multiply the operands. If one operand is less than 2^(N/2) and the other operand is equal to or greater than 2^(N/2), the first multiplier is used or the second and third multipliers are used to multiply the operands. If both operands are equal to or greater than 2^(N/2), the first, second and third multipliers are used to multiply the operands.
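
One way to read the case analysis in the abstract above is as a split of each operand into high and low halves around 2^(N/2). The sketch below is an assumed decomposition for unsigned operands only (the signed handling named in the title is not modeled), and the assignment of partial products to the "first", "second", and "third" multipliers is a guess for illustration. `gated_multiply` is a hypothetical name.

```python
def gated_multiply(a: int, b: int, n: int = 8) -> tuple[int, list[str]]:
    """Multiply two unsigned n-bit values, reporting which sub-multipliers run."""
    half = n // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask

    if a == 0 or b == 0:
        return 0, []                              # everything stays disabled
    if a_hi == 0 and b_hi == 0:
        return a_lo * b_lo, ["second"]            # both operands below 2**(n/2)
    if a_hi == 0:
        return a_lo * b, ["first"]                # small a on the n/2 x n unit
    if b_hi == 0:
        return b_lo * a, ["first"]                # small b on the n/2 x n unit
    # Both operands large: a*b = (a_hi*b << half) + a_lo*b_lo + (a_lo*b_hi << half)
    product = ((a_hi * b) << half) + a_lo * b_lo + ((a_lo * b_hi) << half)
    return product, ["first", "second", "third"]
```

For instance, `gated_multiply(3, 5)` needs only one small multiplier, while `gated_multiply(200, 100)` activates all three.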

Type: Application

Filed: January 15, 2021

Publication date: May 13, 2021

Inventors: Ilia OVSIANNIKOV, Ali SHAFIEE ARDESTANI, Joseph HASSOUN, Lei WANG

PIECEWISE QUANTIZATION FOR NEURAL NETWORKS

Publication number: 20210133278

Abstract: A method of quantizing an artificial neural network may include dividing a quantization range for a tensor of the artificial neural network into a first region and a second region, and quantizing values of the tensor in the first region separately from values of the tensor in the second region. Linear or nonlinear quantization may be applied to values of the tensor in the first region and the second region. The method may include locating a breakpoint between the first region and the second region by substantially minimizing an expected quantization error over at least a portion of the quantization range. The expected quantization error may be minimized by solving analytically and/or searching numerically.
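
The two-region idea in the abstract above can be sketched with a numerical breakpoint search. The code below is an assumption-laden illustration, not the method of the application: it splits the magnitude range at a breakpoint, applies uniform (linear) quantization with the same number of levels in each region, and picks the breakpoint by a grid search over mean squared error. All function names and the `levels` and `candidates` parameters are hypothetical.

```python
import math


def quantize_uniform(v: float, lo: float, hi: float, levels: int) -> float:
    """Snap v to the nearest of `levels` evenly spaced points in [lo, hi]."""
    if hi == lo:
        return lo
    step = (hi - lo) / (levels - 1)
    q = min(max(round((v - lo) / step), 0), levels - 1)
    return lo + q * step


def piecewise_quantize(values: list[float], breakpoint: float,
                       levels: int = 8) -> list[float]:
    """Quantize magnitudes below and above the breakpoint separately."""
    max_abs = max(abs(v) for v in values)
    out = []
    for v in values:
        mag = abs(v)
        if mag <= breakpoint:
            q = quantize_uniform(mag, 0.0, breakpoint, levels)      # first region
        else:
            q = quantize_uniform(mag, breakpoint, max_abs, levels)  # second region
        out.append(math.copysign(q, v))
    return out


def mse(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def find_breakpoint(values: list[float], levels: int = 8,
                    candidates: int = 50) -> float:
    """Grid-search the breakpoint that minimizes quantization error."""
    max_abs = max(abs(v) for v in values)
    grid = [max_abs * (i / candidates) for i in range(1, candidates + 1)]
    return min(grid, key=lambda bp: mse(values, piecewise_quantize(values, bp, levels)))
```

Because the grid includes the full range as a candidate (equivalent to a single region), the searched breakpoint can never do worse than plain uniform quantization on the same data.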

Type: Application

Filed: March 11, 2020

Publication date: May 6, 2021

Inventors: Jun FANG, Joseph H. HASSOUN, Ali SHAFIEE ARDESTANI, Hamzah Ahmed Ali ABDELAZIZ, Georgios GEORGIADIS, Hui CHEN, David Philip Lloyd THORSLEY

Signed multiplication using unsigned multiplier with dynamic fine-grained operand isolation

Patent number: 10963220

Abstract: An N×N multiplier may include a N/2×N first multiplier, a N/2×N/2 second multiplier, and a N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If the operands are both less than 2^(N/2), the second or the third multiplier is used to multiply the operands. If one operand is less than 2^(N/2) and the other operand is equal to or greater than 2^(N/2), the first multiplier is used or the second and third multipliers are used to multiply the operands. If both operands are equal to or greater than 2^(N/2), the first, second and third multipliers are used to multiply the operands.

Type: Grant

Filed: February 14, 2019

Date of Patent: March 30, 2021

Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph Hassoun, Lei Wang

Data processing using resistive memory arrays

Patent number: 10942673

Abstract: In an example, a method includes receiving, in a memory, input data to be processed in a first and a second processing layer. A processing operation of the second layer may be carried out on an output of a processing operation of the first processing layer. The method may further include assigning the input data to be processed according to at least one processing operation of the first layer, which may comprise using a resistive memory array, and buffering output data. It may be determined whether the buffered output data exceeds a threshold data amount to carry out at least one processing operation of the second layer and when it is determined that the buffered output data exceeds the threshold data amount, at least a portion of the buffered output data may be assigned to be processed according to a processing operation of the second layer.

Type: Grant

Filed: March 31, 2016

Date of Patent: March 9, 2021

Assignee: Hewlett Packard Enterprise Development LP

Inventors: Ali Shafiee Ardestani, Naveen Muralimanohar

MIXED-PRECISION NPU TILE WITH DEPTH-WISE CONVOLUTION

Publication number: 20200349420

Abstract: A processor to perform inference on deep learning neural network models. In some embodiments, the processor includes: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile including: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the first tile being configured: to receive a tensor including a plurality of two-dimensional arrays, each representing one color component of an image; and to perform a convolution of a kernel with one of the two-dimensional arrays.

Type: Application

Filed: April 3, 2020

Publication date: November 5, 2020

Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Hamzah Ahmed Ali Abdelaziz, Joseph H. Hassoun

Identifying outlying values in matrices

Patent number: 10754581

Abstract: In an example, a method comprises receiving a first matrix of values to be mapped to a resistive memory array, wherein each value in the matrix is to be represented as a resistance of a resistive memory element. An outlying value may be identified in the first matrix. At least one value of a portion of the first matrix containing the outlying value may be substituted with at least one substitute value to form a substituted first matrix.

Type: Grant

Filed: March 31, 2016

Date of Patent: August 25, 2020

Assignee: Hewlett Packard Enterprise Development LP

Inventors: Naveen Muralimanohar, Ali Shafiee Ardestani

Assigning data to a resistive memory array based on a significance level

Patent number: 10754582

Abstract: In an example, a method includes receiving input data and dividing the input data into a plurality of data portions, wherein the size of each data portion is based on a significance level. The input data may be assigned to at least one resistive memory array. Assigning the input data to at least one resistive memory array may comprise at least one of (i) assigning at least one data portion of the input data to be represented by a resistive memory array representing a number of bits, wherein the number of bits represented within the resistive memory array is based on the size of the at least one data portion; and (ii) processing each data portion of the input data with at least one resistive memory array.

Type: Grant

Filed: March 31, 2016

Date of Patent: August 25, 2020

Assignee: Hewlett Packard Enterprise Development LP

Inventors: Naveen Muralimanohar, Ali Shafiee Ardestani, Ben Feinberg

Dot product engine with negation indicator

Patent number: 10664271

Abstract: Examples disclosed herein include a dot product engine, which includes a resistive memory array to receive an input vector, perform a dot product operation on the input vector and a stored vector stored in the memory array, and output an analog signal representing a result of the dot product operation. The dot product engine includes a stored negation indicator to indicate whether elements of the stored vector have been negated, and a digital circuit to generate a digital dot product result value based on the analog signal and the stored negation indicator.
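
The role of a stored negation indicator, as described in the abstract above, can be illustrated with a small digital model. This is only one plausible interpretation and an assumption throughout: "negated" is taken to mean that each stored weight w is replaced by its complement (max_w - w) when the column's weights are mostly large, and the digital stage recovers the true result from the identity dot(x, w) = max_w * sum(x) - dot(x, max_w - w). The analog array and converter are replaced by an ordinary sum, and both function names are hypothetical.

```python
def store_column(weights: list[int], bits: int = 2) -> tuple[list[int], bool]:
    """Return the values to store and a flag saying whether they were negated."""
    max_w = (1 << bits) - 1
    # Negate when the column is dominated by large weights, so that the
    # stored values (and hence the summed signal) stay small.
    negated = sum(weights) > max_w * len(weights) / 2
    stored = [max_w - w for w in weights] if negated else list(weights)
    return stored, negated


def dot_with_indicator(inputs: list[int], stored: list[int],
                       negated: bool, bits: int = 2) -> int:
    """Digital post-processing that undoes the negation, if any."""
    max_w = (1 << bits) - 1
    raw = sum(x * w for x, w in zip(inputs, stored))  # stands in for the converted analog result
    return max_w * sum(inputs) - raw if negated else raw
```

Storing `[3, 3, 2, 3]` with 2-bit cells yields `[0, 0, 1, 0]` plus a set flag, and the corrected result matches the dot product with the original weights.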

Type: Grant

Filed: January 30, 2016

Date of Patent: May 26, 2020

Assignee: Hewlett Packard Enterprise Development LP

Inventors: Naveen Muralimanohar, Ali Shafiee Ardestani

SIGNED MULTIPLICATION USING UNSIGNED MULTIPLIER WITH DYNAMIC FINE-GRAINED OPERAND ISOLATION

Publication number: 20200150924

Abstract: An N×N multiplier may include a N/2×N first multiplier, a N/2×N/2 second multiplier, and a N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If the operands are both less than 2^(N/2), the second or the third multiplier is used to multiply the operands. If one operand is less than 2^(N/2) and the other operand is equal to or greater than 2^(N/2), the first multiplier is used or the second and third multipliers are used to multiply the operands. If both operands are equal to or greater than 2^(N/2), the first, second and third multipliers are used to multiply the operands.

Type: Application

Filed: February 14, 2019

Publication date: May 14, 2020

Inventors: Ilia OVSIANNIKOV, Ali SHAFIEE ARDESTANI, Joseph HASSOUN, Lei WANG

NEURAL PROCESSOR

Publication number: 20200026978

Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.

Type: Application

Filed: August 27, 2019

Publication date: January 23, 2020

Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li

NEURAL PROCESSOR

Publication number: 20200026979

Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.

Type: Application

Filed: August 27, 2019

Publication date: January 23, 2020

Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li

NEURAL PROCESSOR

Publication number: 20200026980

Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.

Type: Application

Filed: August 27, 2019

Publication date: January 23, 2020

Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li

Signal conversion based on complimentary analog signal pairs

Patent number: 10529394

Abstract: Examples disclosed herein relate to a circuit having first and second analog processors and an analog-to-digital converter coupled to the first and second analog processors. The first analog processor provides a first analog signal having a voltage representing a function of a first vector and a second vector. The second analog processor provides a second analog signal having a voltage representing a function of a binary inverse of the first vector and the second vector. The analog-to-digital converter receives the first analog signal and the second analog signal, compares a signal selected from a group consisting of the first analog signal and the second analog signal to a reference voltage and based on the comparison to the reference voltage, determines a digital result representing the function of the first vector and the second vector.

Type: Grant

Filed: August 30, 2018

Date of Patent: January 7, 2020

Assignee: Hewlett Packard Enterprise Development LP

Inventors: Ali Shafiee Ardestani, Naveen Muralimanohar, Brent Buchanan

NEURAL PROCESSOR

Publication number: 20190392287

Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.

Type: Application

Filed: June 19, 2019

Publication date: December 26, 2019

Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li

ASSIGNING DATA TO A RESISTIVE MEMORY ARRAY BASED ON A SIGNIFICANCE LEVEL

Publication number: 20190065118

Abstract: In an example, a method includes receiving input data and dividing the input data into a plurality of data portions, wherein the size of each data portion is based on a significance level. The input data may be assigned to at least one resistive memory array. Assigning the input data to at least one resistive memory array may comprise at least one of (i) assigning at least one data portion of the input data to be represented by a resistive memory array representing a number of bits, wherein the number of bits represented within the resistive memory array is based on the size of the at least one data portion; and (ii) processing each data portion of the input data with at least one resistive memory array.

Type: Application

Filed: March 31, 2016

Publication date: February 28, 2019

Inventors: Naveen MURALIMANOHAR, Ali SHAFIEE ARDESTANI, Ben FEINBERG
