Patents by Inventor Ali Shafiee

Ali Shafiee has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210319079
    Abstract: A dot-product architecture and method are disclosed for calculating floating-point dot-products of two vectors. The architecture includes an array of multiplier units that each include an integer logic that multiplies integer values of corresponding elements of the two vectors; an exponent logic that adds exponent values of the corresponding elements of the two vectors to form an unbiased exponent value; and a local shifter that forms a first shifted value by shifting a product-integer value by a number of bits in a predetermined direction based on a difference value between an unbiased exponent value corresponding to the product-integer value and a maximum unbiased exponent value for the array of multiplier units. An adder tree adds shifted values output from local shifters of the array of multiplier units to form an output, and an accumulator accumulates the output of the addition unit.
    Type: Application
    Filed: January 20, 2021
    Publication date: October 14, 2021
    Inventors: Hamzah Ahmed Ali ABDELAZIZ, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
  • Publication number: 20210312325
    Abstract: According to one general aspect, an apparatus may include a machine learning system. The machine learning system may include a precision determination circuit configured to: determine a precision level of data, and divide the data into a data subdivision. The machine learning system may exploit sparsity during the computation of each subdivision. The machine learning system may include a load balancing circuit configured to select a load balancing technique, wherein the load balancing technique includes alternately loading the computation circuit with at least a first data/weight subdivision combination and a second data/weight subdivision combination. The load balancing circuit may be configured to load a computation circuit with a selected data subdivision and a selected weight subdivision based, at least in part, upon the load balancing technique.
    Type: Application
    Filed: June 10, 2020
    Publication date: October 7, 2021
    Inventors: Hamzah ABDELAZIZ, Joseph HASSOUN, Ali SHAFIEE ARDESTANI
  • Publication number: 20210294873
    Abstract: A system and a method are disclosed for forming an output feature map (OFM). Activation values in an input feature map (IFM) are selected and transformed on-the-fly into the Winograd domain. Elements in a Winograd filter are selected that respectively correspond to the transformed activation values. A transformed activation value is multiplied by a corresponding element of the Winograd filter to form a corresponding product value in the Winograd domain. Activation values are repeatedly selected, transformed and multiplied by a corresponding element in the Winograd filter to form corresponding product values in the Winograd domain until all activation values in the IFM have been transformed and multiplied by the corresponding element. The product values are summed in the Winograd domain to form elements of a feature map in the Winograd domain. The elements of the feature map in the Winograd domain are inverse-Winograd transformed on-the-fly to form the OFM.
    Type: Application
    Filed: June 10, 2020
    Publication date: September 23, 2021
    Inventors: Ali SHAFIEE ARDESTANI, Joseph HASSOUN
  • Patent number: 11126549
    Abstract: In an example, a method includes identifying, using at least one processor, data portions of a plurality of distinct data objects stored in at least one memory which are to be processed using the same logical operation. The method may further include identifying a representation of an operand stored in at least one memory, the operand being to provide the logical operation and providing a logical engine with the operand. The data portions may be stored in a plurality of input data buffers, wherein each of the input data buffers comprises a data portion of a different data object. The logical operation may be carried out on each of the data portions using the logical engine, and the outputs for each data portion may be stored in a plurality of output data buffers, wherein each of the outputs comprises data derived from a different data object.
    Type: Grant
    Filed: March 31, 2016
    Date of Patent: September 21, 2021
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Naveen Muralimanohar, Ali Shafiee Ardestani
  • Publication number: 20210182025
    Abstract: A method for performing a convolution operation includes storing a convolution kernel in a first storage device, the convolution kernel having dimensions x by y; storing, in a second storage device, a first subset of element values of an input feature map having dimensions n by m; performing a first simultaneous multiplication of each value of the first subset of element values of the input feature map with a first element value from among the x*y elements of the convolution kernel; for each remaining value of the x*y elements of the convolution kernel, performing a simultaneous multiplication of the remaining value with a corresponding subset of element values of the input feature map; for each simultaneous multiplication, storing a result of the simultaneous multiplication in an accumulator; and outputting the values of the accumulator as a first row of an output feature map.
    Type: Application
    Filed: June 12, 2020
    Publication date: June 17, 2021
    Inventors: Ali Shafiee Ardestani, Joseph Hassoun
  • Publication number: 20210141603
    Abstract: An N×N multiplier may include an N/2×N first multiplier, an N/2×N/2 second multiplier, and an N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If the operands are both less than 2^(N/2), the second or the third multiplier is used to multiply the operands. If one operand is less than 2^(N/2) and the other operand is equal to or greater than 2^(N/2), the first multiplier is used or the second and third multipliers are used to multiply the operands. If both operands are equal to or greater than 2^(N/2), the first, second and third multipliers are used to multiply the operands.
    Type: Application
    Filed: January 15, 2021
    Publication date: May 13, 2021
    Inventors: Ilia OVSIANNIKOV, Ali SHAFIEE ARDESTANI, Joseph HASSOUN, Lei WANG
  • Publication number: 20210133278
    Abstract: A method of quantizing an artificial neural network may include dividing a quantization range for a tensor of the artificial neural network into a first region and a second region, and quantizing values of the tensor in the first region separately from values of the tensor in the second region. Linear or nonlinear quantization may be applied to values of the tensor in the first region and the second region. The method may include locating a breakpoint between the first region and the second region by substantially minimizing an expected quantization error over at least a portion of the quantization range. The expected quantization error may be minimized by solving analytically and/or searching numerically.
    Type: Application
    Filed: March 11, 2020
    Publication date: May 6, 2021
    Inventors: Jun FANG, Joseph H. HASSOUN, Ali SHAFIEE ARDESTANI, Hamzah Ahmed Ali ABDELAZIZ, Georgios GEORGIADIS, Hui CHEN, David Philip Lloyd THORSLEY
  • Patent number: 10963220
    Abstract: An N×N multiplier may include an N/2×N first multiplier, an N/2×N/2 second multiplier, and an N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If the operands are both less than 2^(N/2), the second or the third multiplier is used to multiply the operands. If one operand is less than 2^(N/2) and the other operand is equal to or greater than 2^(N/2), the first multiplier is used or the second and third multipliers are used to multiply the operands. If both operands are equal to or greater than 2^(N/2), the first, second and third multipliers are used to multiply the operands.
    Type: Grant
    Filed: February 14, 2019
    Date of Patent: March 30, 2021
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph Hassoun, Lei Wang
  • Patent number: 10942673
    Abstract: In an example, a method includes receiving, in a memory, input data to be processed in a first and a second processing layer. A processing operation of the second layer may be carried out on an output of a processing operation of the first processing layer. The method may further include assigning the input data to be processed according to at least one processing operation of the first layer, which may comprise using a resistive memory array, and buffering output data. It may be determined whether the buffered output data exceeds a threshold data amount to carry out at least one processing operation of the second layer and when it is determined that the buffered output data exceeds the threshold data amount, at least a portion of the buffered output data may be assigned to be processed according to a processing operation of the second layer.
    Type: Grant
    Filed: March 31, 2016
    Date of Patent: March 9, 2021
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Ali Shafiee Ardestani, Naveen Muralimanohar
  • Publication number: 20200349420
    Abstract: A processor to perform inference on deep learning neural network models. In some embodiments, the processor includes: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile including: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the first tile being configured: to receive a tensor including a plurality of two-dimensional arrays, each representing one color component of the image; and to perform a convolution of a kernel with one of the two-dimensional arrays.
    Type: Application
    Filed: April 3, 2020
    Publication date: November 5, 2020
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Hamzah Ahmed Ali Abdelaziz, Joseph H. Hassoun
  • Patent number: 10754581
    Abstract: In an example, a method comprises receiving a first matrix of values to be mapped to a resistive memory array, wherein each value in the matrix is to be represented as a resistance of a resistive memory element. An outlying value may be identified in the first matrix. At least one value of a portion of the first matrix containing the outlying value may be substituted with at least one substitute value to form a substituted first matrix.
    Type: Grant
    Filed: March 31, 2016
    Date of Patent: August 25, 2020
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Naveen Muralimanohar, Ali Shafiee Ardestani
  • Patent number: 10754582
    Abstract: In an example, a method includes receiving input data and dividing the input data into a plurality of data portions, wherein the size of each data portion is based on a significance level. The input data may be assigned to at least one resistive memory array. Assigning the input data to at least one resistive memory array may comprise at least one of (i) assigning at least one data portion of the input data to be represented by a resistive memory array representing a number of bits, wherein the number of bits represented within the resistive memory array is based on the size of the at least one data portion; and (ii) processing each data portion of the input data with at least one resistive memory array.
    Type: Grant
    Filed: March 31, 2016
    Date of Patent: August 25, 2020
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Naveen Muralimanohar, Ali Shafiee Ardestani, Ben Feinberg
  • Patent number: 10664271
    Abstract: Examples disclosed herein include a dot product engine, which includes a resistive memory array to receive an input vector, perform a dot product operation on the input vector and a stored vector stored in the memory array, and output an analog signal representing a result of the dot product operation. The dot product engine includes a stored negation indicator to indicate whether elements of the stored vector have been negated, and a digital circuit to generate a digital dot product result value based on the analog signal and the stored negation indicator.
    Type: Grant
    Filed: January 30, 2016
    Date of Patent: May 26, 2020
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Naveen Muralimanohar, Ali Shafiee Ardestani
  • Publication number: 20200150924
    Abstract: An N×N multiplier may include an N/2×N first multiplier, an N/2×N/2 second multiplier, and an N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If the operands are both less than 2^(N/2), the second or the third multiplier is used to multiply the operands. If one operand is less than 2^(N/2) and the other operand is equal to or greater than 2^(N/2), the first multiplier is used or the second and third multipliers are used to multiply the operands. If both operands are equal to or greater than 2^(N/2), the first, second and third multipliers are used to multiply the operands.
    Type: Application
    Filed: February 14, 2019
    Publication date: May 14, 2020
    Inventors: Ilia OVSIANNIKOV, Ali SHAFIEE ARDESTANI, Joseph HASSOUN, Lei WANG
  • Publication number: 20200026978
    Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
    Type: Application
    Filed: August 27, 2019
    Publication date: January 23, 2020
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
  • Publication number: 20200026979
    Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
    Type: Application
    Filed: August 27, 2019
    Publication date: January 23, 2020
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
  • Publication number: 20200026980
    Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
    Type: Application
    Filed: August 27, 2019
    Publication date: January 23, 2020
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
  • Patent number: 10529394
    Abstract: Examples disclosed herein relate to a circuit having first and second analog processors and an analog-to-digital converter coupled to the first and second analog processors. The first analog processor provides a first analog signal having a voltage representing a function of a first vector and a second vector. The second analog processor provides a second analog signal having a voltage representing a function of a binary inverse of the first vector and the second vector. The analog-to-digital converter receives the first analog signal and the second analog signal, compares a signal selected from a group consisting of the first analog signal and the second analog signal to a reference voltage and based on the comparison to the reference voltage, determines a digital result representing the function of the first vector and the second vector.
    Type: Grant
    Filed: August 30, 2018
    Date of Patent: January 7, 2020
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Ali Shafiee Ardestani, Naveen Muralimanohar, Brent Buchanan
  • Publication number: 20190392287
    Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
    Type: Application
    Filed: June 19, 2019
    Publication date: December 26, 2019
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
  • Publication number: 20190065118
    Abstract: In an example, a method includes receiving input data and dividing the input data into a plurality of data portions, wherein the size of each data portion is based on a significance level. The input data may be assigned to at least one resistive memory array. Assigning the input data to at least one resistive memory array may comprise at least one of (i) assigning at least one data portion of the input data to be represented by a resistive memory array representing a number of bits, wherein the number of bits represented within the resistive memory array is based on the size of the at least one data portion; and (ii) processing each data portion of the input data with at least one resistive memory array.
    Type: Application
    Filed: March 31, 2016
    Publication date: February 28, 2019
    Inventors: Naveen MURALIMANOHAR, Ali SHAFIEE ARDESTANI, Ben FEINBERG
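
Illustrative note: the N×N multiplier abstracts listed above (patent 10963220 and its related publications) describe enabling only some of three sub-multipliers depending on whether each operand is below 2^(N/2). The Python sketch below is one possible software reading of that idea, not the patented circuit. The function name `split_multiply`, the operand swap, and the particular way the product is split across the "first", "second" and "third" multipliers are all assumptions made for illustration; the abstracts do not specify them.

```python
def split_multiply(a: int, b: int, n: int = 8) -> tuple[int, list[str]]:
    """Multiply two unsigned n-bit integers, skipping sub-multipliers that
    are not needed. Hypothetical model: the split below is an assumption."""
    if n % 2 != 0 or n <= 0:
        raise ValueError("n must be a positive even bit width")
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must be unsigned n-bit values")

    half = n // 2
    limit = 1 << half          # the 2^(N/2) threshold from the abstract
    mask = limit - 1

    # Assumption: if exactly one operand is small, route it to 'a' so that
    # its high half is zero and the N/2-by-N multiplier can stay idle.
    if a >= limit and b < limit:
        a, b = b, a

    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask

    product = 0
    used: list[str] = []

    # "first": N/2-by-N multiplier, high half of a times all of b.
    if a_hi and b:
        product += (a_hi * b) << half
        used.append("first")
    # "second": N/2-by-N/2 multiplier, low half of a times low half of b.
    if a_lo and b_lo:
        product += a_lo * b_lo
        used.append("second")
    # "third": N/2-by-N/2 multiplier, low half of a times high half of b.
    if a_lo and b_hi:
        product += (a_lo * b_hi) << half
        used.append("third")

    return product, used


if __name__ == "__main__":
    for x, y in [(3, 5), (200, 3), (200, 100), (0, 77)]:
        print(x, y, split_multiply(x, y))
```

Under these assumptions, two small operands exercise only the "second" multiplier, one small and one large operand exercise the "second" and "third", and two large operands can exercise all three, which mirrors the cases the abstracts enumerate. A sub-multiplier is also skipped whenever one of its inputs is zero.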