Patents by Inventor Paul Nicholas Whatmough

Paul Nicholas Whatmough has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11526305
    Abstract: A memory for an artificial neural network (ANN) accelerator is provided. The memory includes a first bank, a second bank and a bank selector. Each bank includes at least two word lines and a plurality of read word selectors. Each word line stores a plurality of words, and each word has a plurality of bytes. Each read word selector has a plurality of input ports and an output port, is coupled to a corresponding word in each word line, and is configured to select a byte of the corresponding word of a selected word line based on a byte select signal. The bank selector is coupled to the read word selectors of the first bank and the second bank, and configured to select a combination of read word selectors from at least one of the first bank and the second bank based on a bank select signal.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: December 13, 2022
    Assignee: Arm Limited
    Inventors: Mudit Bhargava, Paul Nicholas Whatmough, Supreet Jeloka, Zhi-Gang Liu
  • Publication number: 20220382690
    Abstract: Various implementations described herein are directed to a device having a multi-layered logic structure with a first logic layer and a second logic layer arranged vertically in a stacked configuration. The device may have a memory array that provides data, and also, the device may have an inter-layer data bus that vertically couples the memory array to the multi-layered logic structure. The inter-layer data bus may provide multiple data paths to the first logic layer and the second logic layer for reuse of the data provided by the memory array.
    Type: Application
    Filed: May 31, 2021
    Publication date: December 1, 2022
    Inventors: Paul Nicholas Whatmough, Zhi-Gang Liu, Supreet Jeloka, Saurabh Pijuskumar Sinha, Matthew Mattina
  • Patent number: 11501151
    Abstract: The present disclosure advantageously provides a pipelined accumulator that includes a data selector configured to receive a sequence of operands to be summed, an input register coupled to the data selector, an output register, coupled to the data selector, configured to store a sequence of partial sums and output a final sum, and a multi-stage add module coupled to the input register and the output register. The multi-stage add module is configured to store a sequence of partial sums and a final sum in a redundant format, and perform back-to-back accumulation into the output register.
    Type: Grant
    Filed: May 28, 2020
    Date of Patent: November 15, 2022
    Assignee: Arm Limited
    Inventors: Paul Nicholas Whatmough, Zhi-Gang Liu, Matthew Mattina
  • Publication number: 20220351033
    Abstract: A method of operating a system having a plurality of neural networks includes receiving sequential input data events and processing each sequential input data event using a corresponding subset of the plurality of neural networks to obtain a plurality of sequential outputs. Each sequential output is indicative of a predictive determination of an aspect of the corresponding input data event. The method includes processing the plurality of sequential outputs to determine an uncertainty value associated with the plurality of sequential outputs, and operating the system based on the determined uncertainty value.
    Type: Application
    Filed: April 28, 2021
    Publication date: November 3, 2022
    Inventors: Paul Nicholas Whatmough, Mark John O'Connor
  • Publication number: 20220351032
    Abstract: A compute-in-memory (CIM) array module and a method for performing dynamic saturation detection for a CIM array are provided. The CIM array module includes a CIM array, saturation detection units (SDUs) and a controller. The CIM array includes selectable row signal lines, column signal lines and cells. Each cell is located at an intersection of a selectable row signal line and a column signal line, and each cell has a programmable conductance. The SDUs are selectively coupled to at least one column signal line, and each SDU is configured to, for each column signal line, generate an analog signal, and identify the column signal line as a saturated column signal line when a voltage of the analog signal is greater than a saturation threshold voltage, or a current of the analog signal is greater than a saturation threshold current.
    Type: Application
    Filed: April 28, 2021
    Publication date: November 3, 2022
    Applicant: Arm Limited
    Inventors: Teyuh Alice Chou, Mudit Bhargava, Supreet Jeloka, Fernando Garcia Redondo, Paul Nicholas Whatmough
  • Patent number: 11392376
    Abstract: A data processor receives a first set of processor instructions for combining a first matrix with a second matrix to produce a third matrix and generates a second set of processor instructions therefrom by identifying values of non-zero elements of the first matrix stored in a memory of the data processor and determining memory locations of elements of the second matrix. An instruction of the second set of processor instructions includes a determined memory location and/or an explicit value of an identified non-zero element. The second set of processor instructions is executed by the data processor. The second set of processor instructions may be generated by just-in-time compilation of the first set of processor instructions and may include instructions of a custom instruction set architecture.
    Type: Grant
    Filed: April 11, 2019
    Date of Patent: July 19, 2022
    Assignee: Arm Limited
    Inventors: Zhigang Liu, Matthew Mattina, Paul Nicholas Whatmough, Jesse Garrett Beu
  • Patent number: 11379556
    Abstract: There is provided a data processing apparatus to perform an operation on a first matrix and a second matrix. The data processing apparatus includes receiver circuitry to receive elements of the first matrix, elements of the second matrix, and correspondence data to indicate where the elements of the first matrix are located in the first matrix. Determination circuitry performs, using the correspondence data, a determination of whether, for a given element of the first matrix in column i of the first matrix, a given element of the second matrix occurs in row i of the second matrix. Aggregation circuitry calculates an aggregation between a given row in the first matrix and a given column in the second matrix and includes: functional circuitry to perform, in dependence on the determination, a function on the given element of the first matrix and the given element of the second matrix to produce a partial result.
    Type: Grant
    Filed: May 21, 2019
    Date of Patent: July 5, 2022
    Assignee: Arm Limited
    Inventors: Matthew Mattina, Zhigang Liu, Paul Nicholas Whatmough, David Hennah Mansell
  • Publication number: 20220180158
    Abstract: An artificial neural network (ANN) accelerator is provided. The ANN accelerator includes digital controlled oscillators (DCOs), digital-to-time converters (DTCs) and a mixed-signal multiply-and-accumulate (MAC) array. Each DCO generates a first analog operand signal based on a first digital data value, and transmits the first analog operand signal along a respective column signal line. Each DTC generates a second analog operand signal based on a second digital data value, and transmits the second analog operand signal along a respective row signal line. The mixed-signal MAC array is coupled to the row and column signal lines, and includes mixed-signal MAC units. Each mixed-signal MAC unit includes an integrated clock gate (ICG) that generates a digital product signal based on the first and second analog operand signals, and a counter circuit that increments or decrements a count value stored in a register based on the digital product signal.
    Type: Application
    Filed: December 9, 2020
    Publication date: June 9, 2022
    Applicant: Arm Limited
    Inventor: Paul Nicholas Whatmough
  • Publication number: 20220164137
    Abstract: A memory for an artificial neural network (ANN) accelerator is provided. The memory includes a first bank, a second bank and a bank selector. Each bank includes at least two word lines and a plurality of read word selectors. Each word line stores a plurality of words, and each word has a plurality of bytes. Each read word selector has a plurality of input ports and an output port, is coupled to a corresponding word in each word line, and is configured to select a byte of the corresponding word of a selected word line based on a byte select signal. The bank selector is coupled to the read word selectors of the first bank and the second bank, and configured to select a combination of read word selectors from at least one of the first bank and the second bank based on a bank select signal.
    Type: Application
    Filed: November 24, 2020
    Publication date: May 26, 2022
    Applicant: Arm Limited
    Inventors: Mudit Bhargava, Paul Nicholas Whatmough, Supreet Jeloka, Zhi-Gang Liu
  • Publication number: 20220164127
    Abstract: A memory for an artificial neural network (ANN) accelerator is provided. The memory includes a first bank, a second bank and a bank selector. Each bank includes at least two word lines and a plurality of write word selectors. Each word line stores a plurality of words, and each word has a plurality of bytes. Each write word selector has an input port and a plurality of output ports, is coupled to a corresponding word in each word line, and is configured to select a byte of the corresponding word of a selected word line based on a byte select signal. The bank selector is coupled to the write word selectors of the first bank and the second bank, and configured to select a combination of write word selectors from at least one of the first bank and the second bank based on a bank select signal.
    Type: Application
    Filed: November 24, 2020
    Publication date: May 26, 2022
    Applicant: Arm Limited
    Inventors: Mudit Bhargava, Paul Nicholas Whatmough, Supreet Jeloka, Zhi-Gang Liu
  • Publication number: 20220101085
    Abstract: A non-volatile memory (NVM) crossbar for an artificial neural network (ANN) accelerator is provided. The NVM crossbar includes row signal lines configured to receive input analog voltage signals, multiply-and-accumulate (MAC) column signal lines, a correction column signal line, a MAC cell disposed at each row signal line and MAC column signal line intersection, and a correction cell disposed at each row signal line and correction column signal line intersection. Each MAC cell includes one or more programmable NVM elements programmed to an ANN unipolar weight, and each correction cell includes one or more programmable NVM elements. Each MAC column signal line generates a MAC signal based on the input analog voltage signals and the respective MAC cells, and the correction column signal line generates a correction signal based on the input analog voltage signals and the correction cells. Each MAC signal is corrected based on the correction signal.
    Type: Application
    Filed: September 29, 2020
    Publication date: March 31, 2022
    Applicant: Arm Limited
    Inventors: Fernando Garcia Redondo, Shidhartha Das, Paul Nicholas Whatmough, Glen Arnold Rosendale
  • Publication number: 20220035890
    Abstract: A system and method for multiplying matrices are provided. The system includes a processor coupled to a memory and a matrix multiply accelerator (MMA) coupled to the processor. The MMA is configured to multiply, based on a bitmap, a compressed first matrix and a second matrix to generate an output matrix including, for each element i,j of the output matrix, calculate a dot product of an ith row of the compressed first matrix and a jth column of the second matrix based on the bitmap. Or, the MMA is configured to multiply, based on the bitmap, the second matrix and the compressed first matrix and to generate the output matrix including, for each element i,j of the output matrix, calculate a dot product of an ith row of the second matrix and a jth column of the compressed first matrix based on the bitmap.
    Type: Application
    Filed: November 24, 2020
    Publication date: February 3, 2022
    Applicant: Arm Limited
    Inventors: Zhi-Gang Liu, Paul Nicholas Whatmough, Matthew Mattina
  • Publication number: 20210390367
    Abstract: The present disclosure advantageously provides a matrix expansion unit that includes an input data selector, a first register set, a second register set, and an output data selector. The input data selector is configured to receive first matrix data in a columnwise format. The first register set is coupled to the input data selector, and includes a plurality of data selectors and a plurality of registers arranged in a first shift loop. The second register set is coupled to the data selector, and includes a plurality of data selectors and a plurality of registers arranged in a second shift loop. The output data selector is coupled to the first register set and the second register set, and is configured to output second matrix data in a rowwise format.
    Type: Application
    Filed: June 15, 2020
    Publication date: December 16, 2021
    Applicant: Arm Limited
    Inventors: Zhi-Gang Liu, Paul Nicholas Whatmough, Matthew Mattina
  • Patent number: 11194549
    Abstract: The present disclosure advantageously provides a system, matrix multiply accelerator (MMA) and method for efficiently multiplying matrices. The MMA includes a vector register to store the row vectors of one input matrix, a vector register to store the column vectors of another input matrix, a vector register to store an output matrix, and an array of vector multiply and accumulate (VMAC) units coupled to the vector registers. Each VMAC unit is coupled to at least two row vector signal lines and at least two column vector signal lines, and is configured to calculate the dot product for one element i,j of the output matrix by multiplying each row vector formed from the ith row of the first matrix with a corresponding column vector formed from the jth column of the second matrix to generate intermediate products, and accumulate the intermediate products into a scalar value.
    Type: Grant
    Filed: October 25, 2019
    Date of Patent: December 7, 2021
    Assignee: Arm Limited
    Inventors: Zhi-Gang Liu, Paul Nicholas Whatmough
  • Publication number: 20210374508
    Abstract: The present disclosure advantageously provides a pipelined accumulator that includes a data selector configured to receive a sequence of operands to be summed, an input register coupled to the data selector, an output register, coupled to the data selector, configured to store a sequence of partial sums and output a final sum, and a multi-stage add module coupled to the input register and the output register. The multi-stage add module is configured to store a sequence of partial sums and a final sum in a redundant format, and perform back-to-back accumulation into the output register.
    Type: Application
    Filed: May 28, 2020
    Publication date: December 2, 2021
    Applicant: Arm Limited
    Inventors: Paul Nicholas Whatmough, Zhi-Gang Liu, Matthew Mattina
  • Patent number: 11188814
    Abstract: A circuit and method are provided for performing convolutional neural network computations for a neural network. The circuit includes a transposing buffer configured to receive actuation feature vectors along a first dimension and to output feature component vectors along a second dimension, a weight buffer configured to store kernel weight vectors along a first dimension and further configured to output kernel component vectors along a second dimension, and a systolic array configured to receive the kernel weight vectors along a first dimension and to receive the feature component vectors along a second dimension. The systolic array includes an array of multiply and accumulate (MAC) processing cells. Each processing cell is associated with an output value. The actuation feature vectors may be shifted into the transposing buffer along the first dimension and output feature component vectors may be shifted out of the transposing buffer along the second dimension, providing efficient dataflow.
    Type: Grant
    Filed: April 5, 2018
    Date of Patent: November 30, 2021
    Assignee: Arm Limited
    Inventors: Paul Nicholas Whatmough, Ian Rudolf Bratt, Matthew Mattina
  • Patent number: 11120101
    Abstract: The present disclosure advantageously provides a system and method for efficiently multiplying matrices with elements that have a value of 0. A bitmap is generated for each matrix. Each bitmap includes a bit position for each matrix element. The value of each bit is set to 0 when the value of the corresponding matrix element is 0, and to 1 when the value of the corresponding matrix element is not 0. Each matrix is compressed into a compressed matrix, which will have fewer elements with a value of 0 than the original matrix. Each bitmap is then adjusted based on the corresponding compressed matrix. The compressed matrices are then multiplied to generate an output matrix. For each element i,j in the output matrix, a dot product of the ith row of the first compressed matrix and the jth column of the second compressed matrix is calculated based on the bitmaps.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: September 14, 2021
    Assignee: Arm Limited
    Inventors: Zhi-Gang Liu, Matthew Mattina, Paul Nicholas Whatmough
  • Publication number: 20210192323
    Abstract: The present disclosure advantageously provides a hardware accelerator for an artificial neural network (ANN), including a communication bus interface, a memory, a controller, and at least one processing engine (PE). The communication bus interface is configured to receive a plurality of finetuned weights associated with the ANN, receive input data, and transmit output data. The memory is configured to store the plurality of finetuned weights, the input data and the output data. The PE is configured to receive the input data, execute an ANN model using a plurality of fixed weights associated with the ANN and the plurality of finetuned weights, and generate the output data. Each finetuned weight corresponds to a fixed weight.
    Type: Application
    Filed: December 19, 2019
    Publication date: June 24, 2021
    Applicant: Arm Limited
    Inventors: Paul Nicholas Whatmough, Chuteng Zhou
  • Publication number: 20210124560
    Abstract: The present disclosure advantageously provides a system, matrix multiply accelerator (MMA) and method for efficiently multiplying matrices. The MMA includes a vector register to store the row vectors of one input matrix, a vector register to store the column vectors of another input matrix, a vector register to store an output matrix, and an array of vector multiply and accumulate (VMAC) units coupled to the vector registers. Each VMAC unit is coupled to at least two row vector signal lines and at least two column vector signal lines, and is configured to calculate the dot product for one element i,j of the output matrix by multiplying each row vector formed from the ith row of the first matrix with a corresponding column vector formed from the jth column of the second matrix to generate intermediate products, and accumulate the intermediate products into a scalar value.
    Type: Application
    Filed: October 25, 2019
    Publication date: April 29, 2021
    Applicant: Arm Limited
    Inventors: Zhi-Gang Liu, Paul Nicholas Whatmough
  • Patent number: 10970201
    Abstract: A system, apparatus and method for utilizing a transpose function to generate a two-dimensional array from three-dimensional input data. The use of the transpose function reduces redundant elements in the resultant two-dimensional array thereby increasing efficiency and decreasing power consumption.
    Type: Grant
    Filed: October 24, 2018
    Date of Patent: April 6, 2021
    Assignee: Arm Limited
    Inventor: Paul Nicholas Whatmough
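
Illustrative sketch: the abstract of patent 11120101 above describes multiplying matrices using per-element bitmaps and compressed operands. The Python below is a minimal, simplified illustration of that general idea, not the patented method. It assumes every zero is removed during compression (the abstract only says the compressed matrix has fewer zeros) and it skips the bitmap adjustment step. The function names compress_rows, compress_cols, bitmap_dot and sparse_matmul are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def compress_rows(m: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (bitmap, values) for each row of m.

    bitmap[i][k] is 1 where m[i][k] is non-zero and 0 otherwise.
    values[i] holds the non-zero elements of row i in their original order.
    Assumption: all zeros are dropped, which is stricter than the abstract.
    """
    bitmap = [[1 if x != 0 else 0 for x in row] for row in m]
    values = [[x for x in row if x != 0] for row in m]
    return bitmap, values


def compress_cols(m: Matrix) -> Tuple[Matrix, Matrix]:
    """Same as compress_rows, but applied to the columns of m."""
    columns = [list(col) for col in zip(*m)]
    return compress_rows(columns)


def bitmap_dot(bits_a: List[int], vals_a: List[int],
               bits_b: List[int], vals_b: List[int]) -> int:
    """Dot product of two compressed vectors, guided by their bitmaps."""
    total = 0
    ia = ib = 0  # read positions in the compressed value lists
    for ba, bb in zip(bits_a, bits_b):
        if ba and bb:
            # Both operands are non-zero at this position: multiply.
            total += vals_a[ia] * vals_b[ib]
        # Advance past each stored value whose bit was set.
        ia += ba
        ib += bb
    return total


def sparse_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Output element i,j is the bitmap-guided dot product of
    row i of a and column j of b."""
    a_bits, a_vals = compress_rows(a)
    b_bits, b_vals = compress_cols(b)
    return [
        [bitmap_dot(a_bits[i], a_vals[i], b_bits[j], b_vals[j])
         for j in range(len(b_bits))]
        for i in range(len(a_bits))
    ]
```

Calling sparse_matmul([[1, 0, 2], [0, 0, 3]], [[0, 4], [5, 0], [6, 7]]) gives the same result as an ordinary dense product, while multiplications are performed only where both bitmaps have a 1.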