Patents by Inventor Matthew Mattina

Matthew Mattina has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220164663
    Abstract: A system and method for multiplying matrices, and method for training a convolutional neural network (CNN), are provided. The system includes a processor and a matrix multiply accelerator (MMA). The processor is configured to generate, based on an input tensor, a number of basic block matrices, each basic block matrix including a number of elements; for each basic block matrix: prune, based on a sparsity value, the elements of the basic block matrix, generate a mask for the basic block matrix, each mask including a number of bits, each bit corresponding to a different element of the basic block matrix, and compress the basic block matrix to generate a compressed basic block matrix having fewer elements than the basic block matrix. The MMA is configured to multiply, based on the masks, the compressed basic block matrices and a weight matrix to generate an output matrix.
    Type: Application
    Filed: January 25, 2021
    Publication date: May 26, 2022
    Applicant: Arm Limited
    Inventors: Zhi-Gang Liu, Matthew Mattina
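    A minimal Python sketch of the prune/mask/compress flow this abstract describes; the 4x4 block size, the magnitude-based pruning policy, and all names are illustrative assumptions rather than details from the filing:

      # Prune a basic block matrix, build its bitmask, pack the survivors.
      # Magnitude-based pruning is an assumed policy, not from the filing.
      import numpy as np

      def prune_mask_compress(block: np.ndarray, sparsity: float):
          flat = block.flatten()
          n_keep = int(round(flat.size * (1.0 - sparsity)))
          keep_idx = np.argsort(np.abs(flat))[-n_keep:]   # largest magnitudes
          mask = np.zeros(flat.size, dtype=np.uint8)      # one bit per element
          mask[keep_idx] = 1
          compressed = flat[mask.astype(bool)]            # fewer elements
          return mask, compressed

      block = np.arange(16, dtype=float).reshape(4, 4) - 8.0
      mask, compressed = prune_mask_compress(block, sparsity=0.75)
      print(mask.reshape(4, 4))   # bitmask: 1 where an element survived
      print(compressed)           # the 4 packed survivors of the 4x4 block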
  • Publication number: 20220035890
    Abstract: A system and method for multiplying matrices are provided. The system includes a processor coupled to a memory and a matrix multiply accelerator (MMA) coupled to the processor. The MMA is configured to multiply, based on a bitmap, a compressed first matrix and a second matrix to generate an output matrix, including, for each element i,j of the output matrix, calculating a dot product of an ith row of the compressed first matrix and a jth column of the second matrix based on the bitmap. Alternatively, the MMA is configured to multiply, based on the bitmap, the second matrix and the compressed first matrix to generate the output matrix, including, for each element i,j of the output matrix, calculating a dot product of an ith row of the second matrix and a jth column of the compressed first matrix based on the bitmap.
    Type: Application
    Filed: November 24, 2020
    Publication date: February 3, 2022
    Applicant: Arm Limited
    Inventors: Zhi-Gang Liu, Paul Nicholas Whatmough, Matthew Mattina
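    A sketch of the bitmap-guided dot product behind each output element i,j; the packed-row storage layout is an assumed concrete form of the compressed first matrix:

      # Dot product of one packed (compressed) row with one dense column,
      # expanding element positions via the row's bitmap.
      def masked_dot(packed_row, bitmap_row, dense_col):
          acc, k = 0.0, 0
          for pos, bit in enumerate(bitmap_row):
              if bit:                          # element survived compression
                  acc += packed_row[k] * dense_col[pos]
                  k += 1                       # advance the packed storage
          return acc

      bitmap_row = [1, 0, 0, 1]                # row i of the bitmap
      packed_row = [2.0, 5.0]                  # values at positions 0 and 3
      dense_col  = [1.0, 9.0, 9.0, 3.0]        # column j of the second matrix
      print(masked_dot(packed_row, bitmap_row, dense_col))   # 2*1 + 5*3 = 17.0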
  • Publication number: 20210390367
    Abstract: The present disclosure advantageously provides a matrix expansion unit that includes an input data selector, a first register set, a second register set, and an output data selector. The input data selector is configured to receive first matrix data in a columnwise format. The first register set is coupled to the input data selector, and includes a plurality of data selectors and a plurality of registers arranged in a first shift loop. The second register set is coupled to the input data selector, and includes a plurality of data selectors and a plurality of registers arranged in a second shift loop. The output data selector is coupled to the first register set and the second register set, and is configured to output second matrix data in a rowwise format.
    Type: Application
    Filed: June 15, 2020
    Publication date: December 16, 2021
    Applicant: Arm Limited
    Inventors: Zhi-Gang Liu, Paul Nicholas Whatmough, Matthew Mattina
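    A toy Python model of the column-in/row-out dataflow: one register set captures a tile column by column while the other is drained row by row. The two shift loops are abstracted to plain lists; this mirrors only the dataflow, not the hardware:

      # Capture columns into one register bank, emit the tile as rows.
      def expand(columns, tile_size):
          filling, rows_out = [], []
          for col in columns:
              filling.append(col)
              if len(filling) == tile_size:        # full tile captured
                  draining, filling = filling, []  # swap the two banks
                  for r in range(tile_size):       # drain row-wise
                      rows_out.append([c[r] for c in draining])
          return rows_out

      cols = [[1, 2], [3, 4]]           # 2x2 tile delivered column-wise
      print(expand(cols, tile_size=2))  # [[1, 3], [2, 4]] delivered row-wise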
  • Publication number: 20210374509
    Abstract: The present disclosure advantageously provides a modulo operation unit that includes a first input configured to receive operand data, a second input configured to receive modulus data, an initial modulo stage, a sequence of intermediate modulo stages, and a final modulo stage.
    Type: Application
    Filed: June 1, 2020
    Publication date: December 2, 2021
    Applicant: Arm Limited
    Inventors: Zhi-Gang Liu, Matthew Mattina
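    A sketch of staged modulo reduction, one plausible reading of the initial/intermediate/final stage chain: each stage conditionally subtracts a shifted copy of the modulus. Deriving the stage count from the operand width is an assumption:

      # Reduce x modulo m with one conditional subtraction per stage.
      def staged_mod(x: int, m: int, width: int = 16) -> int:
          assert x >= 0 and 0 < m < (1 << width)
          for shift in range(width, -1, -1):   # one pipeline stage each
              step = m << shift
              if x >= step:
                  x -= step
          return x

      print(staged_mod(1234, 7))   # 2, matching 1234 % 7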
  • Publication number: 20210374508
    Abstract: The present disclosure advantageously provides a pipelined accumulator that includes a data selector configured to receive a sequence of operands to be summed, an input register coupled to the data selector, an output register, coupled to the data selector, configured to store a sequence of partial sums and output a final sum, and a multi-stage add module coupled to the input register and the output register. The multi-stage add module is configured to store a sequence of partial sums and a final sum in a redundant format, and perform back-to-back accumulation into the output register.
    Type: Application
    Filed: May 28, 2020
    Publication date: December 2, 2021
    Applicant: Arm Limited
    Inventors: Paul Nicholas Whatmough, Zhi-Gang Liu, Matthew Mattina
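    A sketch of accumulation in a redundant format, assuming carry-save is what the abstract's redundant format refers to: partial sums live as separate sum and carry words, so each step avoids a full carry-propagating add and a single final add resolves the result:

      # Keep the running total as a (sum, carry) pair; resolve once at the end.
      def carry_save_add(s: int, c: int, x: int):
          new_s = s ^ c ^ x                            # bitwise 3:2 compress
          new_c = ((s & c) | (s & x) | (c & x)) << 1
          return new_s, new_c

      def accumulate(values):
          s, c = 0, 0
          for v in values:
              s, c = carry_save_add(s, c, v)   # no carry propagation per step
          return s + c                         # single resolving add

      print(accumulate([3, 5, 7, 11]))   # 26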
  • Patent number: 11188814
    Abstract: A circuit and method are provided for performing convolutional neural network computations for a neural network. The circuit includes a transposing buffer configured to receive activation feature vectors along a first dimension and to output feature component vectors along a second dimension, a weight buffer configured to store kernel weight vectors along a first dimension and further configured to output kernel component vectors along a second dimension, and a systolic array configured to receive the kernel weight vectors along a first dimension and to receive the feature component vectors along a second dimension. The systolic array includes an array of multiply and accumulate (MAC) processing cells. Each processing cell is associated with an output value. The activation feature vectors may be shifted into the transposing buffer along the first dimension and output feature component vectors may be shifted out of the transposing buffer along the second dimension, providing efficient dataflow.
    Type: Grant
    Filed: April 5, 2018
    Date of Patent: November 30, 2021
    Assignee: Arm Limited
    Inventors: Paul Nicholas Whatmough, Ian Rudolf Bratt, Matthew Mattina
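    A heavily simplified NumPy model of the dataflow: the transposing buffer turns feature vectors received along one dimension into component vectors along the other, which then meet the kernel vectors in a grid of MAC cells. The skewed timing of a real systolic array is omitted; this shows only what each cell accumulates:

      import numpy as np

      features = np.array([[1., 2.], [3., 4.]])   # activation feature vectors
      weights  = np.array([[5., 6.], [7., 8.]])   # kernel weight vectors

      streamed = features.T               # role of the transposing buffer
      acc = np.zeros((2, 2))              # one accumulator per MAC cell
      for k in range(streamed.shape[0]):  # one component vector per step
          acc += np.outer(streamed[k], weights[k])
      print(acc)                          # equals features @ weights
      assert np.allclose(acc, features @ weights)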
  • Patent number: 11151033
    Abstract: A processor includes a plurality of cache memories, and a plurality of processor cores, each associated with one of the cache memories. Each of at least some of the cache memories is associated with information indicating whether data stored in the cache memory is shared among multiple processor cores.
    Type: Grant
    Filed: March 13, 2014
    Date of Patent: October 19, 2021
    Assignee: Tilera Corporation
    Inventors: David M. Wentzlaff, Matthew Mattina, Anant Agarwal
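    A small sketch of the per-line bookkeeping the abstract describes: each cache line carries an indication of whether its data is shared among multiple processor cores. The directory-style sharer set is one assumed representation of that information:

      # Each line tracks which cores hold it; "shared" falls out of that.
      class CacheLine:
          def __init__(self, tag, data):
              self.tag, self.data = tag, data
              self.sharers = set()              # cores holding this line

          def record_access(self, core_id):
              self.sharers.add(core_id)

          @property
          def shared(self):                     # the "is shared" indication
              return len(self.sharers) > 1

      line = CacheLine(tag=0x40, data=b"\x00" * 64)
      line.record_access(0)
      print(line.shared)    # False: private to core 0 so far
      line.record_access(3)
      print(line.shared)    # True: shared by cores 0 and 3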
  • Publication number: 20210287078
    Abstract: The present disclosure advantageously provides an Optical Hardware Accelerator (OHA) for an Artificial Neural Network (ANN) that includes a communication bus interface, a memory, a controller, and an optical computing engine (OCE). The OCE is configured to execute an ANN model with ANN weights. Each ANN weight includes a quantized phase shift value θi and a phase shift value φi. The OCE includes a digital-to-optical (D/O) converter configured to generate input optical signals based on the input data, an optical neural network (ONN) configured to generate output optical signals based on the input optical signals, and an optical-to-digital (O/D) converter configured to generate the output data based on the output optical signals. The ONN includes a plurality of optical units (OUs), and each OU includes an optical multiply and accumulate (OMAC) module.
    Type: Application
    Filed: March 13, 2020
    Publication date: September 16, 2021
    Applicant: Arm Limited
    Inventors: Zhi-Gang Liu, Matthew Mattina, John Fremont Brown, III
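    A toy numerical model of a phase-encoded weight, not the patent's OMAC design: the input rides on an optical amplitude, the weight is applied as a phase shift θi, and interference with an unshifted reference recovers the product at the detector:

      import numpy as np

      def omac(inputs, thetas):
          # Phase-shift each input field, detect the in-phase component:
          fields = inputs * np.exp(1j * thetas)
          return float(np.sum(np.real(fields)))   # sum_i x_i * cos(theta_i)

      x = np.array([0.5, 1.0, 0.25])
      theta = np.arccos([0.8, -0.2, 0.4])   # phases realizing the weights
      print(omac(x, theta))                 # 0.5*0.8 + 1.0*(-0.2) + 0.25*0.4 = 0.3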
  • Patent number: 11120101
    Abstract: The present disclosure advantageously provides a system and method for efficiently multiplying matrices with elements that have a value of 0. A bitmap is generated for each matrix. Each bitmap includes a bit position for each matrix element. The value of each bit is set to 0 when the value of the corresponding matrix element is 0, and to 1 when the value of the corresponding matrix element is not 0. Each matrix is compressed into a compressed matrix, which will have fewer elements with a value of 0 than the original matrix. Each bitmap is then adjusted based on the corresponding compressed matrix. The compressed matrices are then multiplied to generate an output matrix. For each element i,j in the output matrix, a dot product of the ith row of the first compressed matrix and the jth column of the second compressed matrix is calculated based on the bitmaps.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: September 14, 2021
    Assignee: Arm Limited
    Inventors: Zhi-Gang Liu, Matthew Mattina, Paul Nicholas Whatmough
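    A compact sketch of the whole flow for one output element: both operands are stored packed, and the two bitmaps are walked in step so only positions where both matrices are nonzero contribute. Packing the columns of the second matrix via its transpose is an implementation convenience, not a claimed detail:

      import numpy as np

      def compress_rows(m):
          bitmap = (m != 0).astype(np.uint8)
          packed = [row[row != 0] for row in m]     # per-row packed values
          return bitmap, packed

      def dot_from_packed(bm_a, pk_a, bm_b, pk_b):
          acc, ka, kb = 0.0, 0, 0
          for pos in range(len(bm_a)):
              if bm_a[pos] and bm_b[pos]:           # both nonzero at pos
                  acc += pk_a[ka] * pk_b[kb]
              ka += bm_a[pos]                       # walk packed storage
              kb += bm_b[pos]
          return acc

      a = np.array([[0., 2.], [3., 0.]])
      b = np.array([[1., 0.], [0., 4.]])
      bm_a, pk_a = compress_rows(a)
      bm_b, pk_b = compress_rows(b.T)               # columns of b, packed
      out = np.array([[dot_from_packed(bm_a[i], pk_a[i], bm_b[j], pk_b[j])
                       for j in range(2)] for i in range(2)])
      print(out)                                    # matches a @ b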
  • Publication number: 20210097130
    Abstract: The present disclosure advantageously provides a system and method for efficiently multiplying matrices with elements that have a value of 0. A bitmap is generated for each matrix. Each bitmap includes a bit position for each matrix element. The value of each bit is set to 0 when the value of the corresponding matrix element is 0, and to 1 when the value of the corresponding matrix element is not 0. Each matrix is compressed into a compressed matrix, which will have fewer elements with a value of 0 than the original matrix. Each bitmap is then adjusted based on the corresponding compressed matrix. The compressed matrices are then multiplied to generate an output matrix. For each element i,j in the output matrix, a dot product of the ith row of the first compressed matrix and the jth column of the second compressed matrix is calculated based on the bitmaps.
    Type: Application
    Filed: September 27, 2019
    Publication date: April 1, 2021
    Inventors: Zhi-Gang Liu, Matthew Mattina, Paul Nicholas Whatmough
  • Publication number: 20210089889
    Abstract: The present disclosure advantageously provides a mixed precision computation (MPC) unit for executing one or more mixed-precision layers of an artificial neural network (ANN). The MPC unit includes a multiplier circuit configured to input a pair of operands and output a product, a first adder circuit coupled to the multiplier circuit, a second adder circuit, coupled to the first adder circuit, configured to input a pair of operands, an accumulator circuit, coupled to the multiplier circuit and the first adder circuit, configured to output an accumulated value, and a controller, coupled to the multiplier circuit, the first adder circuit, the second adder circuit and the accumulator circuit, configured to input a mode control signal. The controller has a plurality of operating modes including a high precision mode, a low precision add mode and a low precision multiply mode.
    Type: Application
    Filed: March 31, 2020
    Publication date: March 25, 2021
    Applicant: Arm Limited
    Inventors: Dibakar Gope, Jesse Garrett Beu, Paul Nicholas Whatmough, Matthew Mattina
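    A sketch of the three operating modes named in the abstract; the 8-bit low-precision width, the wrapping behavior, and the mode semantics are assumptions for illustration:

      # Dispatch on the mode control signal; widths are assumed.
      def mpc(mode, a, b, acc=0):
          if mode == "high_precision":          # wide multiply-accumulate
              return acc + a * b
          if mode == "low_precision_add":       # narrow add path
              return (a + b) & 0xFF
          if mode == "low_precision_multiply":  # narrow multiply path
              return (a * b) & 0xFF
          raise ValueError(mode)

      print(mpc("high_precision", 300, 2, acc=10))   # 610
      print(mpc("low_precision_add", 200, 100))      # 44 (wraps at 8 bits)
      print(mpc("low_precision_multiply", 20, 20))   # 144 (wraps at 8 bits)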
  • Publication number: 20210089888
    Abstract: The present disclosure advantageously provides a system including a memory, a processor, and a circuitry to execute one or more mixed precision layers of an artificial neural network (ANN), each mixed precision layer including high-precision weight filters and low precision weight filters. The circuitry is configured to perform one or more calculations on an input feature map having a plurality of input channels (cin) using the high precision weight filters to create a high precision output feature map having a first number of output channels (k), perform one or more calculations on the input feature map using the low precision weight filters to create a low precision output feature map having a second number of output channels (cout-k), and concatenate the high precision output feature map and the low precision output feature map to create a unified output feature map having a plurality of output channels (cout).
    Type: Application
    Filed: March 31, 2020
    Publication date: March 25, 2021
    Applicant: Arm Limited
    Inventors: Dibakar Gope, Jesse Garrett Beu, Paul Nicholas Whatmough, Matthew Mattina
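    A NumPy sketch of one mixed-precision layer as described: k filters run at high precision, the remaining cout-k at low precision, and the two partial feature maps are concatenated channel-wise. Reducing the convolutions to 1x1 filters (a matrix product) and picking float32/int8 as the two precisions are simplifying assumptions:

      import numpy as np

      cin, cout, k, hw = 4, 6, 2, 8
      x = np.random.rand(cin, hw).astype(np.float32)    # input feature map
      w_hi = np.random.rand(k, cin).astype(np.float32)  # high precision filters
      w_lo = (np.random.rand(cout - k, cin) * 127).astype(np.int8)

      y_hi = w_hi @ x                                   # k output channels
      y_lo = (w_lo.astype(np.float32) / 127.0) @ x      # cout-k channels
      y = np.concatenate([y_hi, y_lo], axis=0)          # unified output map
      print(y.shape)                                    # (6, 8) == (cout, hw)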
  • Publication number: 20210064379
    Abstract: A method and architecture for performing multiply-accumulate operations in a neural network is disclosed. The architecture includes a crossbar having a plurality of non-volatile memory elements. A plurality of input activations is applied to the crossbar and summed by binary weight encoding a plurality of the non-volatile memory elements to connect the input activations to weight values. At least one of the plurality of non-volatile memory elements is then precision programmed.
    Type: Application
    Filed: August 29, 2019
    Publication date: March 4, 2021
    Applicant: Arm Limited
    Inventors: Matthew Mattina, Shidhartha Das, Glen Arnold Rosendale, Fernando Garcia Redondo
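    A sketch of the crossbar multiply-accumulate reduced to its electrical essence: input activations drive the rows, each cell's programmed conductance encodes a weight, and the column currents are the accumulated dot products. The binary-encoding and precision-programming steps are collapsed into plain conductance values here:

      import numpy as np

      G = np.array([[1.0, 0.5],      # conductances: one weight per NVM cell
                    [0.2, 0.8],
                    [0.0, 1.0]])
      v = np.array([0.3, 1.0, 0.5])  # input activations applied to the rows

      i_out = v @ G                  # column currents = dot products
      print(i_out)                   # [0.5, 1.45]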
  • Patent number: 10887238
    Abstract: A flexible, scalable server is described. The server includes plural server nodes, each server node including processor cores and switching circuitry configured to couple the processors to a network among the cores, with the plurality of cores implementing networking functions within the compute nodes. The cores' networking capabilities allow the cores to connect to each other and to offer a single interface to a network coupled to the server.
    Type: Grant
    Filed: July 23, 2019
    Date of Patent: January 5, 2021
    Assignee: Mellanox Technologies, Ltd.
    Inventors: Carl G. Ramey, Matthew Mattina
  • Publication number: 20200410333
    Abstract: A multiply-accumulate method and architecture are disclosed. The architecture includes a plurality of networks of non-volatile memory elements arranged in tiled columns. Logic digitally modulates the equivalent conductance of individual networks among the plurality of networks to map the equivalent conductance of each individual network to a single weight within the neural network. A first partial selection of weights within the neural network is mapped into the equivalent conductances of the networks in the columns to enable the computation of multiply-and-accumulate operations by mixed-signal computation. The logic updates the mappings to select a second partial selection of weights to compute additional multiply-and-accumulate operations and repeats the mapping and computation operations until all computations for the neural network are completed.
    Type: Application
    Filed: June 25, 2019
    Publication date: December 31, 2020
    Applicant: Arm Limited
    Inventors: Shidhartha Das, Matthew Mattina, Glen Arnold Rosendale, Fernando Garcia Redondo
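    A sketch of the remap-and-repeat loop: the crossbar holds only a slice of the weights at a time, so each slice is mapped in, used for its multiply-accumulates, and replaced by the next until the layer is covered. The tile size and row-major slicing are assumptions:

      import numpy as np

      def tiled_mvm(weights, x, tile_rows):
          y = np.zeros(weights.shape[0])
          for r0 in range(0, weights.shape[0], tile_rows):
              tile = weights[r0:r0 + tile_rows]   # "program" this slice
              y[r0:r0 + tile_rows] = tile @ x     # modeled mixed-signal MAC
          return y                                # all computations done

      w = np.arange(12, dtype=float).reshape(6, 2)
      x = np.array([1.0, 2.0])
      print(tiled_mvm(w, x, tile_rows=2))         # matches w @ x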
  • Publication number: 20200372097
    Abstract: There is provided a data processing apparatus to perform an operation on a first matrix and a second matrix. The data processing apparatus includes receiver circuitry to receive elements of the first matrix, elements of the second matrix, and correspondence data to indicate where the elements of the first matrix are located in the first matrix. Determination circuitry performs, using the correspondence data, a determination of whether, for a given element of the first matrix in column i of the first matrix, a given element of the second matrix occurs in row i of the second matrix. Aggregation circuitry calculates an aggregation between a given row in the first matrix and a given column in the second matrix and includes: functional circuitry to perform, in dependence on the determination, a function on the given element of the first matrix and the given element of the second matrix to produce a partial result.
    Type: Application
    Filed: May 21, 2019
    Publication date: November 26, 2020
    Inventors: Matthew Mattina, Zhigang Liu, Paul Nicholas Whatmough, David Hennah Mansell
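    A sketch of the determination step: elements of the first matrix arrive with correspondence data (here, their column indices), and the functional circuitry only forms a product when the second matrix has a matching element in row i. The index-list layout is an assumed concrete form of the correspondence data:

      # Partial results for one row of A against one sparse column of B.
      def aggregate_row(vals, cols, b_col):
          acc = 0.0
          for v, i in zip(vals, cols):     # element of A in column i ...
              if i in b_col:               # ... matches row i of B?
                  acc += v * b_col[i]      # multiply -> partial result
          return acc

      row_vals, row_cols = [2.0, 5.0], [0, 3]   # one row of A, with columns
      b_column = {0: 1.0, 3: 3.0}               # nonzero rows of B's column j
      print(aggregate_row(row_vals, row_cols, b_column))   # 2*1 + 5*3 = 17.0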
  • Publication number: 20200326938
    Abstract: A data processor receives a first set of processor instructions for combining a first matrix with a second matrix to produce a third matrix and generates a second set of processor instructions therefrom by identifying values of non-zero elements of the first matrix stored in a memory of the data processor and determining memory locations of elements of the second matrix. An instruction of the second set of processor instructions includes a determined memory location and/or an explicit value of an identified non-zero element. The second set of processor instructions is executed by the data processor. The second set of processor instructions may be generated by just-in-time compilation of the first set of processor instructions and may include instructions of a custom instruction set architecture.
    Type: Application
    Filed: April 11, 2019
    Publication date: October 15, 2020
    Inventors: Zhigang Liu, Matthew Mattina, Paul Nicholas Whatmough, Jesse Garrett Beu
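    A sketch of the two-step scheme with generated Python source standing in for the custom instruction set architecture: scan the first matrix once, emit a specialized instruction stream that bakes in each nonzero value and operand location, then execute it:

      import numpy as np

      def specialize(a):
          # "Compile" matrix a into an unrolled matvec over its nonzeros.
          lines = ["def matvec(x, out):"]
          for i, row in enumerate(a):
              for j, v in enumerate(row):
                  if v != 0:               # value baked into the instruction
                      lines.append(f"    out[{i}] += {float(v)!r} * x[{j}]")
          lines.append("    return out")
          ns = {}
          exec("\n".join(lines), ns)       # the just-in-time step
          return ns["matvec"]

      a = np.array([[0., 2.], [3., 0.]])
      matvec = specialize(a)
      print(matvec(np.array([1.0, 10.0]), np.zeros(2)))   # [20., 3.]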
  • Patent number: 10747845
    Abstract: A system, apparatus and method for exposing input data operands and input weight operands to elements of a two-dimensional array so that two pairs of operands are exposed to each element of the array.
    Type: Grant
    Filed: August 31, 2018
    Date of Patent: August 18, 2020
    Assignee: Arm Limited
    Inventors: Paul Nicholas Whatmough, Matthew Mattina, Zhigang Liu
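    A sketch of the core idea of exposing two operand pairs to each array element: every cell consumes two (data, weight) pairs per step and accumulates both, doubling the work done per element. The pairing scheme is illustrative:

      # One processing element, two multiply-accumulates per step.
      def cell_step(acc, pair_a, pair_b):
          (d0, w0), (d1, w1) = pair_a, pair_b
          return acc + d0 * w0 + d1 * w1

      acc = 0.0
      data    = [1.0, 2.0, 3.0, 4.0]
      weights = [5.0, 6.0, 7.0, 8.0]
      for t in range(0, len(data), 2):    # two pairs exposed each step
          acc = cell_step(acc, (data[t], weights[t]),
                          (data[t + 1], weights[t + 1]))
      print(acc)                          # 1*5 + 2*6 + 3*7 + 4*8 = 70.0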
  • Publication number: 20200177510
    Abstract: A flexible, scalable server is described. The server includes plural server nodes, each server node including processor cores and switching circuitry configured to couple the processors to a network among the cores, with the plurality of cores implementing networking functions within the compute nodes. The cores' networking capabilities allow the cores to connect to each other and to offer a single interface to a network coupled to the server.
    Type: Application
    Filed: July 23, 2019
    Publication date: June 4, 2020
    Inventors: Carl G. Ramey, Matthew Mattina
  • Patent number: 10606750
    Abstract: A computing system comprises one or more cores. Each core comprises a processor. In some implementations, each processor is coupled to a communication network among the cores. In some implementations, a switch in each core includes switching circuitry to forward data received over data paths from other cores to the processor and to switches of other cores, and to forward data received from the processor to switches of other cores.
    Type: Grant
    Filed: April 11, 2017
    Date of Patent: March 31, 2020
    Assignee: Mellanox Technologies Ltd.
    Inventors: Matthew Mattina, Chyi-Chang Miao
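    A sketch of the per-core switch behavior: data arriving at a core's switch is forwarded either up to the local processor or onward to another core's switch. The XY-style next-hop choice is an assumed routing policy, not a detail from the patent:

      # Decide where a switch at grid position `here` sends a packet.
      def forward(dest, here):
          if dest == here:
              return "processor"          # deliver to the local core
          (dx, dy), (hx, hy) = dest, here
          if dx != hx:                    # route in X first (assumption)
              return "east" if dx > hx else "west"
          return "north" if dy > hy else "south"

      print(forward(dest=(2, 1), here=(0, 1)))   # 'east', toward (1, 1)
      print(forward(dest=(2, 1), here=(2, 1)))   # 'processor', arrived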