Multiplication Of Matrices Patents (Class 708/607)
  • Patent number: 11580194
    Abstract: An information processing apparatus includes a sparse element detection part, a sparse location weight addition part, a multiplication part, a non-sparse data operation part, and an addition part. The sparse element detection part detects a predetermined sparse element from input data and outputs information about the sparse element. The sparse location weight addition part adds first weight elements corresponding to the sparse element. The multiplication part multiplies an output of the sparse location weight addition part by the sparse element. The non-sparse data operation part performs an operation on non-sparse elements, each other than the sparse element in the input data. The addition part adds an output of the multiplication part and an output of the non-sparse data operation part.
    Type: Grant
    Filed: October 30, 2018
    Date of Patent: February 14, 2023
    Assignee: NEC CORPORATION
    Inventor: Seiya Shibata
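The data flow in the abstract above can be sketched for a single dot product. This is a minimal illustration, not the patented apparatus: the function name `sparse_aware_dot` and the `sparse_value` parameter are hypothetical, and the sketch assumes the "sparse element" is one predetermined value that occurs many times in the input.

```python
def sparse_aware_dot(weights, inputs, sparse_value=0):
    """Dot product that multiplies by the shared sparse value only once.

    Hypothetical sketch: weights at positions holding the sparse value are
    summed first, then multiplied by that value a single time.
    """
    sparse_weight_sum = 0  # sum of weights at positions holding the sparse value
    non_sparse_acc = 0     # ordinary multiply-accumulate over all other positions
    for w, x in zip(weights, inputs):
        if x == sparse_value:
            sparse_weight_sum += w
        else:
            non_sparse_acc += w * x
    # One multiplication for the whole sparse group, then the final addition.
    return sparse_weight_sum * sparse_value + non_sparse_acc
```

When the sparse value is zero the product term vanishes, so only the non-sparse positions need a multiplier at all.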
  • Patent number: 11580059
    Abstract: A memory architecture and a processing unit that incorporates the memory architecture and a systolic array. The memory architecture includes: memory array(s) with multi-port (MP) memory cells; first wordlines connected to the cells in each row; and, depending upon the embodiment, second wordlines connected to diagonals of cells or diagonals of sets of cells. Data from a data input matrix is written to the memory cells during first port write operations using the first wordlines and read out from the memory cells during second port read operations using the second wordlines. Due to the diagonal orientation of the second wordlines and due to additional features (e.g., additional rows of memory cells that store static zero data values or read data mask generators that generate read data masks), data read from the memory architecture and input directly into a systolic array is in the proper order, as specified by a data setup matrix.
    Type: Grant
    Filed: July 31, 2019
    Date of Patent: February 14, 2023
    Assignee: Marvell Asia Pte. Ltd.
    Inventors: Venkatraghavan Bringivijayaraghavan, Aravindan J. Busi, Deepak I. Hanagandi, Igor Arsovski
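The abstract above says diagonal reads deliver data in the order a systolic array expects. The sketch below shows one common skewed order, assuming row r of the input must be delayed by r cycles and padded with zeros. The function name `skew_for_systolic` is hypothetical, and the exact layout of the patent's data setup matrix is not given in the abstract.

```python
def skew_for_systolic(matrix):
    """Return one list per time step; entry r feeds systolic-array row r.

    Hypothetical sketch: at step t, row r supplies matrix[r][t - r], which is
    a diagonal of the stored matrix. Positions outside the matrix become 0.
    """
    rows, cols = len(matrix), len(matrix[0])
    steps = rows + cols - 1
    return [
        [matrix[r][t - r] if 0 <= t - r < cols else 0 for r in range(rows)]
        for t in range(steps)
    ]
```

Each output row corresponds to one diagonal read; the zero padding plays the role that the abstract assigns to static zero rows or read data masks.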
  • Patent number: 11568022
    Abstract: Detailed are embodiments related to bit matrix multiplication in a processor. For example, in some embodiments a processor comprising: decode circuitry to decode an instruction having fields for an opcode, an identifier of a first source bit matrix, an identifier of a second source bit matrix, an identifier of a destination bit matrix, and an immediate; and execution circuitry to execute the decoded instruction to perform a multiplication of a matrix of S-bit elements of the identified first source bit matrix with S-bit elements of the identified second source bit matrix, wherein the multiplication and accumulation operations are selected by the operation selector and store a result of the matrix multiplication into the identified destination bit matrix, wherein S indicates a plural bit size is described.
    Type: Grant
    Filed: January 22, 2021
    Date of Patent: January 31, 2023
    Assignee: Intel Corporation
    Inventors: Dmitry Y. Babokin, Kshitij A. Doshi, Vadim Sukhomlinov
  • Patent number: 11562047
    Abstract: A method of increasing computer hardware efficiency of a matrix computation. The method comprises receiving, at a computer processing device, digital signals encoding one or more operations of the matrix computation, each operation including one or more operands. The method further comprises, responsive to determining, by a sparse data check device of the computer processing machine, that an operation of the matrix computation includes all dense operands, forwarding the operation to a dense computation device of the computer processing machine configured to perform the operation of the matrix computation based on the dense operands. The method further comprises, responsive to determining, by the sparse data check device, that an operation of the matrix computation includes one or more sparse operands, forwarding the operation to a sparse computation device configured to perform the operation of the matrix computation.
    Type: Grant
    Filed: April 29, 2020
    Date of Patent: January 24, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Layali Rashid, Saurabh M. Kulkarni, Marc Tremblay
  • Patent number: 11556357
    Abstract: Systems, media, and methods may identify loops of a unit of computation for performing operations associated with the loops. The systems, media, and methods may receive textual program code that includes a unit of computation that comprises a loop (e.g., explicit/implicit loop). The unit of computation may be identified by an identifier (e.g., variable name within the textual program code, text string embedded in the unit of computation, and/or syntactical pattern that is unique within the unit of computation). A code portion and/or a section thereof may include an identifier referring to the unit of computation, where the code portion and the unit of computation may be at locations independent of each other. The systems, media, and methods may semantically identify a loop that corresponds to the identifier and perform operations on the textual program code using the code portion and/or section.
    Type: Grant
    Filed: March 25, 2021
    Date of Patent: January 17, 2023
    Assignee: The MathWorks, Inc.
    Inventors: Sumit Ghosh, Vinit Deodhar, Denis Gurchenkov, Zhen Wang
  • Patent number: 11538989
    Abstract: An in-memory computing architecture is disclosed that can evaluate the transitive closure of graphs using the natural parallel flow of information in 3-D nanoscale crossbars. The architecture can be implemented using 3-D crossbar architectures with as few as two layers of 1-diode 1-resistor (1D1R) interconnects. The architecture avoids memory-processor bottlenecks and can hence scale to large graphs. The approach leads to a runtime complexity of O(n^2) using O(n^2) memristor devices. This compares favorably to conventional algorithms with a time complexity of O(n^3/p + n^2 log p) on p processors. The approach takes advantage of the dynamics of 3-D crossbars not available on 2-D crossbars.
    Type: Grant
    Filed: July 30, 2018
    Date of Patent: December 27, 2022
    Assignee: UNIVERSITY OF CENTRAL FLORIDA RESEARCH FOUNDATION, INC.
    Inventors: Alvaro Velasquez, Sumit Kumar Jha
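For reference, the transitive closure that the abstract above evaluates in hardware can be computed in software with Warshall's algorithm. The sketch below is that conventional sequential baseline, which takes cubic time in the number of nodes; it is not the crossbar method, and the function name is illustrative.

```python
def transitive_closure(adj):
    """Warshall's algorithm on a 0/1 adjacency matrix (list of lists).

    reach[i][j] ends up 1 exactly when a path of one or more edges
    leads from node i to node j.
    """
    n = len(adj)
    reach = [row[:] for row in adj]  # copy so the input is left unchanged
    for k in range(n):               # allow node k as an intermediate hop
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach
```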
  • Patent number: 11526737
    Abstract: Data to be processed includes vector element values of an input vector and matrix element values of a model matrix associated with a neural network model. A vector-matrix multiplication module receives a set of matrix element values for performing a vector-matrix multiplication operation. Processing the data includes computing a plurality of intermediate vectors based on element-wise vector multiplication between different subsets of the vector element values and different respective pre-processing vectors. The vector-matrix multiplication module is loaded with a core matrix, and the input vector is multiplied by the model matrix based on separately multiplying each of the intermediate vectors by the loaded core matrix.
    Type: Grant
    Filed: January 31, 2020
    Date of Patent: December 13, 2022
    Assignee: Lightelligence, Inc.
    Inventors: Matthew Raja Khoury, Rumen Rumenov Dangovski, Longwu Ou, Yichen Shen, Li Jing
  • Patent number: 11507641
    Abstract: Techniques for performing in-memory matrix multiplication, taking into account temperature variations in the memory, are disclosed. In one example, the matrix multiplication memory uses ohmic multiplication and current summing to perform the dot products involved in matrix multiplication. One downside to this analog form of multiplication is that temperature affects the accuracy of the results. Thus techniques are provided herein to compensate for the effects of temperature increases on the accuracy of in-memory matrix multiplications. According to the techniques, portions of input matrices are classified as effective or ineffective. Effective portions are mapped to low temperature regions of the in-memory matrix multiplier and ineffective portions are mapped to high temperature regions of the in-memory matrix multiplier. The matrix multiplication is then performed.
    Type: Grant
    Filed: May 31, 2019
    Date of Patent: November 22, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Majed Valad Beigi, Amin Farmahini-Farahani, Sudhanva Gurumurthi
  • Patent number: 11500962
    Abstract: To take advantage of the architecture of a systolic array tailored to perform sparse matrix multiplications, a weight matrix can be converted into a set of constrained fine-grained sparse weight matrices. The conversion process may include receiving a request to perform a matrix multiplication operation with a weight matrix, and determining that the weight matrix satisfies a sparsity condition to convert the weight matrix into a set of constrained fine-grained sparse weight matrices. The weight matrix can then be converted into a set of constrained fine-grained sparse weight matrices. Computer instructions can then be generated for an integrated circuit device to perform the requested matrix multiplication operation as a set of sparse matrix multiplication operations using the set of constrained fine-grained sparse weight matrices.
    Type: Grant
    Filed: June 30, 2020
    Date of Patent: November 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Paul Gilbert Meyer, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
  • Patent number: 11501151
    Abstract: The present disclosure advantageously provides a pipelined accumulator that includes a data selector configured to receive a sequence of operands to be summed, an input register coupled to the data selector, an output register, coupled to the data selector, configured to store a sequence of partial sums and output a final sum, and a multi-stage add module coupled to the input register and the output register. The multi-stage add module is configured to store a sequence of partial sums and a final sum in a redundant format, and perform back-to-back accumulation into the output register.
    Type: Grant
    Filed: May 28, 2020
    Date of Patent: November 15, 2022
    Assignee: Arm Limited
    Inventors: Paul Nicholas Whatmough, Zhi-Gang Liu, Matthew Mattina
  • Patent number: 11494463
    Abstract: Performing set operations using sparse matrix operations offered by a multi-core processing unit (such as a graphics processing unit). The set operation is converted into operand matrices and sparse matrix operations, forgoing the use of hash tables. The input set is converted into a matrix, a matrix operation corresponding to the set operation is identified, and one or more operands of the set operation are also represented within a matrix. The matrix operation is then performed on these matrices to obtain an output matrix, which is then converted to an output set.
    Type: Grant
    Filed: April 14, 2020
    Date of Patent: November 8, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Ritwik Das
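One way to picture the set-to-matrix conversion in the abstract above is set intersection expressed as an element-wise product of indicator vectors. This is a minimal sketch under that assumption: the helper names are hypothetical, dense lists stand in for the sparse matrices a GPU library would use, and the abstract does not say which matrix operation maps to which set operation.

```python
def to_indicator(items, universe):
    """Encode a set as a 0/1 row vector over an ordered universe."""
    return [1 if u in items else 0 for u in universe]


def intersect_via_matrix(a, b):
    """Set intersection computed as an element-wise product of indicators."""
    universe = sorted(a | b)                      # shared column ordering
    va = to_indicator(a, universe)
    vb = to_indicator(b, universe)
    product = [x * y for x, y in zip(va, vb)]     # 1 only where both sets have it
    # Convert the output vector back into a set.
    return {u for u, flag in zip(universe, product) if flag}
```

No hash lookups are needed once the indicator vectors exist; the membership tests here only build the encoding.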
  • Patent number: 11494625
    Abstract: A novel energy-efficient multiplication circuit using analog multipliers and adders reduces the distance data has to move and the number of times the data has to be moved when performing matrix multiplications in the analog domain. The multiplication circuit is tailored to bitwise multiply the innermost product of a rearranged matrix formula to generate a matrix multiplication result in the form of a current that is then digitized for further processing.
    Type: Grant
    Filed: October 1, 2019
    Date of Patent: November 8, 2022
    Assignee: Maxim Integrated Products, Inc.
    Inventors: Sung Ung Kwak, Robert Michael Muchsel
  • Patent number: 11481472
    Abstract: Techniques for data manipulation using integer matrix multiplication using pipelining are disclosed. A first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n are obtained for matrix multiplication within a processor. The first and second integer matrices employ a two's complement variable radix point data representation. The first and second integer matrices are distilled into (j×j) submatrices. A first variable radix point format and an initial value for an accumulator register are configured dynamically. A first variable radix point format is configured dynamically for the first integer matrix and a second variable radix point format is configured dynamically for the second integer matrix. Multiply-accumulate operations are executed in a pipelined fashion on the (j×j) submatrices of the first integer matrix and the second integer matrix, where a third variable radix point format is configured for the result.
    Type: Grant
    Filed: July 30, 2020
    Date of Patent: October 25, 2022
    Inventor: David John Simpson
  • Patent number: 11481215
    Abstract: The present disclosure provides a computing method that is applied to a computing device. The computing device includes: a memory, a register unit, and a matrix computing unit. The method includes the following steps: controlling, by the computing device, the matrix computing unit to obtain a first operation instruction, where the first operation instruction includes a matrix reading instruction for a matrix required for executing the instruction; controlling, by the computing device, an operating unit to send a reading command to the memory according to the matrix reading instruction; and controlling, by the computing device, the operating unit to read a matrix corresponding to the matrix reading instruction in a batch reading manner, and executing the first operation instruction on the matrix. The technical solutions in the present disclosure have the advantages of fast computing speed and high efficiency.
    Type: Grant
    Filed: January 17, 2020
    Date of Patent: October 25, 2022
    Assignee: Cambricon (Xi'an) Semiconductor Co., Ltd.
    Inventors: Tianshi Chen, Shaoli Liu, Zai Wang, Shuai Hu
  • Patent number: 11481224
    Abstract: A digital filter according to the disclosure includes a processing circuit having a memory and a number of parallel processing circuits. The parallel processing circuits perform convolution operations based on input data and function data that is accessed from the memory. The filter further includes a serializer for serializing data that is received from the processing circuits. A clock generator circuit provides a first clock signal to the processing circuit and a second clock signal to the serializer. The frequency of the second clock signal is greater than that of the first clock signal.
    Type: Grant
    Filed: August 30, 2019
    Date of Patent: October 25, 2022
    Assignee: Apple Inc.
    Inventors: Tao Mai, Robert G. Lorenz, Joachim S. Hammerschmidt, Utku Seckin
  • Patent number: 11474798
    Abstract: The disclosed systems, structures, and methods are directed to optimizing memory access to constants in heterogeneous parallel computers, including systems that support OpenCL. This is achieved in an optimizing compiler that transforms program scope constants and constants at the outermost scope of kernels into implicit constant pointer arguments. The optimizing compiler also attempts to determine access patterns for constants at compile-time and places the constants in a variety of memory types available in a compute device architecture based on these access patterns.
    Type: Grant
    Filed: August 24, 2020
    Date of Patent: October 18, 2022
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Guansong Zhang, Weiwei Li
  • Patent number: 11469772
    Abstract: A method, system, and program product access chunks of data identifying data elements. A mask is used to identify a position of the data elements that have zero values and that have non-zero values. The data elements are processed based on the mask. For compression of data, data elements in chunks of data that have zero values and that have non-zero values are determined. A mask is used to identify a position of the data elements that have zero values and that have non-zero values. The data elements in the chunks of data having zero values are removed. The data elements having non-zero values are packed into the chunks to form the compressed data. For decompressing the data, zero-value data elements are added in positions in the chunks of data according to the mask to form uncompressed data.
    Type: Grant
    Filed: January 27, 2020
    Date of Patent: October 11, 2022
    Inventors: Joshua Huang, Hsilin Huang
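The mask-based scheme in the abstract above can be sketched as a compress and decompress pair over one chunk. The function names and the list-of-bits mask are assumptions made for illustration; the abstract does not specify how the mask or the chunks are encoded.

```python
def compress(chunk):
    """Split a chunk into a position mask and its packed non-zero values."""
    mask = [1 if v != 0 else 0 for v in chunk]  # 1 marks a non-zero position
    packed = [v for v in chunk if v != 0]       # zeros are dropped entirely
    return mask, packed


def decompress(mask, packed):
    """Re-insert zeros at the positions the mask marks as zero."""
    values = iter(packed)
    return [next(values) if bit else 0 for bit in mask]
```

A round trip such as `decompress(*compress(chunk))` reproduces the original chunk, and the saving grows with the share of zero elements.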
  • Patent number: 11449577
    Abstract: Methods and apparatus for performing video processing matrix operations within a memory fabric. Various embodiments of the present disclosure are directed to converting a memory array into a matrix fabric for discrete cosine transform (DCT) matrix transformations and performing DCT matrix operations therein. Exemplary embodiments described herein perform DCT matrix-matrix multiplication operations within a memory device that includes a matrix fabric and matrix multiplication unit (MMU). In one embodiment, matrix-matrix multiplication operations are obtained using separate matrix-vector products. In one exemplary embodiment, the matrix fabric uses a “crossbar” construction of resistive elements. Each resistive element stores a level of impedance that represents the corresponding matrix coefficient value. The crossbar connectivity can be driven with an electrical signal representing the input vector as an analog voltage.
    Type: Grant
    Filed: November 20, 2019
    Date of Patent: September 20, 2022
    Assignee: Micron Technology, Inc.
    Inventor: Fa-Long Luo
  • Patent number: 11430083
    Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
    Type: Grant
    Filed: March 5, 2021
    Date of Patent: August 30, 2022
    Assignee: Intel Corporation
    Inventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
  • Patent number: 11430529
    Abstract: A method for capacitance coupling parameter estimation includes determining a plurality of mean voltages among a plurality of memory cells of the memory in each of a plurality of cases related to inter-cell interference, generating a plurality of middle state mean voltages in response to the mean voltages, and adjusting one or more threshold voltages used to read from the memory based on the middle state mean voltages to operate independently of knowledge of middle state distributions in the memory cells.
    Type: Grant
    Filed: February 23, 2018
    Date of Patent: August 30, 2022
    Assignee: Seagate Technology LLC
    Inventors: Meysam Asadi, Zhengang Chen, Erich F. Haratsch
  • Patent number: 11422801
    Abstract: A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
    Type: Grant
    Filed: January 4, 2019
    Date of Patent: August 23, 2022
    Assignee: Google LLC
    Inventors: Olivier Temam, Ravi Narayanaswami, Harshit Khaitan, Dong Hyuk Woo
  • Patent number: 11416580
    Abstract: An apparatus to facilitate matrix multiplication operations. The apparatus comprises multiplication hardware to operate in a dot product mode, wherein a multiplication stage included in the multiplication hardware is configured as a dot product of a number of bit vectors (N) to perform N×N multiplication operations on a plurality of multiplicands and perform addition operations on results of the N×N multiplication operations.
    Type: Grant
    Filed: November 13, 2019
    Date of Patent: August 16, 2022
    Assignee: Intel Corporation
    Inventors: Nevin Mathew, Shubra Marwaha, Ashutosh Garg
  • Patent number: 11409839
    Abstract: The present disclosure relates to a method for controlling execution of a GEMM operation on an accelerator comprising multiple computation units, a first memory device, and a second memory device. The method comprises determining an execution manner of the GEMM operation, the execution manner comprising partition information of the GEMM operation and computation unit allocation information of the partitioned GEMM operation; generating one or more instructions to compute the partitioned GEMM operation on one or more allocated computation units; and issuing the one or more instructions to at least one of a first queue and a second queue, which enables at least one of a first local controller and a second local controller to execute the one or more instructions, wherein the first local controller and the second local controller are configured to control data movement between the computation units, the first memory device, and the second memory device.
    Type: Grant
    Filed: August 21, 2020
    Date of Patent: August 9, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Yuhao Wang, Fei Sun, Fei Xue, Yen-Kuang Chen, Hongzhong Zheng
  • Patent number: 11409523
    Abstract: A graphics processing unit includes a sparse matrix detection unit, a register file, an assertion register, and a matrix calculation unit. The sparse matrix detection unit reads a plurality of matrices from a storage device and determines whether the matrices are zero matrices or non-zero matrices to output a determination result. The register file stores the plurality of matrices from the sparse matrix detection unit. The assertion register marks up the matrices according to the determination result, and outputs a mark result. The matrix calculation unit receives a matrix calculation instruction, reads the non-zero matrices in the plurality of matrices from the register file according to the mark result, and calculates the non-zero matrices.
    Type: Grant
    Filed: January 4, 2021
    Date of Patent: August 9, 2022
    Assignee: GLENFLY TECHNOLOGY CO., LTD.
    Inventors: Wei Zhang, Deming Gu
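The zero-matrix skipping described in the abstract above can be illustrated in software as an accumulation of matrix products that consults a per-matrix mark before doing any work. This is a hypothetical sketch: the function names, the list of (a, b) operand pairs, and the returned count are all assumptions, with the `marks` list standing in for the assertion register.

```python
def is_zero_matrix(m):
    """True when every element of the matrix is zero."""
    return all(v == 0 for row in m for v in row)


def matmul(a, b):
    """Plain triple-loop matrix product on lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def sum_of_products(pairs):
    """Accumulate a @ b over (a, b) pairs, skipping pairs with a zero operand.

    Returns the accumulated matrix and the number of products computed.
    """
    marks = [not (is_zero_matrix(a) or is_zero_matrix(b)) for a, b in pairs]
    rows, cols = len(pairs[0][0]), len(pairs[0][1][0])
    acc = [[0] * cols for _ in range(rows)]
    computed = 0
    for (a, b), mark in zip(pairs, marks):
        if not mark:
            continue  # a zero operand contributes nothing, so skip the multiply
        product = matmul(a, b)
        for i in range(rows):
            for j in range(cols):
                acc[i][j] += product[i][j]
        computed += 1
    return acc, computed
```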
  • Patent number: 11403367
    Abstract: Techniques described herein perform spherical PIP analysis by detecting whether a test ray (defined by a test point (TP) and a point (EP) that is external to a spherical polygon) crosses edge arcs (“edges”) of the polygon based on relative orientations of vertices of the test ray and edges. A classifier vector (CV) for a test ray is calculated based on the cross-product of the TP and the EP. Using the CV, the orientation of each vertex of the polygon with respect to the test ray is determined. Candidate edges having vertices with opposite orientations with respect to the test ray are identified. Crossing edges are determined by calculating CVs for each candidate edge, and determining orientations of the TP and EP with respect to each candidate edge. A set of crossing edges is determined, where the TP and the EP have opposite orientations with respect to each crossing edge.
    Type: Grant
    Filed: April 14, 2020
    Date of Patent: August 2, 2022
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventors: William Martinez Cortes, Shasank Kisan Chavan, Siva Ravada, Ying Hu
  • Patent number: 11397791
    Abstract: A method for performing a matrix multiplication operation is provided. The method includes: obtaining a matrix B1, a matrix A2, and an index matrix, wherein the index matrix comprises indexes, in a matrix A1, of elements in the matrix A2; generating m matrices B2 based on the index matrix and the matrix B1, wherein the m matrices B2 are all matrices with t rows and n columns, and each row of each matrix B2 is a row indicated in the matrix B1 by a corresponding element in the index matrix; and generating a matrix C based on the matrix A2 and the m matrices B2, wherein the matrix C is a product of the matrix A1 and the matrix B1.
    Type: Grant
    Filed: January 4, 2022
    Date of Patent: July 26, 2022
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Leijun He, Bin Xu, Kaixing Wang
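The gather-then-multiply method in the abstract above can be sketched as follows. The sketch assumes that A2 holds t selected elements from each row of A1, that the index matrix stores their column positions in A1, and that every omitted element of A1 is zero. The function name `gather_matmul` is hypothetical.

```python
def gather_matmul(a2, index, b1):
    """Compute C = A1 @ B1 from the compacted A2 and its index matrix.

    a2:    m x t matrix of selected elements of A1 (assumed layout)
    index: m x t matrix of their column positions in A1 (assumed layout)
    b1:    k x n dense matrix
    """
    n = len(b1[0])
    c = []
    for a_row, idx_row in zip(a2, index):
        # Matrix B2 for this row: t rows of B1 picked out by the indexes.
        b2 = [b1[j] for j in idx_row]
        c.append([sum(a * b_row[col] for a, b_row in zip(a_row, b2))
                  for col in range(n)])
    return c
```

For example, A1 = [[0, 2, 0, 3], [4, 0, 0, 5]] compacts to A2 = [[2, 3], [4, 5]] with index matrix [[1, 3], [0, 3]], so each output row costs t multiplications per column instead of k.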
  • Patent number: 11392376
    Abstract: A data processor receives a first set of processor instructions for combining a first matrix with a second matrix to produce a third matrix and generates a second set of processor instructions therefrom by identifying values of non-zero elements of the first matrix stored in a memory of the data processor and determining memory locations of elements of the second matrix. An instruction of the second set of processor instructions includes a determined memory location and/or an explicit value of an identified non-zero element. The second set of processor instructions is executed by the data processor. The second set of processor instructions may be generated by just-in-time compilation of the first set of processor instructions and may include instructions of a custom instruction set architecture.
    Type: Grant
    Filed: April 11, 2019
    Date of Patent: July 19, 2022
    Assignee: Arm Limited
    Inventors: Zhigang Liu, Matthew Mattina, Paul Nicholas Whatmough, Jesse Garrett Beu
  • Patent number: 11392667
    Abstract: Systems and methods of configuring an array of processors of an integrated circuit include identifying a fast Fourier transform (FFT) matrix multiply of input data, wherein the FFT matrix multiply of the input data includes a bit-reversed input array, configuring the array of processing cores based on the bit-reversed input array, wherein the configuring the array of processing cores includes storing the input bits of the bit-reversed input array within memory circuits of distinct processing cores of an array of processing cores of the integrated circuit based on an input bit mapping that identifies a pre-determined storage location within the array of processing cores of each input bit of the bit-reversed input array, and performing matrix multiply computations between weight stages of the FFT matrix multiply and the input bits of the bit-reversed input array stored within the memory circuits of the distinct processing cores.
    Type: Grant
    Filed: December 20, 2021
    Date of Patent: July 19, 2022
    Assignee: quadric.io, Inc.
    Inventors: Aman Sikka, Nigel Drego, Daniel Firu, Veerbhan Kheterpal
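The bit-reversed input array mentioned in the abstract above is the standard input ordering for a radix-2 FFT. The sketch below shows that permutation only; the function names are illustrative, and how the patent maps the permuted inputs onto processing cores is not specified in the abstract.

```python
def reverse_bits(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def bit_reverse_permute(data):
    """Return data reordered so element i moves to index reverse_bits(i)."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    out = [None] * n
    for i, value in enumerate(data):
        out[reverse_bits(i, bits)] = value
    return out
```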
  • Patent number: 11386507
    Abstract: A computer-implemented method for analyzing a time-varying graph is provided. The time-varying graph includes nodes representing elements in a network, edges representing transactions between elements, and data associated with the nodes and the edges. The computer-implemented method includes constructing, using a processor, adjacency and feature matrices describing each node and edge of each time-varying graph for stacking into an adjacency tensor and describing the data of each time-varying graph for stacking into a feature tensor, respectively. The adjacency and feature tensors are partitioned into adjacency and feature training tensors and into adjacency and feature validation tensors, respectively. An embedding model and a prediction model are created using the adjacency and feature training tensors. The embedding and prediction models are validated using the adjacency and feature validation tensors to identify an optimized embedding-prediction model pair.
    Type: Grant
    Filed: September 23, 2019
    Date of Patent: July 12, 2022
    Assignees: INTERNATIONAL BUSINESS MACHINES CORPORATION, Trustees of Tufts College, RAMOT AT TEL-AVIV UNIVERSITY LTD.
    Inventors: Lior Horesh, Osman Asif Malik, Shashanka Ubaru, Misha E. Kilmer, Haim Avron
  • Patent number: 11361050
    Abstract: Example implementations relate to assigning dependent matrix-vector multiplication (MVM) operations to consecutive crossbars of a dot product engine (DPE). A method can comprise grouping a first MVM operation of a computation graph with a second MVM operation of the computation graph where the first MVM operation is dependent on a result of the second MVM operation, assigning a first crossbar of a DPE to an operand of the first MVM operation, and assigning a second crossbar of the DPE to an operand of the second MVM operation, wherein the first and second crossbars are consecutive.
    Type: Grant
    Filed: November 20, 2018
    Date of Patent: June 14, 2022
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Soumitra Chatterjee, Sunil Vishwanathpur Lakshminarasimha, Mohan Parthasarathy
  • Patent number: 11360741
    Abstract: An arithmetic circuit includes an LUT generation circuit (1) that, when coefficients c[n] (n = 1, ..., N) are paired two by two, outputs a value calculated for each of the pairs, and a distributed arithmetic circuit (2-m) that calculates values y[m] of product-sum arithmetic, by which data x[m, n] of a data set X[m] containing M pairs of data x[m, n] is multiplied by the coefficients c[n] and the products are summed up, in parallel for each of the M pairs.
    Type: Grant
    Filed: December 18, 2018
    Date of Patent: June 14, 2022
    Assignees: NTT ELECTRONICS CORPORATION, NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Kenji Kawai, Ryo Awata, Kazuhito Takei, Masaaki Iizuka
  • Patent number: 11354383
    Abstract: Various arrangements for performing successive vector-matrix multiplication may include sequentially performing a first vector-matrix multiplication operation for each bit-order of values in an input vector. The first vector-matrix multiplication operation for each bit-order may generate an analog output. For each analog output generated by the vector-matrix multiplication operation, an analog output may be converted into one or more digital bit values, and the one or more digital bit values may be sent to a second vector-matrix multiplication operation.
    Type: Grant
    Filed: November 19, 2019
    Date of Patent: June 7, 2022
    Assignee: Applied Materials, Inc.
    Inventors: Frank Tzen-Wen Guo, She-Hwa Yen
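Processing an input vector one bit-order at a time, as in the abstract above, can be illustrated digitally. In this sketch each bit plane of the input is multiplied by the matrix separately. Recombining the partial results by shift-and-add is an assumption added here to show that the full product is preserved; the abstract itself only says the digitized outputs are sent to a second operation. The function name and the `bits` parameter are hypothetical.

```python
def bit_serial_vmm(vector, matrix, bits):
    """Vector-matrix product computed one input bit-order at a time.

    vector: non-negative integers, each representable in `bits` bits
    matrix: len(vector) rows of integer columns
    """
    n_cols = len(matrix[0])
    result = [0] * n_cols
    for bit in range(bits):
        # Bit plane: the `bit`-th binary digit of every input value.
        plane = [(x >> bit) & 1 for x in vector]
        # One 1-bit vector-matrix multiplication (an analog step in hardware).
        partial = [sum(p * row[c] for p, row in zip(plane, matrix))
                   for c in range(n_cols)]
        # Assumed recombination: weight the partial result by 2**bit.
        result = [r + (p << bit) for r, p in zip(result, partial)]
    return result
```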
  • Patent number: 11347828
    Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: May 31, 2022
    Assignee: Intel Corporation
    Inventors: Biji George, Om Ji Omer, Dipan Kumar Mandal, Cormac Brick, Lance Hacking, Sreenivas Subramoney, Belliappa Kuttanna
  • Patent number: 11334648
    Abstract: Embodiments of the present invention disclose a matrix multiplier, and relate to the field of data computing technologies, so as to divide two matrices into blocks for computation. The matrix multiplier includes: a first memory, a second memory, an operation circuit, and a controller, where the operation circuit, the first memory, and the second memory may perform data communication by using a bus; and the controller is configured to control, according to a preset program or instruction, a first matrix and a second matrix to be divided into blocks, and control the operation circuit to perform a multiplication operation on corresponding blocks in the first memory and the second memory based on block division results of the controller. The matrix multiplier may be configured to perform a multiplication operation on two matrices.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: May 17, 2022
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Hu Liu, Heng Liao, Jiajin Tu, Honghui Yuan, Hou Fun Lam, Fan Zhu
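    The block division described in the abstract of patent 11334648 can be illustrated in software. The sketch below is only an illustration, not the patented circuit: the function names, the square block size, and the assumption that every dimension is a multiple of the block size are all hypothetical.

```python
def matmul(a, b):
    """Plain triple-loop product of two small matrices (lists of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def get_block(m, r0, c0, size):
    """Copy the size-by-size block whose top-left corner is (r0, c0)."""
    return [row[c0:c0 + size] for row in m[r0:r0 + size]]


def blocked_matmul(a, b, size):
    """Multiply a (M x K) by b (K x N) one pair of blocks at a time.

    Assumption: M, K and N are multiples of `size`; no padding is done.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, size):
        for j0 in range(0, n, size):
            for k0 in range(0, k, size):
                # Software stand-in for one block product.
                part = matmul(get_block(a, i0, k0, size),
                              get_block(b, k0, j0, size))
                # Accumulate the partial result into the output block.
                for di in range(size):
                    for dj in range(size):
                        c[i0 + di][j0 + dj] += part[di][dj]
    return c
```

    Each output block is the sum of the products of the matching row of blocks and column of blocks, so the result equals the ordinary product.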
  • Patent number: 11328037
    Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. A bandwidth of external memory may be reduced by a factor of reduction based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.
    Type: Grant
    Filed: July 7, 2017
    Date of Patent: May 10, 2022
    Assignee: Intel Corporation
    Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr
  • Patent number: 11321805
    Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.
    Type: Grant
    Filed: October 29, 2020
    Date of Patent: May 3, 2022
    Assignee: Intel Corporation
    Inventors: Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan
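    The shared-exponent rescaling step in the abstract of patent 11321805 can be sketched as follows. This is a simplified reading, not the patented logic: it assumes each value is an integer mantissa representing mantissa * 2**shared_exp, and that the "dynamic range" is a number of magnitude bits. The name `rescale` and its parameters are hypothetical.

```python
def rescale(mantissas, shared_exp, magnitude_bits):
    """Right-shift a block of integer mantissas and bump the shared exponent.

    Assumption: each value represents mantissa * 2**shared_exp, and
    `magnitude_bits` is the number of bits available for the magnitude.
    """
    abs_max = max(abs(v) for v in mantissas)
    # Shift by however many bits the largest magnitude exceeds the range.
    shift = max(0, abs_max.bit_length() - magnitude_bits)
    # Python's >> on negative ints rounds toward minus infinity.
    shifted = [v >> shift for v in mantissas]
    # Raising the exponent by `shift` compensates for the dropped bits.
    return shifted, shared_exp + shift


values, exponent = rescale([1000, -300, 12], shared_exp=-8, magnitude_bits=7)
print(values, exponent)
```

    The shifted mantissas lose their low-order bits, so the represented values are approximations of the originals.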
  • Patent number: 11307853
    Abstract: A matrix multiplication device and an operation method thereof are provided. The matrix multiplication device includes calculation circuits, a control circuit, a multiplication circuit, and a routing circuit. The calculation circuits produce multiply-accumulate values. The control circuit receives a plurality of first element values of a first matrix. The control circuit classifies the first element values into at least one classification value. The multiplication circuit multiplies the classification value by a second element value of a second matrix in a low power mode to obtain at least one product value. The routing circuit transmits each of the product values to at least one corresponding calculation circuit in the calculation circuits in the low power mode.
    Type: Grant
    Filed: October 29, 2019
    Date of Patent: April 19, 2022
    Assignee: NEUCHIPS CORPORATION
    Inventors: Chiung-Liang Lin, Chao-Yang Kao, Youn-Long Lin, Huang-Chih Kuo, Jian-Wen Chen
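    One possible software reading of the abstract of patent 11307853 is sketched below, under a strong assumption: "classifying" is taken to mean grouping equal element values in a column of the first matrix, so that each distinct value is multiplied once and the product is routed to every accumulator that needs it. The patent may define classification differently, and the function name is hypothetical.

```python
def classified_matmul(a, b):
    """Return (a * b, number of scalar multiplications performed).

    Assumption: "classification" means grouping the rows of a column of `a`
    that hold the same value, so one multiply serves the whole group.
    """
    m, k, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]
    multiplies = 0
    for kk in range(k):
        # Classify column kk of `a`: map each distinct value to its rows.
        groups = {}
        for i in range(m):
            groups.setdefault(a[i][kk], []).append(i)
        for j in range(n):
            for value, rows in groups.items():
                product = value * b[kk][j]  # one multiply per distinct value
                multiplies += 1
                for i in rows:              # route to each matching accumulator
                    acc[i][j] += product
    return acc, multiplies
```

    With many repeated values, as in heavily quantized weights, the multiply count drops well below the M*K*N of the direct method.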
  • Patent number: 11301546
    Abstract: A method comprises receiving one or more sizes for each of the dimensions of a kernel that is convolved with an input tensor to generate an output activation, generating a control pattern used to compute output values for the convolution of the input tensor, with the control pattern being a square matrix with each dimension being a size equal to the product of the width and the height of the kernel. The control pattern is generated by generating a value for each position of the control pattern that is based on a location of the position in the control pattern and the one or more sizes of each of the dimensions of the kernel, the value indicating a location from which to access values from a flattened input tensor for the convolution with the kernel.
    Type: Grant
    Filed: November 18, 2019
    Date of Patent: April 12, 2022
    Assignee: Groq, Inc.
    Inventors: Jonathan Alexander Ross, Thomas Hawkins, Gregory Michael Thorson, Matt Boyd
  • Patent number: 11294985
    Abstract: Techniques are provided for efficient matrix multiplication using in-memory analog parallel processing, with applications for neural networks and artificial intelligence processors. A methodology implementing the techniques according to an embodiment includes storing two matrices in-memory. The first matrix is stored in transposed form such that the transposed first matrix has the same number of rows as the second matrix. The method further includes reading columns of the matrices from the memory in parallel, using disclosed bit line functional read operations and cross bit line functional read operations, which are employed to generate analog dot products between the columns. Each of the dot products corresponds to an element of the matrix multiplication product of the two matrices. In some embodiments, one of the matrices may be used to store neural network weighting factors, and the other matrix may be used to store input data to be processed by the neural network.
    Type: Grant
    Filed: October 30, 2018
    Date of Patent: April 5, 2022
    Assignee: Intel Corporation
    Inventors: Amrita Mathuriya, Sasikanth Manipatruni, Dmitri Nikonov, Ian Young, Ram Krishnamurthy
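    The abstract of patent 11294985 relies on the identity that element (i, j) of A*B is the dot product of column i of the transposed A with column j of B. A minimal digital sketch of that identity follows; the analog bit line reads are not modeled, and the helper names are hypothetical.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def column(m, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in m]


def product_from_columns(a_t, b):
    """Compute A*B from A transposed (K x M) and B (K x N).

    Both stored matrices have K rows, so every output element is a dot
    product between one column of a_t and one column of b.
    """
    m, n = len(a_t[0]), len(b[0])
    return [[sum(x * y for x, y in zip(column(a_t, i), column(b, j)))
             for j in range(n)] for i in range(m)]


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(product_from_columns(transpose(a), b))  # [[58, 64], [139, 154]]
```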
  • Patent number: 11269630
    Abstract: Disclosed embodiments relate to an interleaved pipeline of floating-point (FP) adders. In one example, a processor is to execute an instruction specifying an opcode and locations of a M by K first source matrix, a K by N second source matrix, and a M by N destination matrix, the opcode indicating execution circuitry, for each FP element (M, N) of the destination matrix, is to: launch K instances of a pipeline having a first, MULTIPLY stage, during which a FP element (M, K) of the first source matrix and a corresponding FP element (K, N) of the second source matrix are multiplied; concurrently, in an EXPDIFF stage, determine an exponent difference between the product and a previous FP value of the element (M, N) of the destination matrix; and in a second, ADD-BYPASS stage, accumulate the product with the previous FP value and, concurrently, bypassing the accumulated sum to a subsequent pipeline instance.
    Type: Grant
    Filed: March 29, 2019
    Date of Patent: March 8, 2022
    Assignee: INTEL CORPORATION
    Inventors: Simon Rubanovich, Amit Gradstein, Zeev Sperber
  • Patent number: 11256780
    Abstract: Methods and apparatus for fast Eigenvalue decomposition of Hermitian matrices are disclosed. In an exemplary embodiment, a method is provided for performing a decomposition iteration that includes identifying a largest off-diagonal term of a channel response matrix X, generating a 2×2 Hermitian matrix X2 that includes the largest off-diagonal term, and generating a 2×2 Unitary matrix ?2 from the 2×2 Hermitian matrix X2. The decomposition iteration also includes multiplying the 2×2 Unitary matrix ?2 with the 2×2 Hermitian matrix X2 to generate an updated largest off-diagonal term and updating the channel response matrix X with the updated largest off-diagonal term. The method also includes performing one or more additional decomposition iterations until all off-diagonal terms of the channel response matrix X are less than a target value.
    Type: Grant
    Filed: May 20, 2021
    Date of Patent: February 22, 2022
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: Hyun Soo Cheon
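    The iteration in the abstract of patent 11256780 resembles the classical Jacobi eigenvalue method. The sketch below shows that textbook method for the real symmetric special case only; it is a swapped-in illustration, not the patent's procedure for complex Hermitian channel matrices. The function name and default parameters are hypothetical, and the matrix is assumed to be at least 2×2.

```python
import math


def jacobi_eigenvalues(a, target=1e-12, max_iterations=100):
    """Classical Jacobi iteration for a real symmetric matrix (n >= 2).

    Returns the eigenvalues in ascending order.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_iterations):
        # Identify the largest off-diagonal term.
        p, q = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: abs(a[ij[0]][ij[1]]))
        if abs(a[p][q]) < target:
            break
        # Rotation angle that zeroes a[p][q] in the 2x2 sub-problem.
        theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
        c, s = math.cos(theta), math.sin(theta)
        for k in range(n):  # A <- A J (updates columns p and q)
            akp, akq = a[k][p], a[k][q]
            a[k][p] = c * akp - s * akq
            a[k][q] = s * akp + c * akq
        for k in range(n):  # A <- J^T A (updates rows p and q)
            apk, aqk = a[p][k], a[q][k]
            a[p][k] = c * apk - s * aqk
            a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))
```

    Each rotation zeroes the chosen off-diagonal pair and preserves the trace, so the diagonal converges to the eigenvalues.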
  • Patent number: 11238130
    Abstract: A signal processing method and apparatus, where the method includes partitioning a signal matrix to obtain X×H fractal signal matrices, partitioning a weight matrix to obtain H×Y fractal weight matrices, obtaining an operation sequence of X×H×Y matrix multiplications based on performance parameters, and processing the X×H×Y matrix multiplications to obtain X×Y result matrices, where the operation sequence of the X×H×Y matrix multiplications is obtained.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: February 1, 2022
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventor: Ruosheng Xu
  • Patent number: 11227030
    Abstract: Techniques for data manipulation using a matrix multiplication engine using pipelining are disclosed. A first and a second matrix are obtained for matrix multiplication. A first matrix multiply-accumulate (MAC) unit is configured, where a first matrix element and a second matrix element are presented to the MAC unit on a first cycle. A second MAC unit is configured in pipelined fashion, where the first element of the first matrix and a second element of the second matrix are presented to the second MAC unit on a second cycle, and where a second element of the first matrix and the first element of the second matrix are presented to the first MAC unit on the second cycle. Additional MAC units are further configured within the processor in pipelined fashion. Multiply-accumulate operations are executed in pipelined fashion on each of n MAC units over additional k sets of m cycles.
    Type: Grant
    Filed: March 31, 2020
    Date of Patent: January 18, 2022
    Assignee: Wave Computing, Inc.
    Inventor: David John Simpson
  • Patent number: 11194549
    Abstract: The present disclosure advantageously provides a system, matrix multiply accelerator (MMA) and method for efficiently multiplying matrices. The MMA includes a vector register to store the row vectors of one input matrix, a vector register to store the column vectors of another input matrix, a vector register to store an output matrix, and an array of vector multiply and accumulate (VMAC) units coupled to the vector registers. Each VMAC unit is coupled to at least two row vector signal lines and at least two column vector signal lines, and is configured to calculate the dot product for one element i,j of the output matrix by multiplying each row vector formed from the ith row of the first matrix with a corresponding column vector formed from the jth column of the second matrix to generate intermediate products, and accumulate the intermediate products into a scalar value.
    Type: Grant
    Filed: October 25, 2019
    Date of Patent: December 7, 2021
    Assignee: Arm Limited
    Inventors: Zhi-Gang Liu, Paul Nicholas Whatmough
  • Patent number: 11194886
    Abstract: Various arrangements for performing vector-matrix multiplication are provided here. Digital input vectors that include binary-encoded values can be converted into a plurality of analog signals using a plurality of one-bit digital to analog converters (DACs). Using an analog vector matrix multiplier, a vector-matrix multiplication operation can be performed using a weighting matrix for each bit-order of the plurality of analog signals. For each performed vector-matrix multiplication operation, a bit-ordered indication of an output of the analog vector matrix multiplier may be stored. A bit-order weighted summation of the sequentially performed vector-matrix multiplication operation may be performed.
    Type: Grant
    Filed: May 9, 2019
    Date of Patent: December 7, 2021
    Assignee: Applied Materials, Inc.
    Inventors: She-Hwa Yen, Frank Tzen-Wen Guo
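    The bit-order scheme in the abstract of patent 11194886 can be checked numerically: splitting each input into one-bit planes, multiplying each plane by the weight matrix, and summing the results weighted by powers of two reproduces the full product. The sketch below uses an exact digital multiply as a stand-in for the analog multiplier and assumes unsigned inputs; the function names are hypothetical.

```python
def vector_matrix(v, w):
    """Digital stand-in for the analog multiplier: returns the row vector v * w."""
    return [sum(v[i] * w[i][j] for i in range(len(v)))
            for j in range(len(w[0]))]


def bit_serial_vmm(x, w, bits):
    """Multiply x by w one bit-order at a time.

    Assumption: every entry of x is an unsigned integer below 2**bits.
    """
    total = [0] * len(w[0])
    for b in range(bits):
        # One-bit "DAC" inputs: bit b of every element of x.
        plane = [(value >> b) & 1 for value in x]
        partial = vector_matrix(plane, w)
        # Bit-order weighted summation: weight this result by 2**b.
        total = [t + p * (1 << b) for t, p in zip(total, partial)]
    return total


x = [5, 3, 6]
w = [[1, 2], [3, 4], [5, 6]]
print(bit_serial_vmm(x, w, bits=3))  # [44, 58]
```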
  • Patent number: 11188618
    Abstract: An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.
    Type: Grant
    Filed: September 5, 2019
    Date of Patent: November 30, 2021
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Mathew Nevin, Jorge Parra, Ashutosh Garg, Shubra Marwaha, Shubh Shah
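    The idea of detecting zeros to reduce multiply-add work, as in the abstract of patent 11188618, can be shown with a simple software analogue. This sketch only counts the multiply-adds that survive zero detection; it does not model the systolic array or the specific optimizations the patent covers. The function name is hypothetical.

```python
def sparse_aware_matmul(a, b):
    """Return (a * b, number of multiply-adds actually performed)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    performed = 0
    for i in range(m):
        for kk in range(k):
            if a[i][kk] == 0:
                continue  # a zero operand contributes nothing to row i
            for j in range(n):
                if b[kk][j] == 0:
                    continue  # zero in the second operand: skip as well
                c[i][j] += a[i][kk] * b[kk][j]
                performed += 1
    return c, performed


result, performed = sparse_aware_matmul([[0, 2], [3, 0]], [[1, 0], [0, 4]])
print(result, performed)  # [[0, 8], [3, 0]] 2
```

    In this example only 2 of the 8 multiply-adds of the dense method are executed.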
  • Patent number: 11188617
    Abstract: The method includes compiling data into mutual information columns, determining mutual information for each pairing of the mutual information columns and creating a matrix using the mutual information, the matrix including a first set of data columns, wherein each of the first set of data columns represents at least one feature of the data. The method further includes computing eigenvalues and eigenvectors of the matrix, ordering the eigenvalues using an absolute value of the eigenvalues, iteratively selecting at least one second set of data columns by successively removing data columns from the first set of data columns based on the ordered eigenvalues, and controlling an operation of an electronic device based on the at least one second set of data.
    Type: Grant
    Filed: January 10, 2019
    Date of Patent: November 30, 2021
    Assignee: Nokia Technologies OY
    Inventors: Iraj Saniee, Christos Mavridis
  • Patent number: 11150900
    Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an asynchronous packet network having a plurality of data transmission lines forming a data path transmitting operand data; a synchronous mesh communication network; a plurality of configurable circuits arranged in an array, each configurable circuit of the plurality of configurable circuits coupled to the asynchronous packet network and to the synchronous mesh communication network, each configurable circuit of the plurality of configurable circuits adapted to perform a plurality of computations; each configurable circuit of the plurality of configurable circuits comprising: a memory storing operand data; and an execution or write mask generator adapted to generate an execution mask or a write mask identifying valid bits or bytes transmitted on the data path or stored in the memory for a current or next computation.
    Type: Grant
    Filed: August 18, 2020
    Date of Patent: October 19, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11093580
    Abstract: A processor sequences the application of submatrices at a matrix multiplier to reduce the number of input changes at an input register of the matrix multiplier. The matrix multiplier is configured to perform a matrix multiplication for a relatively small matrix. To multiply two larger matrices the GPU decomposes the larger matrices into smaller submatrices and stores the submatrices at input registers of the matrix multiplier in a sequence, thereby calculating each column of a result matrix. The GPU sequences the storage of the submatrices at the input registers to maintain input data at one of the input registers over multiple calculation cycles of the matrix multiplier thereby reducing power consumption at the GPU.
    Type: Grant
    Filed: October 31, 2018
    Date of Patent: August 17, 2021
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Maxim V. Kazakov, Jian Mao
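    The effect of sequencing submatrices, as described in the abstract of patent 11093580, can be illustrated with a toy model: two input registers, a schedule of submatrix pairs, and a count of how often a register's contents change. The two schedules and all names below are hypothetical and are not taken from the patent.

```python
def count_register_loads(schedule):
    """Count writes to two input registers for a list of (a_id, b_id) pairs.

    A register is rewritten only when the requested block differs from the
    block it already holds.
    """
    loads = 0
    held_a = held_b = None
    for a_id, b_id in schedule:
        if a_id != held_a:
            held_a = a_id
            loads += 1
        if b_id != held_b:
            held_b = b_id
            loads += 1
    return loads


def naive_schedule(x, h, y):
    """Finish each output block in turn: both inputs change every cycle."""
    return [(("A", i, k), ("B", k, j))
            for i in range(x) for j in range(y) for k in range(h)]


def column_schedule(x, h, y):
    """Hold one B block while every A block that needs it passes through."""
    return [(("A", i, k), ("B", k, j))
            for j in range(y) for k in range(h) for i in range(x)]


print(count_register_loads(naive_schedule(2, 2, 2)))   # 16
print(count_register_loads(column_schedule(2, 2, 2)))  # 12
```

    Both schedules perform the same set of block products; the second one simply reuses the contents of one register across consecutive cycles.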
  • Patent number: 11062071
    Abstract: A method for computer-based simulation or control of a dynamic system using a computer includes: cyclically receiving, by a programmable logic device, at least one input signal; calculating, by the programmable logic device, at least one matrix multiplication; and outputting, by the programmable logic device, at least one output signal. A configuration of the programmable logic device includes: a parallel multiplication of blocks of at least two elements of a matrix by at least one input-signal-dependent element of a vector, and an adder tree for multiplication results. Successive blocks of the matrix are temporarily stored in a pipeline and processed sequentially. A target number of blocks and a target adder stage are determined based on a number and/or values of parameters of at least one system equation. Processing of blocks for a current cycle is terminated based on the target number of blocks and the target adder stage being reached.
    Type: Grant
    Filed: March 24, 2020
    Date of Patent: July 13, 2021
    Assignee: DSPACE DIGITAL SIGNAL PROCESSING AND CONTROL ENGINEERING GMBH
    Inventors: Vivien Chandra, Philip Grunert