Multiplication Of Matrices Patents (Class 708/607)

Patent number: 11687336
Abstract: An extensible multi-precision data pipeline system, comprising: a local buffer that stores an input local data set in a local storage format; an input tensor shaper coupled to the local buffer that reads the input local data set and converts the input local data set into an input tensor data set having a tensor format of vector width N by tensor length L; a cascaded pipeline coupled to the input tensor shaper that routes the input tensor data set through at least one function stage, resulting in an output tensor data set; and an output tensor shaper coupled to the cascaded pipeline that converts the output tensor data set into an output local data set having the local storage format, wherein the output tensor shaper writes the output local data set to the local buffer.
Type: Grant
Filed: May 8, 2020
Date of Patent: June 27, 2023
Assignee: Black Sesame Technologies Inc.
Inventors: Yi Wang, Zheng Qi, Hui Wang, Zheng Li

Patent number: 11651283
Abstract: An approach is described for a method, product, and apparatus for a machine learning process using dynamic rearrangement of sparse data and corresponding weights. This approach includes a method, product, and apparatus for dynamically rearranging input data to move sparse data to a location such that computations on the sparse data might be avoided when executing a machine learning processing job. For example, sparse data within each row of the input matrix can be moved to the end of each corresponding row. When the input data is folded to fit the array, that sparse data might be at least partially contained within a fold that comprises only sparse data and possibly filler data. In such an event, computations on the fold are unnecessary and are avoided. In some embodiments, the approach includes dynamically rearranging a weight matrix to maintain a correspondence between the input data and the weights.
Type: Grant
Filed: June 30, 2020
Date of Patent: May 16, 2023
Assignee: Cadence Design Systems, Inc.
Inventors: Yong Liu, Ngai Ngai William Hung, Michael Patrick Zimmer
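The row-rearrangement idea in this abstract can be sketched in plain Python. This is an illustrative approximation only, not the patented implementation; the function names and the column-fold layout are assumptions.

```python
def pack_row(row):
    # Move zero ("sparse") elements to the end of the row, keeping
    # non-zero elements in their original order.
    nz = [x for x in row if x != 0]
    return nz + [0] * (len(row) - len(nz))

def nonzero_folds(matrix, fold):
    # After packing every row, return the starting indices of the column
    # folds that still contain at least one non-zero value; all-zero
    # folds need no computation and can be skipped entirely.
    packed = [pack_row(r) for r in matrix]
    width = len(packed[0])
    return [f for f in range(0, width, fold)
            if any(r[j] for r in packed for j in range(f, min(f + fold, width)))]
```

For a 2x4 input with zeros scattered through each row, packing pushes all zeros into the last fold, so only the first fold needs to be computed.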

Patent number: 11645077
Abstract: Embodiments detailed herein relate to systems and methods to zero a tile register pair. In one example, a processor includes decode circuitry to decode a matrix pair zeroing instruction having fields for an opcode and an identifier to identify a destination matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded matrix pair zeroing instruction to zero every element of a left matrix and a right matrix of the identified destination matrix.
Type: Grant
Filed: June 1, 2021
Date of Patent: May 9, 2023
Assignee: Intel Corporation
Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman, Eyal Hadas

Patent number: 11645665
Abstract: Example apparatus disclosed herein are to determine a plurality of weights based on a data structure having elements corresponding to pairings of ones of a plurality of demographic partition statistics and ones of a plurality of baseline demographic statistics obtained for a target population, the demographic partition statistics corresponding to a plurality of demographic partitions of a sample population, a first element of the data structure to combine a first one of the demographic partition statistics with a first one of the baseline demographic statistics of the target population based on a first value corresponding to a numerator term of an expression and a second value corresponding to a denominator term of the expression, the weights corresponding respectively to the demographic partitions of the sample population. Disclosed example apparatus are also to adjust the attribute data based on the weights to determine ratings data for the target population.
Type: Grant
Filed: August 10, 2020
Date of Patent: May 9, 2023
Assignee: THE NIELSEN COMPANY (US), LLC
Inventors: Michael Sheppard, Jonathan Sullivan, Alejandro Terrazas, Peter Lipa, Albert Ronald Perez

Patent number: 11593456
Abstract: A resistive matrix computation circuit and methods for using the same are disclosed.
Type: Grant
Filed: July 7, 2020
Date of Patent: February 28, 2023
Assignee: Ambient Scientific, Inc.
Inventor: Gajendra Prasad Singh

Patent number: 11593455
Abstract: A scalable matrix computation circuit and methods for using the same are disclosed. In one embodiment, a matrix computation circuit includes a plurality of first operand memory configured to store a first set of input operands of the matrix computation circuit, a plurality of second operand memory configured to store a second set of input operands of the matrix computation circuit, where the first and second sets of input operands are programmable by the controller, a plurality of multiplier circuits arranged in a plurality of rows and a plurality of columns, where each row receives a corresponding operand from the first set of operands, each column receives a corresponding operand from the second set of operands, and the corresponding operand for each row is used multiple times by the multiplier circuits in that row to perform multiplications controlled by the controller, and a plurality of aggregator circuits configured to store charges produced by the plurality of multiplier circuits.
Type: Grant
Filed: July 7, 2020
Date of Patent: February 28, 2023
Assignee: Ambient Scientific, Inc.
Inventor: Gajendra Prasad Singh

Patent number: 11580059
Abstract: A memory architecture and a processing unit that incorporates the memory architecture and a systolic array. The memory architecture includes: memory array(s) with multiport (MP) memory cells; first wordlines connected to the cells in each row; and, depending upon the embodiment, second wordlines connected to diagonals of cells or diagonals of sets of cells. Data from a data input matrix is written to the memory cells during first port write operations using the first wordlines and read out from the memory cells during second port read operations using the second wordlines. Due to the diagonal orientation of the second wordlines and due to additional features (e.g., additional rows of memory cells that store static zero data values, or read data mask generators that generate read data masks), data read from the memory architecture and input directly into a systolic array is in the proper order, as specified by a data setup matrix.
Type: Grant
Filed: July 31, 2019
Date of Patent: February 14, 2023
Assignee: Marvell Asia Pte. Ltd.
Inventors: Venkatraghavan Bringivijayaraghavan, Aravindan J. Busi, Deepak I. Hanagandi, Igor Arsovski

Patent number: 11580194
Abstract: An information processing apparatus includes a sparse element detection part, a sparse location weight addition part, a multiplication part, a non-sparse data operation part, and an addition part. The sparse element detection part detects a predetermined sparse element from input data and outputs information about the sparse element. The sparse location weight addition part adds first weight elements corresponding to the sparse element. The multiplication part multiplies an output of the sparse location weight addition part by the sparse element. The non-sparse data operation part performs an operation on the non-sparse elements, i.e., those other than the sparse element in the input data. The addition part adds an output of the multiplication part and an output of the non-sparse data operation part.
Type: Grant
Filed: October 30, 2018
Date of Patent: February 14, 2023
Assignee: NEC CORPORATION
Inventor: Seiya Shibata
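The arithmetic trick described here (sum the weights at sparse locations first, then multiply by the common sparse value once) can be illustrated with a dot product. A hedged sketch, with hypothetical names; the patent applies this in hardware parts, not Python:

```python
def dot_with_sparse_value(weights, inputs, sparse_value):
    # Add the weights whose input equals the detected sparse value,
    # multiply that sum by the sparse value once (one multiply instead
    # of many), then handle the remaining non-sparse elements normally.
    sparse_weight_sum = sum(w for w, x in zip(weights, inputs) if x == sparse_value)
    dense_part = sum(w * x for w, x in zip(weights, inputs) if x != sparse_value)
    return sparse_value * sparse_weight_sum + dense_part
```

The result equals the plain dot product; the savings come from replacing many multiplications by the sparse value with a single one.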

Patent number: 11568022
Abstract: Detailed are embodiments related to bit matrix multiplication in a processor. For example, in some embodiments a processor comprises: decode circuitry to decode an instruction having fields for an opcode, an identifier of a first source bit matrix, an identifier of a second source bit matrix, an identifier of a destination bit matrix, and an immediate; and execution circuitry to execute the decoded instruction to perform a multiplication of a matrix of S-bit elements of the identified first source bit matrix with S-bit elements of the identified second source bit matrix, wherein the multiplication and accumulation operations are selected by the operation selector, and to store a result of the matrix multiplication into the identified destination bit matrix, where S indicates a plural bit size.
Type: Grant
Filed: January 22, 2021
Date of Patent: January 31, 2023
Assignee: Intel Corporation
Inventors: Dmitry Y. Babokin, Kshitij A. Doshi, Vadim Sukhomlinov
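One common multiply/accumulate pairing for bit matrices is AND for multiplication and XOR for accumulation, i.e. matrix multiplication over GF(2). The sketch below is an assumption about one such pairing, not the instruction's actual semantics:

```python
def bit_matrix_multiply(a_rows, b_rows, n):
    # Each int holds one n-bit matrix row. Multiplication is AND,
    # accumulation is XOR (GF(2)); the instruction's immediate would
    # select among multiply/accumulate operation pairs like this one.
    out = []
    for a in a_rows:
        acc = 0
        for j in range(n):
            if (a >> (n - 1 - j)) & 1:  # bit j of row a selects row j of B
                acc ^= b_rows[j]
        out.append(acc)
    return out
```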

Patent number: 11562047
Abstract: A method of increasing computer hardware efficiency of a matrix computation. The method comprises receiving, at a computer processing device, digital signals encoding one or more operations of the matrix computation, each operation including one or more operands. The method further comprises, responsive to determining, by a sparse data check device of the computer processing machine, that an operation of the matrix computation includes all dense operands, forwarding the operation to a dense computation device of the computer processing machine configured to perform the operation of the matrix computation based on the dense operands. The method further comprises, responsive to determining, by the sparse data check device, that an operation of the matrix computation includes one or more sparse operands, forwarding the operation to a sparse computation device configured to perform the operation of the matrix computation.
Type: Grant
Filed: April 29, 2020
Date of Patent: January 24, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Layali Rashid, Saurabh M. Kulkarni, Marc Tremblay

Systems, media, and methods for identifying loops of or implementing loops for a unit of computation
Patent number: 11556357
Abstract: Systems, media, and methods may identify loops of a unit of computation for performing operations associated with the loops. The systems, media, and methods may receive textual program code that includes a unit of computation that comprises a loop (e.g., an explicit/implicit loop). The unit of computation may be identified by an identifier (e.g., a variable name within the textual program code, a text string embedded in the unit of computation, and/or a syntactical pattern that is unique within the unit of computation). A code portion and/or a section thereof may include an identifier referring to the unit of computation, where the code portion and the unit of computation may be at independent locations of each other. The systems, media, and methods may semantically identify a loop that corresponds to the identifier and perform operations on the textual program code using the code portion and/or section.
Type: Grant
Filed: March 25, 2021
Date of Patent: January 17, 2023
Assignee: The MathWorks, Inc.
Inventors: Sumit Ghosh, Vinit Deodhar, Denis Gurchenkov, Zhen Wang

Patent number: 11538989
Abstract: An in-memory computing architecture is disclosed that can evaluate the transitive closure of graphs using the natural parallel flow of information in 3D nanoscale crossbars. The architecture can be implemented using 3D crossbar architectures with as few as two layers of 1-diode 1-resistor (1D1R) interconnects. The architecture avoids memory-processor bottlenecks and can hence scale to large graphs. The approach leads to a runtime complexity of O(n²) using O(n²) memristor devices. This compares favorably to conventional algorithms with a time complexity of O(n³/p + n² log p) on p processors. The approach takes advantage of the dynamics of 3D crossbars not available on 2D crossbars.
Type: Grant
Filed: July 30, 2018
Date of Patent: December 27, 2022
Assignee: UNIVERSITY OF CENTRAL FLORIDA RESEARCH FOUNDATION, INC.
Inventors: Alvaro Velasquez, Sumit Kumar Jha
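For reference, the transitive-closure relation the crossbar evaluates in parallel is the one computed sequentially by Warshall's algorithm in O(n³). This sketch shows the software baseline only, not the crossbar mechanism:

```python
def transitive_closure(adj):
    # Warshall's algorithm over a 0/1 adjacency matrix: reach[i][j] becomes
    # 1 iff j is reachable from i. O(n^3) sequentially; the patented
    # crossbar computes the same relation with parallel information flow.
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    reach[i][j] = reach[i][j] or reach[k][j]
    return reach
```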

Patent number: 11526737
Abstract: Data to be processed includes vector element values of an input vector and matrix element values of a model matrix associated with a neural network model. A vector-matrix multiplication module receives a set of matrix element values for performing a vector-matrix multiplication operation. Processing the data includes computing a plurality of intermediate vectors based on element-wise vector multiplication between different subsets of the vector element values and different respective preprocessing vectors. The vector-matrix multiplication module is loaded with a core matrix, and the input vector is multiplied by the model matrix based on separately multiplying each of the intermediate vectors by the loaded core matrix.
Type: Grant
Filed: January 31, 2020
Date of Patent: December 13, 2022
Assignee: Lightelligence, Inc.
Inventors: Matthew Raja Khoury, Rumen Rumenov Dangovski, Longwu Ou, Yichen Shen, Li Jing

Patent number: 11507641
Abstract: Techniques for performing in-memory matrix multiplication, taking into account temperature variations in the memory, are disclosed. In one example, the matrix multiplication memory uses ohmic multiplication and current summing to perform the dot products involved in matrix multiplication. One downside to this analog form of multiplication is that temperature affects the accuracy of the results. Thus, techniques are provided herein to compensate for the effects of temperature increases on the accuracy of in-memory matrix multiplications. According to the techniques, portions of input matrices are classified as effective or ineffective. Effective portions are mapped to low-temperature regions of the in-memory matrix multiplier and ineffective portions are mapped to high-temperature regions of the in-memory matrix multiplier. The matrix multiplication is then performed.
Type: Grant
Filed: May 31, 2019
Date of Patent: November 22, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Majed Valad Beigi, Amin Farmahini-Farahani, Sudhanva Gurumurthi

Patent number: 11501151
Abstract: The present disclosure advantageously provides a pipelined accumulator that includes a data selector configured to receive a sequence of operands to be summed, an input register coupled to the data selector, an output register, coupled to the data selector, configured to store a sequence of partial sums and output a final sum, and a multi-stage add module coupled to the input register and the output register. The multi-stage add module is configured to store a sequence of partial sums and a final sum in a redundant format, and to perform back-to-back accumulation into the output register.
Type: Grant
Filed: May 28, 2020
Date of Patent: November 15, 2022
Assignee: Arm Limited
Inventors: Paul Nicholas Whatmough, Zhi-Gang Liu, Matthew Mattina
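The abstract does not specify which redundant format is used; carry-save form is one standard choice for letting accumulations proceed back-to-back without waiting for carry propagation, shown below purely as an assumption:

```python
def carry_save_add(a, b, c):
    # One carry-save step: compress three addends into a (sum, carry)
    # pair WITHOUT propagating carries. Keeping partial sums in this
    # redundant form lets an accumulator absorb one operand per cycle;
    # the carries are resolved by a full add only at the very end.
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry
```

At every step, `partial_sum + carry` equals the true running total, which is what makes deferring the slow carry-propagating add safe.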

Patent number: 11500962
Abstract: To take advantage of the architecture of a systolic array tailored to perform sparse matrix multiplications, a weight matrix can be converted into a set of constrained fine-grained sparse weight matrices. The conversion process may include receiving a request to perform a matrix multiplication operation with a weight matrix, and determining that the weight matrix satisfies a sparsity condition to convert the weight matrix into a set of constrained fine-grained sparse weight matrices. The weight matrix can then be converted into a set of constrained fine-grained sparse weight matrices. Computer instructions can then be generated for an integrated circuit device to perform the requested matrix multiplication operation as a set of sparse matrix multiplication operations using the set of constrained fine-grained sparse weight matrices.
Type: Grant
Filed: June 30, 2020
Date of Patent: November 15, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Paul Gilbert Meyer, Thiam Khean Hah, Randy Renfu Huang, Ron Diamant, Vignesh Vivekraja
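The abstract does not say what the fine-grained constraint is; a common form of constrained fine-grained sparsity in accelerators is N:M structured sparsity (e.g. 2:4). A hedged sketch of checking such a condition, offered only as an illustration of the general idea:

```python
def satisfies_n_of_m(row, n, m):
    # N:M fine-grained structured sparsity: every group of m consecutive
    # weights contains at most n non-zeros. 2:4 is a common choice in
    # hardware, but this particular constraint is an assumption here.
    return all(sum(1 for v in row[i:i + m] if v != 0) <= n
               for i in range(0, len(row), m))
```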

Patent number: 11494463
Abstract: Performing set operations using sparse matrix operations offered by a multi-core processing unit (such as a graphics processing unit). The set operation is converted into operand matrices and sparse matrix operations, foregoing the use of hash tables. The input set is converted into a matrix, a matrix operation corresponding to the set operation is identified, and one or more operands of the set operation are also represented within a matrix. The matrix operation is then performed on these matrices to obtain an output matrix, which is then converted to an output set.
Type: Grant
Filed: April 14, 2020
Date of Patent: November 8, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventor: Ritwik Das
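One simple way to see how a set operation maps to arithmetic on matrices is to encode each set as a 0/1 indicator vector over a fixed universe. This is an illustrative mapping only; the patented encoding and operations are not specified in the abstract:

```python
def set_op_via_vectors(a, b, universe, op):
    # Encode each set as a 0/1 indicator vector over the universe, apply
    # an element-wise arithmetic operation in place of the set operation,
    # then decode the result vector back into a set.
    va = [1 if x in a else 0 for x in universe]
    vb = [1 if x in b else 0 for x in universe]
    if op == "intersection":
        vc = [x * y for x, y in zip(va, vb)]          # AND as multiply
    elif op == "union":
        vc = [min(1, x + y) for x, y in zip(va, vb)]  # OR as saturating add
    else:
        raise ValueError(op)
    return {u for u, bit in zip(universe, vc) if bit}
```

On a GPU the same element-wise arithmetic runs over sparse matrices instead of Python lists, which is what removes the need for hash tables.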

Systems and methods for energyefficient analog matrix multiplication for machine learning processes
Patent number: 11494625
Abstract: A novel energy-efficient multiplication circuit using analog multipliers and adders reduces the distance data has to move and the number of times the data has to be moved when performing matrix multiplications in the analog domain. The multiplication circuit is tailored to bitwise multiply the innermost product of a rearranged matrix formula to generate a matrix multiplication result in the form of a current that is then digitized for further processing.
Type: Grant
Filed: October 1, 2019
Date of Patent: November 8, 2022
Assignee: Maxim Integrated Products, Inc.
Inventors: Sung Ung Kwak, Robert Michael Muchsel

Digital filter with programmable impulse response for direct amplitude modulation at radio frequency
Patent number: 11481224
Abstract: A digital filter according to the disclosure includes a processing circuit having a memory and a number of parallel processing circuits. The parallel processing circuits perform convolution operations based on input data and function data that is accessed from the memory. The filter further includes a serializer for serializing data that is received from the processing circuits. A clock generator circuit provides a first clock signal to the processing circuit and a second clock signal to the serializer. The frequency of the second clock signal is greater than that of the first clock signal.
Type: Grant
Filed: August 30, 2019
Date of Patent: October 25, 2022
Assignee: Apple Inc.
Inventors: Tao Mai, Robert G. Lorenz, Joachim S. Hammerschmidt, Utku Seckin

Patent number: 11481472
Abstract: Techniques for data manipulation using integer matrix multiplication using pipelining are disclosed. A first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n are obtained for matrix multiplication within a processor. The first and second integer matrices employ a two's complement variable radix point data representation. The first and second integer matrices are distilled into (j×j) submatrices. A first variable radix point format and an initial value for an accumulator register are configured dynamically. A first variable radix point format is configured dynamically for the first integer matrix and a second variable radix point format is configured dynamically for the second integer matrix. Multiply-accumulate operations are executed in a pipelined fashion on the (j×j) submatrices of the first integer matrix and the second integer matrix, where a third variable radix point format is configured for the result.
Type: Grant
Filed: July 30, 2020
Date of Patent: October 25, 2022
Inventor: David John Simpson

Patent number: 11481215
Abstract: The present disclosure provides a computing method that is applied to a computing device. The computing device includes: a memory, a register unit, and a matrix computing unit. The method includes the following steps: controlling, by the computing device, the matrix computing unit to obtain a first operation instruction, where the first operation instruction includes a matrix reading instruction for a matrix required for executing the instruction; controlling, by the computing device, an operating unit to send a reading command to the memory according to the matrix reading instruction; and controlling, by the computing device, the operating unit to read a matrix corresponding to the matrix reading instruction in a batch reading manner, and executing the first operation instruction on the matrix. The technical solutions in the present disclosure have the advantages of fast computing speed and high efficiency.
Type: Grant
Filed: January 17, 2020
Date of Patent: October 25, 2022
Assignee: Cambricon (Xi'an) Semiconductor Co., Ltd.
Inventors: Tianshi Chen, Shaoli Liu, Zai Wang, Shuai Hu

Patent number: 11474798
Abstract: The disclosed systems, structures, and methods are directed to optimizing memory access to constants in heterogeneous parallel computers, including systems that support OpenCL. This is achieved in an optimizing compiler that transforms program-scope constants and constants at the outermost scope of kernels into implicit constant pointer arguments. The optimizing compiler also attempts to determine access patterns for constants at compile time and places the constants in a variety of memory types available in a compute device architecture based on these access patterns.
Type: Grant
Filed: August 24, 2020
Date of Patent: October 18, 2022
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Guansong Zhang, Weiwei Li

Patent number: 11469772
Abstract: A method, system, and program product accesses chunks of data identifying data elements. A mask is used to identify a position of the data elements that have zero values and that have non-zero values. The data elements are processed based on the mask. For compression of data, data elements in chunks of data that have zero values and that have non-zero values are determined. A mask is used to identify a position of the data elements that have zero values and that have non-zero values. The data elements in the chunks of data having zero values are removed. The data elements having non-zero values are packed into the chunks to form the compressed data. For decompressing the data, zero-value data elements are added in positions in the chunks of data according to the mask to form uncompressed data.
Type: Grant
Filed: January 27, 2020
Date of Patent: October 11, 2022
Inventors: Joshua Huang, Hsilin Huang
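The mask-based compression scheme described here round-trips cleanly, which a short sketch makes concrete. An illustrative approximation, with hypothetical names; the patented format and chunk layout are not specified in the abstract:

```python
def compress(chunk):
    # Record non-zero positions in a 0/1 mask and pack only the
    # non-zero values, dropping the zeros from the chunk.
    mask = [1 if v != 0 else 0 for v in chunk]
    packed = [v for v in chunk if v != 0]
    return mask, packed

def decompress(mask, packed):
    # Re-insert zeros at the positions the mask marks as zero,
    # restoring the original chunk.
    values = iter(packed)
    return [next(values) if bit else 0 for bit in mask]
```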

Patent number: 11449577
Abstract: Methods and apparatus for performing video processing matrix operations within a memory fabric. Various embodiments of the present disclosure are directed to converting a memory array into a matrix fabric for discrete cosine transform (DCT) matrix transformations and performing DCT matrix operations therein. Exemplary embodiments described herein perform DCT matrix-matrix multiplication operations within a memory device that includes a matrix fabric and matrix multiplication unit (MMU). In one embodiment, matrix-matrix multiplication operations are obtained using separate matrix-vector products. In one exemplary embodiment, the matrix fabric uses a “crossbar” construction of resistive elements. Each resistive element stores a level of impedance that represents the corresponding matrix coefficient value. The crossbar connectivity can be driven with an electrical signal representing the input vector as an analog voltage.
Type: Grant
Filed: November 20, 2019
Date of Patent: September 20, 2022
Assignee: Micron Technology, Inc.
Inventor: Fa-Long Luo
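The decomposition this abstract relies on (a matrix-matrix product obtained as separate matrix-vector products, one per column of the right-hand matrix) can be sketched directly. An illustrative sketch only; the crossbar performs the matrix-vector step in analog:

```python
def matmul_via_matvec(m, x):
    # Compute M @ X as a sequence of independent matrix-vector products,
    # one per column of X, then reassemble the result columns row-major.
    def matvec(v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]
    n_cols = len(x[0])
    cols = [matvec([row[j] for row in x]) for j in range(n_cols)]
    return [[cols[j][i] for j in range(n_cols)] for i in range(len(m))]
```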

Patent number: 11430529
Abstract: A method for capacitance coupling parameter estimation includes determining a plurality of mean voltages among a plurality of memory cells of the memory in each of a plurality of cases related to inter-cell interference, generating a plurality of middle state mean voltages in response to the mean voltages, and adjusting one or more threshold voltages used to read from the memory based on the middle state mean voltages, to operate independently of knowledge of middle state distributions in the memory cells.
Type: Grant
Filed: February 23, 2018
Date of Patent: August 30, 2022
Assignee: Seagate Technology LLC
Inventors: Meysam Asadi, Zhengang Chen, Erich F. Haratsch

Patent number: 11430083
Abstract: Techniques to improve the performance of matrix multiply operations are described, in which a compute kernel can specify one or more element-wise operations to perform on the output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
Type: Grant
Filed: March 5, 2021
Date of Patent: August 30, 2022
Assignee: Intel Corporation
Inventors: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries

Patent number: 11422801
Abstract: A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply-accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
Type: Grant
Filed: January 4, 2019
Date of Patent: August 23, 2022
Assignee: Google LLC
Inventors: Olivier Temam, Ravi Narayanaswami, Harshit Khaitan, Dong Hyuk Woo

Patent number: 11416580
Abstract: An apparatus to facilitate matrix multiplication operations. The apparatus comprises multiplication hardware to operate in a dot-product mode, wherein a multiplication stage included in the multiplication hardware is configured as a dot product of a number of bit vectors (N) to perform N×N multiplication operations on a plurality of multiplicands and perform addition operations on the results of the N×N multiplication operations.
Type: Grant
Filed: November 13, 2019
Date of Patent: August 16, 2022
Assignee: Intel Corporation
Inventors: Nevin Mathew, Shubra Marwaha, Ashutosh Garg

Patent number: 11409839
Abstract: The present disclosure relates to a method for controlling execution of a GEMM operation on an accelerator comprising multiple computation units, a first memory device, and a second memory device. The method comprises: determining an execution manner of the GEMM operation, the execution manner comprising partition information of the GEMM operation and computation unit allocation information of the partitioned GEMM operation; generating one or more instructions to compute the partitioned GEMM operation on one or more allocated computation units; and issuing the one or more instructions to at least one of a first queue and a second queue, which enables at least one of a first local controller and a second local controller to execute the one or more instructions, wherein the first local controller and the second local controller are configured to control data movement between the computation units, the first memory device, and the second memory device.
Type: Grant
Filed: August 21, 2020
Date of Patent: August 9, 2022
Assignee: Alibaba Group Holding Limited
Inventors: Yuhao Wang, Fei Sun, Fei Xue, Yen-Kuang Chen, Hongzhong Zheng

Patent number: 11409523
Abstract: A graphics processing unit includes a sparse matrix detection unit, a register file, an assertion register, and a matrix calculation unit. The sparse matrix detection unit reads a plurality of matrices from a storage device and determines whether the matrices are zero matrices or non-zero matrices to output a determination result. The register file stores the plurality of matrices from the sparse matrix detection unit. The assertion register marks up the matrices according to the determination result and outputs a mark result. The matrix calculation unit receives a matrix calculation instruction, reads the non-zero matrices in the plurality of matrices from the register file according to the mark result, and calculates the non-zero matrices.
Type: Grant
Filed: January 4, 2021
Date of Patent: August 9, 2022
Assignee: GLENFLY TECHNOLOGY CO., LTD.
Inventors: Wei Zhang, Deming Gu
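The payoff of marking zero matrices is that the calculation unit never issues work for them. A hedged software analogue, treating the "plurality of matrices" as blocks of a larger matrix (an assumption; block size and layout are not given in the abstract):

```python
def matmul_skip_zero_blocks(a, b, bs):
    # Blocked n x n multiply that detects all-zero bs x bs blocks of A
    # and skips them entirely, so no multiply-add work is issued for
    # them -- the mark-and-skip idea from the abstract, in software.
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i0 in range(0, n, bs):
        for k0 in range(0, n, bs):
            if all(a[i][k] == 0
                   for i in range(i0, i0 + bs) for k in range(k0, k0 + bs)):
                continue  # zero block: marked, never computed
            for i in range(i0, i0 + bs):
                for j in range(n):
                    c[i][j] += sum(a[i][k] * b[k][j] for k in range(k0, k0 + bs))
    return c
```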

Patent number: 11403367
Abstract: Techniques described herein perform spherical PIP analysis by detecting whether a test ray (defined by a test point (TP) and a point (EP) that is external to a spherical polygon) crosses edge arcs (“edges”) of the polygon, based on the relative orientations of vertices of the test ray and edges. A classifier vector (CV) for a test ray is calculated based on the cross-product of the TP and the EP. Using the CV, the orientation of each vertex of the polygon with respect to the test ray is determined. Candidate edges having vertices with opposite orientations with respect to the test ray are identified. Crossing edges are determined by calculating CVs for each candidate edge, and determining the orientations of the TP and EP with respect to each candidate edge. A set of crossing edges is determined, where the TP and the EP have opposite orientations with respect to each crossing edge.
Type: Grant
Filed: April 14, 2020
Date of Patent: August 2, 2022
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventors: William Martinez Cortes, Shasank Kisan Chavan, Siva Ravada, Ying Hu
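The classifier-vector test at the heart of this abstract reduces to the sign of a triple product: the cross product of two points on the sphere defines the great circle through them, and the dot product with a third point tells which side that point lies on. A hedged sketch of just that orientation primitive:

```python
def cross(u, v):
    # Cross product of two 3D position vectors on the unit sphere.
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def orientation(a, b, p):
    # Which side of the great circle through a and b does p lie on?
    # a x b plays the role of the classifier vector (CV); the answer is
    # the sign of (a x b) . p: +1, -1, or 0 (on the circle).
    cv = cross(a, b)
    d = sum(ci * pi for ci, pi in zip(cv, p))
    return (d > 0) - (d < 0)
```

An edge is a crossing candidate exactly when its two vertices get opposite signs from this test against the test ray's CV.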

Patent number: 11397791
Abstract: A method for performing a matrix multiplication operation is provided. The method includes: obtaining a matrix B1, a matrix A2, and an index matrix, wherein the index matrix comprises indexes, in a matrix A1, of elements in the matrix A2; generating m matrices B2 based on the index matrix and the matrix B1, wherein the m matrices B2 are all matrices with t rows and n columns, and each row of each matrix B2 is a row indicated in the matrix B1 by a corresponding element in the index matrix; and generating a matrix C based on the matrix A2 and the m matrices B2, wherein the matrix C is a product of the matrix A1 and the matrix B1.
Type: Grant
Filed: January 4, 2022
Date of Patent: July 26, 2022
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Leijun He, Bin Xu, Kaixing Wang
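The gather-then-multiply scheme here is the familiar sparse-row trick: keep only the non-zeros of A1 (as A2) plus their column indexes, and combine each retained coefficient with the row of B1 that its index names. A hedged one-function sketch of that arithmetic (the B2 staging and tiling are omitted):

```python
def sparse_matmul(a2, index_matrix, b1):
    # Row r of C is the weighted sum of the rows of B1 named by the
    # index matrix, weighted by the retained non-zeros of A1 kept in A2.
    # Equivalent to A1 @ B1 when a2/index_matrix encode A1's non-zeros.
    width = len(b1[0])
    c = []
    for coeffs, row_ids in zip(a2, index_matrix):
        acc = [0] * width
        for coeff, rid in zip(coeffs, row_ids):
            for j in range(width):
                acc[j] += coeff * b1[rid][j]
        c.append(acc)
    return c
```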

Patent number: 11392667
Abstract: Systems and methods of configuring an array of processors of an integrated circuit include identifying a fast Fourier transform (FFT) matrix multiply of input data, wherein the FFT matrix multiply of the input data includes a bit-reversed input array; and configuring the array of processing cores based on the bit-reversed input array, wherein the configuring of the array of processing cores includes storing the input bits of the bit-reversed input array within memory circuits of distinct processing cores of an array of processing cores of the integrated circuit, based on an input bit mapping that identifies a predetermined storage location within the array of processing cores of each input bit of the bit-reversed input array, and performing matrix multiply computations between weight stages of the FFT matrix multiply and the input bits of the bit-reversed input array stored within the memory circuits of the distinct processing cores.
Type: Grant
Filed: December 20, 2021
Date of Patent: July 19, 2022
Assignee: quadric.io, Inc.
Inventors: Aman Sikka, Nigel Drego, Daniel Firu, Veerbhan Kheterpal
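The bit-reversed input order mentioned here is the standard permutation an in-place radix-2 FFT expects: element i is read from the index whose binary digits are i's, reversed. A minimal sketch of that permutation (the mapping onto processing cores is not shown):

```python
def bit_reverse_order(x):
    # Reorder a power-of-two-length sequence into bit-reversed index
    # order: element i comes from the index formed by reversing the
    # binary digits of i (e.g. for n=8, index 1 = 001 -> 100 = 4).
    n = len(x)
    bits = n.bit_length() - 1
    return [x[int(format(i, "0{}b".format(bits))[::-1], 2)] for i in range(n)]
```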

Patent number: 11392376
Abstract: A data processor receives a first set of processor instructions for combining a first matrix with a second matrix to produce a third matrix, and generates a second set of processor instructions therefrom by identifying values of non-zero elements of the first matrix stored in a memory of the data processor and determining memory locations of elements of the second matrix. An instruction of the second set of processor instructions includes a determined memory location and/or an explicit value of an identified non-zero element. The second set of processor instructions is executed by the data processor. The second set of processor instructions may be generated by just-in-time compilation of the first set of processor instructions and may include instructions of a custom instruction set architecture.
Type: Grant
Filed: April 11, 2019
Date of Patent: July 19, 2022
Assignee: Arm Limited
Inventors: Zhigang Liu, Matthew Mattina, Paul Nicholas Whatmough, Jesse Garrett Beu

Patent number: 11386507. Abstract: A computer-implemented method for analyzing a time-varying graph is provided. The time-varying graph includes nodes representing elements in a network, edges representing transactions between elements, and data associated with the nodes and the edges. The computer-implemented method includes constructing, using a processor, adjacency and feature matrices describing each node and edge of each time-varying graph for stacking into an adjacency tensor, and describing the data of each time-varying graph for stacking into a feature tensor, respectively. The adjacency and feature tensors are partitioned into adjacency and feature training tensors and into adjacency and feature validation tensors, respectively. An embedding model and a prediction model are created using the adjacency and feature training tensors. The embedding and prediction models are validated using the adjacency and feature validation tensors to identify an optimized embedding-prediction model pair. Type: Grant. Filed: September 23, 2019. Date of Patent: July 12, 2022. Assignees: INTERNATIONAL BUSINESS MACHINES CORPORATION, Trustees of Tufts College, RAMOT AT TEL-AVIV UNIVERSITY LTD. Inventors: Lior Horesh, Osman Asif Malik, Shashanka Ubaru, Misha E. Kilmer, Haim Avron

Patent number: 11361050. Abstract: Example implementations relate to assigning dependent matrix-vector multiplication (MVM) operations to consecutive crossbars of a dot product engine (DPE). A method can comprise grouping a first MVM operation of a computation graph with a second MVM operation of the computation graph, where the first MVM operation is dependent on a result of the second MVM operation; assigning a first crossbar of a DPE to an operand of the first MVM operation; and assigning a second crossbar of the DPE to an operand of the second MVM operation, wherein the first and second crossbars are consecutive. Type: Grant. Filed: November 20, 2018. Date of Patent: June 14, 2022. Assignee: Hewlett Packard Enterprise Development LP. Inventors: Soumitra Chatterjee, Sunil Vishwanathpur Lakshminarasimha, Mohan Parthasarathy

Patent number: 11360741. Abstract: An arithmetic circuit includes an LUT generation circuit (1) that, when coefficients c[n] (n = 1, . . . , N) are paired two by two, outputs a value calculated for each of the pairs, and a distributed arithmetic circuit (2m) that calculates values y[m] of product-sum arithmetic, by which data x[m, n] of a data set X[m] containing M pairs of data x[m, n] is multiplied by the coefficients c[n] and the products are summed up, in parallel for each of the M pairs. Type: Grant. Filed: December 18, 2018. Date of Patent: June 14, 2022. Assignees: NTT ELECTRONICS CORPORATION, NIPPON TELEGRAPH AND TELEPHONE CORPORATION. Inventors: Kenji Kawai, Ryo Awata, Kazuhito Takei, Masaaki Iizuka
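The product-sum the distributed arithmetic circuit evaluates is y[m] = Σₙ c[n]·x[m, n]; pairing coefficients lets a small lookup table replace the per-pair multiplications. A simplified model of both pieces (integer coefficients assumed for clarity; the real circuit works bit-serially on the inputs):

```python
def pair_luts(c):
    """For each coefficient pair (c[2k], c[2k+1]) precompute the four partial
    sums a distributed-arithmetic LUT would hold, indexed by two input bits."""
    assert len(c) % 2 == 0  # coefficients are paired two by two
    return [[0, c2, c1, c1 + c2] for c1, c2 in zip(c[0::2], c[1::2])]

def product_sum(c, x):
    """Reference result the circuit computes: y = sum_n c[n] * x[n]."""
    return sum(cn * xn for cn, xn in zip(c, x))
```

Each LUT entry is selected by the bit pair (b1, b2) of the two corresponding inputs, so a table read stands in for two multiplications.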

Patent number: 11354383. Abstract: Various arrangements for performing successive vector-matrix multiplication may include sequentially performing a first vector-matrix multiplication operation for each bit-order of values in an input vector. The first vector-matrix multiplication operation for each bit-order may generate an analog output. For each analog output generated by the vector-matrix multiplication operation, the analog output may be converted into one or more digital bit values, and the one or more digital bit values may be sent to a second vector-matrix multiplication operation. Type: Grant. Filed: November 19, 2019. Date of Patent: June 7, 2022. Assignee: Applied Materials, Inc. Inventors: Frank Tzen-Wen Guo, She-Hwa Yen
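An exact integer model of the bit-serial scheme described above: one vector-matrix multiply per input bit-order, with the per-bit results weighted by powers of two. This is only a sketch of the arithmetic the analog array approximates, not of the circuit itself:

```python
def bitserial_vmm(x, W, n_bits):
    """Compute x @ W by one vector-matrix multiply per input bit-order,
    then a bit-order weighted summation of the per-bit results."""
    cols = len(W[0])
    total = [0] * cols
    for b in range(n_bits):
        bits = [(xi >> b) & 1 for xi in x]           # one-bit 'DAC' inputs
        partial = [sum(bits[i] * W[i][j] for i in range(len(x)))
                   for j in range(cols)]              # stand-in for the analog VMM
        total = [t + (p << b) for t, p in zip(total, partial)]  # weight by 2**b
    return total
```

In hardware, `partial` would be the digitized analog output for that bit-order; the weighted summation at the end is the step that reassembles the full-precision product.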

Patent number: 11347828. Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two-dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers. Type: Grant. Filed: March 27, 2020. Date of Patent: May 31, 2022. Assignee: Intel Corporation. Inventors: Biji George, Om Ji Omer, Dipan Kumar Mandal, Cormac Brick, Lance Hacking, Sreenivas Subramoney, Belliappa Kuttanna

Patent number: 11334648. Abstract: Embodiments of the present invention disclose a matrix multiplier, and relate to the field of data computing technologies, so as to divide two matrices into blocks for computation. The matrix multiplier includes a first memory, a second memory, an operation circuit, and a controller, where the operation circuit, the first memory, and the second memory may perform data communication by using a bus, and the controller is configured to control, according to a preset program or instruction, a first matrix and a second matrix to be divided into blocks, and to control the operation circuit to perform a multiplication operation on corresponding blocks in the first memory and the second memory based on the block division results of the controller. The matrix multiplier may be configured to perform a multiplication operation on two matrices. Type: Grant. Filed: June 29, 2020. Date of Patent: May 17, 2022. Assignee: HUAWEI TECHNOLOGIES CO., LTD. Inventors: Hu Liu, Heng Liao, Jiajin Tu, Honghui Yuan, Hou Fun Lam, Fan Zhu
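Block division of the kind described reduces to familiar tiled matrix multiplication: each pair of corresponding blocks is multiplied and accumulated into the result. A minimal reference sketch (square matrices assumed for brevity; the patent's controller handles block division in hardware):

```python
def blocked_matmul(A, B, bs):
    """Multiply n x n matrices A and B in bs x bs blocks,
    accumulating the block products into C."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i0 in range(0, n, bs):
        for j0 in range(0, n, bs):
            for k0 in range(0, n, bs):          # one block-pair product per k0
                for i in range(i0, min(i0 + bs, n)):
                    for j in range(j0, min(j0 + bs, n)):
                        C[i][j] += sum(A[i][k] * B[k][j]
                                       for k in range(k0, min(k0 + bs, n)))
    return C
```

The block size is what lets each block product fit in the first and second memories while the bus streams the next blocks in.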

Patent number: 11328037. Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing a systolic-array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. The bandwidth required of external memory may be reduced by a factor of reduction based on interleaving of the matrix data via the feeding pattern of the column feeder array and the row feeder array. Type: Grant. Filed: July 7, 2017. Date of Patent: May 10, 2022. Assignee: Intel Corporation. Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr

Patent number: 11321805. Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a compute unit including a hardware logic unit having dynamic precision fixed-point logic. The compute unit is to receive a set of dynamic fixed-point tensors; compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set; right-shift data values within the set based on the right-shift value; increment a shared exponent associated with the set based on the right-shift value; perform a compute operation on the set; and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors. Type: Grant. Filed: October 29, 2020. Date of Patent: May 3, 2022. Assignee: Intel Corporation. Inventors: Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan
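The shift-and-increment step can be modeled in a few lines: derive a right-shift from the absolute maximum so the values fit the available dynamic range, shift the data, and bump the shared exponent by the same amount so the represented magnitudes are preserved. A simplified integer model (`range_bits` is a stand-in for the hardware's dynamic range, not a parameter named in the patent):

```python
def rescale_dynamic_fixed_point(values, shared_exp, range_bits):
    """Right-shift values so the absolute maximum fits in range_bits,
    incrementing the shared exponent to preserve represented magnitude."""
    abs_max = max(abs(v) for v in values)
    shift = max(0, abs_max.bit_length() - range_bits)  # right-shift value
    shifted = [v >> shift for v in values]             # arithmetic shift
    return shifted, shared_exp + shift
```

Each value then represents `v * 2**shared_exp`, so shifting data right while incrementing the shared exponent trades precision for range without changing scale.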

Patent number: 11307853. Abstract: A matrix multiplication device and an operation method thereof are provided. The matrix multiplication device includes calculation circuits, a control circuit, a multiplication circuit, and a routing circuit. The calculation circuits produce multiply-accumulate values. The control circuit receives a plurality of first element values of a first matrix and classifies the first element values into at least one classification value. The multiplication circuit multiplies the classification value by a second element value of a second matrix in a low power mode to obtain at least one product value. The routing circuit transmits each of the product values to at least one corresponding calculation circuit among the calculation circuits in the low power mode. Type: Grant. Filed: October 29, 2019. Date of Patent: April 19, 2022. Assignee: NEUCHIPS CORPORATION. Inventors: Chiung-Liang Lin, Chao-Yang Kao, Youn-Long Lin, Huang-Chih Kuo, Jian-Wen Chen

Patent number: 11301546. Abstract: A method comprises receiving one or more sizes for each of the dimensions of a kernel that is convolved with an input tensor to generate an output activation, and generating a control pattern used to compute output values for the convolution of the input tensor, the control pattern being a square matrix with each dimension having a size equal to the product of the width and the height of the kernel. The control pattern is generated by generating a value for each position of the control pattern based on the location of the position in the control pattern and the one or more sizes of each of the dimensions of the kernel, the value indicating a location from which to access values from a flattened input tensor for the convolution with the kernel. Type: Grant. Filed: November 18, 2019. Date of Patent: April 12, 2022. Assignee: Groq, Inc. Inventors: Jonathan Alexander Ross, Thomas Hawkins, Gregory Michael Thorson, Matt Boyd

Patent number: 11294985. Abstract: Techniques are provided for efficient matrix multiplication using in-memory analog parallel processing, with applications for neural networks and artificial intelligence processors. A methodology implementing the techniques according to an embodiment includes storing two matrices in memory. The first matrix is stored in transposed form such that the transposed first matrix has the same number of rows as the second matrix. The method further includes reading columns of the matrices from the memory in parallel, using disclosed bit line functional read operations and cross bit line functional read operations, which are employed to generate analog dot products between the columns. Each of the dot products corresponds to an element of the matrix multiplication product of the two matrices. In some embodiments, one of the matrices may be used to store neural network weighting factors, and the other matrix may be used to store input data to be processed by the neural network. Type: Grant. Filed: October 30, 2018. Date of Patent: April 5, 2022. Assignee: Intel Corporation. Inventors: Amrita Mathuriya, Sasikanth Manipatruni, Dmitri Nikonov, Ian Young, Ram Krishnamurthy
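Storing the first matrix transposed makes every element of the product a dot product of two same-length columns, which is what the parallel bit-line column reads exploit. A plain-Python reference for the arithmetic (the analog read operations are replaced by ordinary sums here):

```python
def matmul_via_transpose(A_T, B):
    """Given A stored transposed (A_T has the same number of rows as B),
    product element (i, j) is the dot product of column i of A_T
    and column j of B."""
    rows_out = len(A_T[0])  # columns of A_T == rows of A
    cols_out = len(B[0])
    K = len(B)              # shared inner dimension
    return [[sum(A_T[k][i] * B[k][j] for k in range(K))
             for j in range(cols_out)] for i in range(rows_out)]
```

Because both operands of each dot product run down matching rows of memory, all the column pairs can be read and reduced in parallel, one analog dot product per output element.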

Patent number: 11269630. Abstract: Disclosed embodiments relate to an interleaved pipeline of floating-point (FP) adders. In one example, a processor is to execute an instruction specifying an opcode and the locations of an M by K first source matrix, a K by N second source matrix, and an M by N destination matrix, the opcode indicating that execution circuitry, for each FP element (M, N) of the destination matrix, is to: launch K instances of a pipeline having a first, MULTIPLY stage, during which an FP element (M, K) of the first source matrix and the corresponding FP element (K, N) of the second source matrix are multiplied; concurrently, in an EXP-DIFF stage, determine an exponent difference between the product and a previous FP value of the element (M, N) of the destination matrix; and, in a second, ADD-BYPASS stage, accumulate the product with the previous FP value and, concurrently, bypass the accumulated sum to a subsequent pipeline instance. Type: Grant. Filed: March 29, 2019. Date of Patent: March 8, 2022. Assignee: INTEL CORPORATION. Inventors: Simon Rubanovich, Amit Gradstein, Zeev Sperber

Patent number: 11256780. Abstract: Methods and apparatus for fast eigenvalue decomposition of Hermitian matrices are disclosed. In an exemplary embodiment, a method is provided for performing a decomposition iteration that includes identifying the largest off-diagonal term of a channel response matrix X, generating a 2×2 Hermitian matrix X2 that includes the largest off-diagonal term, and generating a 2×2 unitary matrix ?2 from the 2×2 Hermitian matrix X2. The decomposition iteration also includes multiplying the 2×2 unitary matrix ?2 with the 2×2 Hermitian matrix X2 to generate an updated largest off-diagonal term, and updating the channel response matrix X with the updated largest off-diagonal term. The method also includes performing one or more additional decomposition iterations until all off-diagonal terms of the channel response matrix X are less than a target value. Type: Grant. Filed: May 20, 2021. Date of Patent: February 22, 2022. Assignee: Marvell Asia Pte, Ltd. Inventor: Hyun Soo Cheon
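This is the classical Jacobi eigenvalue iteration: find the largest off-diagonal term, apply a 2×2 rotation that zeroes it, and repeat until every off-diagonal term is below the target value. A sketch for the real symmetric case (the patent's Hermitian case uses a complex 2×2 unitary instead of a real rotation):

```python
import math

def jacobi_sweep(X, tol=1e-12, max_iter=100):
    """Repeatedly zero the largest off-diagonal entry of a symmetric matrix X
    with 2x2 rotations until all off-diagonal terms are below tol."""
    n = len(X)
    X = [row[:] for row in X]  # work on a copy
    for _ in range(max_iter):
        # Identify the largest off-diagonal term.
        p, q = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: abs(X[ij[0]][ij[1]]))
        if abs(X[p][q]) < tol:
            break
        # Rotation angle that annihilates X[p][q].
        theta = 0.5 * math.atan2(2 * X[p][q], X[q][q] - X[p][p])
        c, s = math.cos(theta), math.sin(theta)
        for k in range(n):  # rotate rows p and q
            X[p][k], X[q][k] = c * X[p][k] - s * X[q][k], s * X[p][k] + c * X[q][k]
        for k in range(n):  # rotate columns p and q
            X[k][p], X[k][q] = c * X[k][p] - s * X[k][q], s * X[k][p] + c * X[k][q]
    return X  # diagonal approximates the eigenvalues
```

Each iteration may reintroduce small off-diagonal terms elsewhere, which is why the method repeats until everything is below the target value.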

Patent number: 11238130. Abstract: A signal processing method and apparatus, where the method includes partitioning a signal matrix to obtain X×H fractal signal matrices, partitioning a weight matrix to obtain H×Y fractal weight matrices, obtaining an operation sequence of the X×H×Y matrix multiplications based on performance parameters, and processing the X×H×Y matrix multiplications according to the obtained operation sequence to obtain X×Y result matrices. Type: Grant. Filed: June 29, 2020. Date of Patent: February 1, 2022. Assignee: HUAWEI TECHNOLOGIES CO., LTD. Inventor: Ruosheng Xu

Patent number: 11227030. Abstract: Techniques for data manipulation using a matrix multiplication engine with pipelining are disclosed. A first and a second matrix are obtained for matrix multiplication. A first multiply-accumulate (MAC) unit is configured, where a first element of the first matrix and a first element of the second matrix are presented to the MAC unit on a first cycle. A second MAC unit is configured in pipelined fashion, where the first element of the first matrix and a second element of the second matrix are presented to the second MAC unit on a second cycle, and where a second element of the first matrix and the first element of the second matrix are presented to the first MAC unit on the second cycle. Additional MAC units are further configured within the processor in pipelined fashion. Multiply-accumulate operations are executed in pipelined fashion on each of n MAC units over additional k sets of m cycles. Type: Grant. Filed: March 31, 2020. Date of Patent: January 18, 2022. Assignee: Wave Computing, Inc. Inventor: David John Simpson

Patent number: 11194886. Abstract: Various arrangements for performing vector-matrix multiplication are provided. Digital input vectors that include binary-encoded values can be converted into a plurality of analog signals using a plurality of one-bit digital-to-analog converters (DACs). Using an analog vector-matrix multiplier, a vector-matrix multiplication operation can be performed using a weighting matrix for each bit-order of the plurality of analog signals. For each performed vector-matrix multiplication operation, a bit-ordered indication of the output of the analog vector-matrix multiplier may be stored. A bit-order weighted summation of the sequentially performed vector-matrix multiplication operations may be performed. Type: Grant. Filed: May 9, 2019. Date of Patent: December 7, 2021. Assignee: Applied Materials, Inc. Inventors: She-Hwa Yen, Frank Tzen-Wen Guo