Multiplication Of Matrices Patents (Class 708/607)

Throughput increase for compute engine

Patent number: 12260214

Abstract: A compute channel can have multiple computational circuit blocks coupled in series to form a pipeline. The compute channel can perform a computation on an input tensor to generate an output tensor based on an instruction. When the computation does not require all of the computational circuit blocks, the throughput of the compute channel can be increased by splitting the data elements of the input tensor into multiple input data streams. The multiple input data streams are provided to respective subsets of one or more computational circuit blocks in the pipeline using bypass circuitry of the computational circuit blocks, and the computation can be performed on multiple input data streams in the respective subsets of one or more computational circuit blocks to generate multiple output data streams corresponding to the output tensor.

Type: Grant

Filed: September 30, 2022

Date of Patent: March 25, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Paul Gilbert Meyer, Ron Diamant, Sundeep Amirineni, Sunil Kumar Bathula
Performing matrix multiplication in a streaming processor

Patent number: 12229215

Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.

Type: Grant

Filed: October 16, 2023

Date of Patent: February 18, 2025

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers
Integer matrix multiplication engine using pipelining

Patent number: 12223011

Abstract: Techniques for data manipulation using integer matrix multiplication using pipelining are disclosed. A first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n are obtained for matrix multiplication within a processor. The first and second integer matrices employ a two's complement variable radix point data representation. The first and second integer matrices are distilled into (j×j) submatrices. A first variable radix point format and an initial value for an accumulator register are configured dynamically. A first variable radix point format is configured dynamically for the first integer matrix and a second variable radix point format is configured dynamically for the second integer matrix. Multiply-accumulate operations are executed in a pipelined fashion on the (j×j) submatrices of the first integer matrix and the second integer matrix, where a third variable radix point format is configured for the result.

Type: Grant

Filed: November 27, 2023

Date of Patent: February 11, 2025

Assignee: MIPS Holding, Inc.

Inventor: David John Simpson
Systems and methods for performing matrix compress and decompress instructions

Patent number: 12175246

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.

Type: Grant

Filed: September 1, 2023

Date of Patent: December 24, 2024

Assignee: Intel Corporation

Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
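The first compression option in the abstract above (patent 12175246), packing non-zero elements together and recording each element's matrix position in a header, can be illustrated in software. The following is a minimal sketch only: the function names `compress` and `decompress` and the tuple layout of the header are hypothetical, and the patent describes processor instructions rather than this code. The second option (representing elements with fewer bits) is not shown.

```python
def compress(matrix):
    """Pack non-zero elements and record their (row, col) positions in a header."""
    header = []   # positions of the non-zero elements (hypothetical layout)
    values = []   # the packed non-zero elements, in row-major order
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                header.append((r, c))
                values.append(value)
    shape = (len(matrix), len(matrix[0]))
    return shape, header, values


def decompress(shape, header, values):
    """Rebuild the dense matrix from the header and the packed values."""
    rows, cols = shape
    matrix = [[0] * cols for _ in range(rows)]
    for (r, c), value in zip(header, values):
        matrix[r][c] = value
    return matrix
```

Calling `decompress(*compress(m))` on a rectangular list-of-lists `m` returns a matrix equal to `m`.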
In-memory computation circuit and method

Patent number: 12164882

Abstract: A memory circuit includes a selection circuit, a column of memory cells, and an adder tree. The selection circuit is configured to receive input data elements, each input data element including a number of bits equal to H, and output a selected set of kth bits of the H bits of the input data elements. Each memory cell of the column of memory cells includes a first storage unit configured to store a first weight data element and a first multiplier configured to generate a first product data element based on the first weight data element and a first kth bit of the selected set of kth bits. The adder tree is configured to generate a summation data element based on each of the first product data elements.

Type: Grant

Filed: March 16, 2021

Date of Patent: December 10, 2024

Assignee: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY, LTD.

Inventors: Yu-Der Chih, Hidehiro Fujiwara, Yi-Chun Shih, Po-Hao Lee, Yen-Huei Chen, Chia-Fu Lee, Jonathan Tsung-Yung Chang
Apparatus and method for complex matrix transpose and multiply

Patent number: 12153899

Abstract: An apparatus and method for complex matrix transpose and multiply.

Type: Grant

Filed: December 23, 2020

Date of Patent: November 26, 2024

Assignee: Intel Corporation

Inventors: Menachem Adelman, Robert Valentine, Daniel Towner, Amit Gradstein, Mark Jay Charney
Arithmetic apparatus and multiply-accumulate system

Patent number: 12153974

Abstract: An arithmetic apparatus includes input line pairs and a multiply-accumulate device. A signal pair is input to the input line pairs within an input period. The multiply-accumulate device includes multiplication units, an accumulation unit, a charging unit, and an output unit. The multiplication units generate a positive weight charge and a negative weight charge. The accumulation unit accumulates the positive weight charge and the negative weight charge. The charging unit charges the accumulation unit after the input period. The output unit performs, after charging starts, threshold determination using a predetermined threshold value on a voltage of the accumulation unit, to thereby output a positive multiply-accumulate signal representing a sum of positive weight product values and a negative multiply-accumulate signal representing a sum of negative weight product values.

Type: Grant

Filed: March 12, 2020

Date of Patent: November 26, 2024

Assignee: Sony Group Corporation

Inventor: Hiroshi Yoshida
Techniques for accelerating matrix multiplication computations using hierarchical representations of sparse matrices

Patent number: 12141229

Abstract: One embodiment sets forth a technique for performing one or more matrix multiplication operations based on a first matrix and a second matrix. The technique includes receiving data associated with the first matrix from a first traversal engine that accesses nonzero elements included in the first matrix via a first tree structure. The technique also includes performing one or more computations on the data associated with the first matrix and the data associated with the second matrix to produce a plurality of partial results. The technique further includes combining the plurality of partial results into one or more intermediate results and storing the one or more intermediate results in a first buffer memory.

Type: Grant

Filed: May 19, 2021

Date of Patent: November 12, 2024

Assignee: NVIDIA Corporation

Inventors: Hanrui Wang, James Michael O'Connor, Donghyuk Lee
Tensor automatic differentiation

Patent number: 12130886

Abstract: Methods and systems are disclosed to reduce the time and memory complexities associated with automatic differentiation of tensor models. The disclosed embodiment consists of a tensor contraction gradient calculator (TCGC) method, a tensor automatic differentiation (TAD) method and a TAD system. The disclosed embodiment eliminates the need to compute partial derivatives or Jacobians for computing tensor gradients of tensor contractions and tensor models. The disclosed embodiment computes tensor gradients of any arbitrary tensor model automatically with both memory and time complexities asymptotically equal to those of the evaluation of tensor models that are theoretically the lowest achievable complexities.

Type: Grant

Filed: January 24, 2024

Date of Patent: October 29, 2024

Inventor: Mohammad Solgi
Dynamic bias analog vector-matrix multiplication operation circuit and operation control method therefor

Patent number: 12093342

Abstract: A dynamic bias analog vector-matrix multiplication operation circuit comprises: positive value weight columns (101-10N), constant columns (201-20M) and subtractors (301-30N), wherein the number of the subtractors is equal to the number of the positive value weight columns, the subtractors are correspondingly connected to the positive value weight columns on a one-to-one basis, and the number of the constant columns is less than the number of the positive value weight columns; minuend input ends of the subtractors are correspondingly connected to output ends of the positive value weight columns, subtrahend input ends of a plurality of subtractors are connected to the same constant column, and output ends thereof output operation results. Before a weight is written in a programmable semiconductor device, a constant positive value is added to each element in a weight array, the weight array is written in a positive value weight column, and the constant positive value is written in a constant column.

Type: Grant

Filed: April 3, 2019

Date of Patent: September 17, 2024

Assignee: BEIJING ZHICUN (WITIN) TECHNOLOGY CORPORATION LIMITED

Inventor: Shaodi Wang
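The arithmetic behind the dynamic bias abstract above (patent 12093342) can be checked numerically: adding a constant to every weight makes the stored values non-negative, and subtracting the output of a constant column recovers the original product. This is a minimal software sketch, assuming a single shared constant column and integer values; the name `dynamic_bias_vmm` is hypothetical and the analog circuit itself is not modeled.

```python
def dynamic_bias_vmm(x, weights, bias):
    """Vector-matrix product computed from non-negative stored weights only."""
    n_rows, n_cols = len(weights), len(weights[0])
    # Add the constant to every weight before "writing" it to the array.
    shifted = [[w + bias for w in row] for row in weights]
    assert all(v >= 0 for row in shifted for v in row), "bias too small"
    # Positive-value weight columns: sum_i x[i] * (w[i][j] + bias).
    column_out = [
        sum(x[i] * shifted[i][j] for i in range(n_rows)) for j in range(n_cols)
    ]
    # Constant column (shared by all subtractors here): sum_i x[i] * bias.
    constant_out = sum(xi * bias for xi in x)
    # One subtractor per weight column removes the bias contribution.
    return [out - constant_out for out in column_out]
```

For example, `dynamic_bias_vmm([1, 2, 3], [[1, -2], [-3, 4], [5, -6]], 6)` matches the ordinary product of the vector with the signed weight matrix.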
Asymmetric allocation of SRAM and data layout for efficient matrix-matrix multiplication

Patent number: 12072953

Abstract: Techniques are described herein for performing efficient matrix multiplication in architectures with scratchpad memories or associative caches using asymmetric allocation of space for the different matrices. The system receives a left matrix and a right matrix. In an embodiment, the system allocates, in a scratchpad memory, asymmetric memory space for tiles for each of the two matrices as well as a dot product matrix. The system proceeds with then performing dot product matrix multiplication involving the tiles of the left and the right matrices, storing resulting dot product values in corresponding allocated dot product matrix tiles. The system then proceeds to write the stored dot product values from the scratchpad memory into main memory.

Type: Grant

Filed: June 16, 2021

Date of Patent: August 27, 2024

Assignee: ORACLE INTERNATIONAL CORPORATION

Inventors: Gaurav Chadha, Sam Idicula, Sandeep Agrawal, Nipun Agarwal
Stream processor with low power parallel matrix multiply pipeline

Patent number: 12067401

Abstract: Systems, apparatuses, and methods for implementing a low power parallel matrix multiply pipeline are disclosed. In one embodiment, a system includes at least first and second vector register files coupled to a matrix multiply pipeline. The matrix multiply pipeline comprises a plurality of dot product units. The dot product units are configured to calculate dot or outer products for first and second sets of operands retrieved from the first vector register file. The results of the dot or outer product operations are written back to the second vector register file. The second vector register file provides the results from the previous dot or outer product operations as inputs to subsequent dot or outer product operations. The dot product units receive the results from previous phases of the matrix multiply operation and accumulate these previous dot or outer product results with the current dot or outer product results.

Type: Grant

Filed: December 27, 2017

Date of Patent: August 20, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Yunxiao Zou, Michael J. Mantor, Allen Rush
Apparatuses, methods, and systems for 8-bit floating-point matrix dot product instructions

Patent number: 12056489

Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described.

Type: Grant

Filed: May 5, 2023

Date of Patent: August 6, 2024

Assignee: Intel Corporation

Inventors: Naveen Mellempudi, Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich
Bit matrix multiplication

Patent number: 12045308

Abstract: Detailed are embodiments related to bit matrix multiplication in a processor. For example, in some embodiments a processor comprising: decode circuitry to decode an instruction having fields for an opcode, an identifier of a first source bit matrix, an identifier of a second source bit matrix, an identifier of a destination bit matrix, and an immediate; and execution circuitry to execute the decoded instruction to perform a multiplication of a matrix of S-bit elements of the identified first source bit matrix with S-bit elements of the identified second source bit matrix, wherein the multiplication and accumulation operations are selected by the operation selector and store a result of the matrix multiplication into the identified destination bit matrix, wherein S indicates a plural bit size is described.

Type: Grant

Filed: December 16, 2022

Date of Patent: July 23, 2024

Assignee: Intel Corporation

Inventors: Dmitry Y. Babokin, Kshitij A. Doshi, Vadim Sukhomlinov
Methods for spectrally resolving fluorophores of a sample and systems for same

Patent number: 12038326

Abstract: Aspects of the present disclosure include methods for spectrally resolving light from fluorophores having overlapping fluorescence spectra in a sample. Methods according to certain embodiments include detecting light with a light detection system from a sample having a plurality of fluorophores having overlapping fluorescence spectra and spectrally resolving light from each fluorophore in the sample. In some embodiments, methods include estimating the abundance of one or more of the fluorophores in the sample, such as on a particle. In certain instances, methods include identifying the particle in the sample based on the abundance of each fluorophore and sorting the particle. Methods according to some embodiments include spectrally resolving the light from each fluorophore by calculating a spectral unmixing matrix for the fluorescence spectra of each fluorophore. Systems and integrated circuit devices (e.g., a field programmable gate array) for practicing the subject methods are also provided.

Type: Grant

Filed: October 27, 2022

Date of Patent: July 16, 2024

Assignee: BECTON, DICKINSON AND COMPANY

Inventors: Peter Mage, Keegan Owsley
Methods for multiplying matrices using a plurality of chiplets

Patent number: 12001508

Abstract: A plurality of chiplets may be used to multiply two matrices A and B. Matrix A may be decomposed into horizontal stripes and matrix B may be decomposed into vertical stripes. Each of the horizontal stripes may be multiplied by each of the vertical stripes to form the output matrix C. Specifically, horizontal stripes may be stored in a stationary, distributed manner across the chiplets, while the vertical stripes (or sub-vertical stripes) may be passed between respective pairs of the chiplets until each of the vertical stripes (or sub-vertical stripes) of matrix B has been received and processed by each of the chiplets. The vertical stripes may be passed along one or more paths that interconnect the chiplets. Similar techniques can be applied to an arrangement in which the vertical stripes are stationary and the horizontal stripes (or sub-horizontal stripes) are passed between respective pairs of the chiplets.

Type: Grant

Filed: October 23, 2023

Date of Patent: June 4, 2024

Assignee: Persimmons, Inc.

Inventor: James Michael Bodwin
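The stripe-passing scheme in the abstract above (patent 12001508) can be simulated in software. The sketch below assumes a simple ring of chiplets, one horizontal stripe of A and one vertical stripe of B per chiplet, and dimensions divisible by the chiplet count; the function name `ring_matmul` and these simplifications are illustrative assumptions, not the patented design.

```python
def ring_matmul(a, b, num_chiplets):
    """Multiply a (m x k) by b (k x n) with stationary A stripes and rotating B stripes."""
    m, k, n = len(a), len(b), len(b[0])
    assert m % num_chiplets == 0 and n % num_chiplets == 0
    stripe_h = m // num_chiplets   # rows per horizontal stripe of A
    stripe_w = n // num_chiplets   # columns per vertical stripe of B

    # Chiplet p permanently holds horizontal stripe p of A.
    a_stripes = [a[p * stripe_h:(p + 1) * stripe_h] for p in range(num_chiplets)]
    # held[p] = (stripe index, vertical stripe of B) currently on chiplet p.
    held = [
        (p, [row[p * stripe_w:(p + 1) * stripe_w] for row in b])
        for p in range(num_chiplets)
    ]

    c = [[0] * n for _ in range(m)]
    for _ in range(num_chiplets):
        for p in range(num_chiplets):
            q, b_stripe = held[p]
            # Chiplet p computes the output tile at (stripe p of A, stripe q of B).
            for i in range(stripe_h):
                for j in range(stripe_w):
                    c[p * stripe_h + i][q * stripe_w + j] = sum(
                        a_stripes[p][i][t] * b_stripe[t][j] for t in range(k)
                    )
        # Pass every vertical stripe to the next chiplet on the ring.
        held = held[-1:] + held[:-1]
    return c
```

After `num_chiplets` rounds every chiplet has seen every vertical stripe, so all tiles of C are filled.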
Assigning processing threads for matrix-matrix multiplication

Patent number: 11989257

Abstract: An apparatus includes a processor and a memory to store instructions. The instructions, when executed by the processor, cause the processor to perform threading of a first matrix along a first dimension of the first matrix and a second dimension of the matrix. The threading represents block sizes of the first matrix to assign to process threads of a multiplication algorithm to determine a third matrix that represents a product of the first matrix and a second matrix. The block sizes include a first block size along the first dimension and a second block size along the second dimension. The second matrix shares the second dimension with the first matrix. The instructions, when executed by the processor, cause the processor to provide data to the multiplication algorithm, which represents the first block size and the second block size.

Type: Grant

Filed: October 29, 2020

Date of Patent: May 21, 2024

Assignee: Hewlett Packard Enterprise Development LP

Inventor: Aaron M. Collier
Set operations using multi-core processing unit

Patent number: 11941078

Abstract: Performing set operations using sparse matrix operations offered by a multi-core processing unit (such as a graphics processing unit). The set operation is converted into operand matrices, and sparse matrix operations, forgoing the use of hash tables. The input set is converted into a matrix, a matrix operation corresponding to the set operation is identified, and one or more operands of the set operation are also represented within a matrix. The matrix operation is then performed on these matrices to obtain an output matrix, which is then converted to an output set.

Type: Grant

Filed: September 30, 2022

Date of Patent: March 26, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventor: Ritwik Das
Matrix multiplier

Patent number: 11934481

Abstract: Embodiments of the present invention disclose a matrix multiplier, and relate to the field of data computing technologies, so as to divide two matrices into blocks for computation. The matrix multiplier includes: a first memory, a second memory, an operation circuit, and a controller, where the operation circuit, the first memory, and the second memory may perform data communication by using a bus; and the controller is configured to control, according to a preset program or instruction, a first matrix and a second matrix to be divided into blocks, and control the operation circuit to perform a multiplication operation on corresponding blocks in the first memory and the second memory based on block division results of the controller. The matrix multiplier may be configured to perform a multiplication operation on two matrices.

Type: Grant

Filed: April 20, 2022

Date of Patent: March 19, 2024

Assignee: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Hu Liu, Heng Liao, Jiajin Tu, Honghui Yuan, Hou Fun Lam, Fan Zhu
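The block-division idea in the abstract above (patent 11934481) corresponds to the familiar blocked matrix multiplication, which can be sketched in software. This is a generic illustration under the assumption of square blocks of a caller-chosen size; the name `blocked_matmul` is hypothetical and the sketch does not model the controller, memories, or operation circuit of the patent.

```python
def blocked_matmul(a, b, block):
    """Multiply a (m x k) by b (k x n) one block at a time."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, block):            # block row of A and C
        for j0 in range(0, n, block):        # block column of B and C
            for k0 in range(0, k, block):    # matching blocks along the shared dimension
                for i in range(i0, min(i0 + block, m)):
                    for j in range(j0, min(j0 + block, n)):
                        acc = 0
                        for p in range(k0, min(k0 + block, k)):
                            acc += a[i][p] * b[p][j]
                        # Accumulate this block pair's contribution to C.
                        c[i][j] += acc
    return c
```

The `min(...)` bounds let the sketch handle dimensions that are not multiples of the block size.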
Methods and apparatus for performing video processing matrix operations within a memory array

Patent number: 11928177

Abstract: Methods and apparatus for performing video processing matrix operations within a memory fabric. Various embodiments of the present disclosure are directed to converting a memory array into a matrix fabric for discrete cosine transform (DCT) matrix transformations and performing DCT matrix operations therein. Exemplary embodiments described herein perform DCT matrix-matrix multiplication operations within a memory device that includes a matrix fabric and matrix multiplication unit (MMU). In one embodiment, matrix-matrix multiplication operations are obtained using separate matrix-vector products. In one exemplary embodiment, the matrix fabric uses a “crossbar” construction of resistive elements. Each resistive element stores a level of impedance that represents the corresponding matrix coefficient value. The crossbar connectivity can be driven with an electrical signal representing the input vector as an analog voltage.

Type: Grant

Filed: September 19, 2022

Date of Patent: March 12, 2024

Assignee: Micron Technology, Inc.

Inventor: Fa-Long Luo
Arrangements for storing more data in memory when using a hierarchical memory structure

Patent number: 11922021

Abstract: Data employed in computations is processed so that during computations more of the data can be fit into or maintained in a smaller but higher speed memory than an original source of the data. More specifically, a sensitivity value is determined for various items of the data which reflect the number of bits in the data items that are not garbage bits, and only information in the data items that are indicated by the sensitivity value to not be garbage bits are necessarily effectively retained. At least the information that is not garbage bits and the corresponding associated sensitivity are packed together. The results of computations that are performed using the data items as at least one of the operands for the computation are associated with a sensitivity that is derived from the individual sensitivities of the operands used in the computation.

Type: Grant

Filed: December 19, 2022

Date of Patent: March 5, 2024

Assignee: INTELLECTUAL PROPERTY SYSTEMS, LLC

Inventors: Juan Guillermo Gonzalez, Santiago Andres Fonseca, Rafael Camilo Nunez
Scalable, multi-precision, self-calibrated multiplier-accumulator architecture

Patent number: 11922131

Abstract: A method for performing vector-matrix multiplication may include converting a digital input vector comprising a plurality of binary-encoded values into a plurality of analog signals using a plurality of one-bit digital to analog converters (DACs); sequentially performing, using an analog vector matrix multiplier and based on bit-order, vector-matrix multiplication operations using a weighting matrix for the plurality of analog signals to generate analog outputs of the analog vector matrix multiplier; sequentially performing an analog-to-digital (ADC) operation on the analog outputs of the analog vector matrix multiplier to generate binary partial output vectors; and combining the binary partial output vectors to generate a result of the vector-matrix multiplication.

Type: Grant

Filed: November 7, 2020

Date of Patent: March 5, 2024

Assignee: Applied Materials, Inc.

Inventors: Xiaofeng Zhang, She-Hwa Yen
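The bit-serial flow in the abstract above (patent 11922131) can be modeled digitally: each input bit plane is multiplied by the weight matrix in turn, and the partial outputs are combined with a shift that reflects the bit order. The sketch below is an assumption-laden software model (unsigned inputs, ideal conversion, hypothetical name `bit_serial_vmm`); it does not model the analog multiplier, the converters, or the self-calibration in the title.

```python
def bit_serial_vmm(x, weights, n_bits):
    """Vector-matrix product computed one input bit plane at a time."""
    n_rows, n_cols = len(weights), len(weights[0])
    assert all(0 <= v < (1 << n_bits) for v in x), "inputs must fit in n_bits"
    result = [0] * n_cols
    for bit in range(n_bits):
        # One-bit "DAC" inputs: bit number `bit` of every input element.
        plane = [(v >> bit) & 1 for v in x]
        # Partial output vector for this bit plane.
        partial = [
            sum(plane[i] * weights[i][j] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Combine partial outputs, weighting each by its bit significance.
        result = [r + p * (1 << bit) for r, p in zip(result, partial)]
    return result
```

With `x = [5, 3]`, `weights = [[1, 2], [3, -1]]`, and `n_bits = 3`, the result equals the direct product `[14, 7]`.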
Processing apparatus for performing processing using a convolutional neural network

Patent number: 11900577

Abstract: A processing apparatus is provided. A data holder holds at least some of data of a plurality of channels in a target layer among a plurality of layers. Each of a plurality of processors performs, in parallel, a product-sum operation using the data of one channel of the target layer and a coefficient corresponding to the target layer. A selector selects whether to perform first processing or second processing on the basis of information specifying processing in the target layer. The first processing includes inputting the data of one channel of the target layer into one of the plurality of processors. The second processing includes inputting the data of one channel of the target layer to the plurality of processors in parallel.

Type: Grant

Filed: June 22, 2021

Date of Patent: February 13, 2024

Assignee: CANON KABUSHIKI KAISHA

Inventors: Tsewei Chen, Masami Kato, Shiori Wakino
Systems and methods for speech or text processing using matrix operations

Patent number: 11899745

Abstract: Disclosed herein are a system, a method, and a device for processing and converting data using matrix operations. Circuitry can partition an input of a first data format across a plurality of lookup tables each residing in a respective memory. The circuitry can access weight information from a load store memory, and the partitioned input on a per column basis from the plurality of lookup tables. The circuitry can perform a number of multiply-accumulate (MAC) operations per cycle between the weight information from the load store memory and the partitioned input read on a per column basis from the plurality of lookup tables. The number of MAC operations performed per cycle can correspond to a total number of columns of the plurality of lookup tables. The circuitry can generate, responsive to the MAC operations on the partitioned input, a plurality of outputs in a second data format.

Type: Grant

Filed: August 19, 2020

Date of Patent: February 13, 2024

Assignee: Meta Platforms Technologies, LLC

Inventors: Alagappan Valliappan, Ganesh Venkatesh, Pierce I-Jen Chuang
Computer architecture with resistive processing units

Patent number: 11886378

Abstract: A processor includes an array of resistive processing units connected between row and column lines with a resistive element. A first single instruction, multiple data processing unit (SIMD) is connected to the row lines. A second SIMD is connected to the column lines. A first instruction issuer is connected to the first SIMD to issue instructions to the first SIMD, and a second instruction issuer is connected to the second SIMD to issue instructions to the second SIMD such that the processor is programmable and configurable for specific operations depending on an issued instruction set.

Type: Grant

Filed: December 28, 2020

Date of Patent: January 30, 2024

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Tayfun Gokmen
Integer matrix multiplication engine using pipelining

Patent number: 11880426

Abstract: Techniques for data manipulation using integer matrix multiplication using pipelining are disclosed. A first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n are obtained for matrix multiplication within a processor. The first and second integer matrices employ a two's complement variable radix point data representation. The first and second integer matrices are distilled into (j×j) submatrices. A first variable radix point format and an initial value for an accumulator register are configured dynamically. A first variable radix point format is configured dynamically for the first integer matrix and a second variable radix point format is configured dynamically for the second integer matrix. Multiply-accumulate operations are executed in a pipelined fashion on the (j×j) submatrices of the first integer matrix and the second integer matrix, where a third variable radix point format is configured for the result.

Type: Grant

Filed: July 31, 2022

Date of Patent: January 23, 2024

Inventor: David John Simpson
Method, circuit, and SOC for performing matrix multiplication operation

Patent number: 11860970

Abstract: A method for performing a matrix multiplication operation is provided. The method includes: obtaining a matrix B1, a matrix A2, and an index matrix, wherein the index matrix comprises indexes, in a matrix A1, of elements in the matrix A2; generating m matrices B2 based on the index matrix and the matrix B1, wherein the m matrices B2 are all matrices with t rows and n columns, and each row of each matrix B2 is a row indicated in the matrix B1 by a corresponding element in the index matrix; and generating a matrix C based on the matrix A2 and the m matrices B2, wherein the matrix C is a product of the matrix A1 and the matrix B1.

Type: Grant

Filed: June 15, 2022

Date of Patent: January 2, 2024

Assignee: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Leijun He, Bin Xu, Kaixing Wang
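The index-matrix method in the abstract above (patent 11860970) can be traced with a small software model. The sketch assumes that the index matrix stores, for each row, the column positions in A1 of the elements kept in A2, and that each row of A1 has at most t non-zero elements; the helper names `compress_rows` and `sparse_matmul` are hypothetical and the abstract does not specify how A2 is produced.

```python
def compress_rows(a1, t):
    """Build A2 (m x t) and the index matrix (m x t) from a sparse A1 (m x k)."""
    a2, index = [], []
    for row in a1:
        nonzero = [j for j, v in enumerate(row) if v != 0]
        assert len(nonzero) <= t <= len(row)
        # Pad with zero-valued positions so every row keeps exactly t elements.
        padding = [j for j, v in enumerate(row) if v == 0][: t - len(nonzero)]
        keep = sorted(nonzero + padding)
        index.append(keep)
        a2.append([row[j] for j in keep])
    return a2, index


def sparse_matmul(a2, index, b1):
    """Compute C = A1 x B1 from A2, the index matrix, and B1 (k x n)."""
    n = len(b1[0])
    c = []
    for a_row, idx_row in zip(a2, index):
        # One matrix B2 per row of A2: t rows gathered from B1 by index.
        b2 = [b1[j] for j in idx_row]
        c.append([
            sum(a_row[r] * b2[r][col] for r in range(len(a_row)))
            for col in range(n)
        ])
    return c
```

Each of the m gathered matrices B2 has t rows and n columns, so the multiply work scales with t rather than with the full inner dimension k.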
Method for rapidly calculating three-dimensional polarimetric dimension, device, and storage medium

Patent number: 11853386

Abstract: The invention relates to a method for rapidly calculating a three-dimensional polarimetric dimension, including: determining that an incident light field is a coherence matrix of a partially coherent Schell-model beam, and decomposing the coherence matrix into a form of multiplying an incident electric field by a coherence structure matrix of the incident light field; obtaining an electric field near a focal field after the incident electric field passes through a tight focusing system according to the vector diffraction theory, and describing a second-order correlation characteristic of a partially coherent vector beam near a tightly focused field by using a coherence matrix; obtaining a tightly focused polarization matrix based on the tightly focused coherence matrix; and rotating the tightly focused polarization matrix into an intrinsic coordinate frame of the tightly focused polarization matrix, and calculating a three-dimensional polarimetric dimension of the partially coherent Schell-model beam in the t

Type: Grant

Filed: February 11, 2022

Date of Patent: December 26, 2023

Assignee: SOOCHOW UNIVERSITY

Inventors: Yahong Chen, Chencheng Yan, Fei Wang, Yangjian Cai
Accelerating processing based on sparsity for neural network hardware processors

Patent number: 11853717

Abstract: Embodiments of the present disclosure include systems and methods for accelerating processing based on sparsity for neural network hardware processors. An input manager determines a pair of non-zero values from a pair of data streams in a plurality of pairs of data streams and retrieves the pair of non-zero values from the pair of data streams. A multiplier performs a multiplication operation on the pair of non-zero values and generates a product of the pair of non-zero values. An accumulator manager receives the product of the pair of non-zero values from the multiplier and sends the product of the pair of non-zero values to a corresponding accumulator in a plurality of accumulators.

Type: Grant

Filed: January 14, 2021

Date of Patent: December 26, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Karthikeyan Avudaiyappan, Jeffrey Andrews
Methods and apparatus for performing diversity matrix operations within a memory array

Patent number: 11853385

Abstract: Methods and apparatus for performing diversity matrix operations within a memory fabric. Various embodiments of the present disclosure are directed to converting a memory array into a matrix fabric for spatial diversity-related matrix transformations and performing matrix operations therein. Exemplary embodiments described herein perform MIMO-related matrix transformations (e.g., precoding, beamforming, or data recovery matrix operations) within a memory device that includes a matrix fabric and matrix multiplication unit (MMU). In one variant, the matrix fabric uses a “crossbar” construction of resistive elements. Each resistive element stores a level of impedance that represents the corresponding matrix coefficient value. The crossbar connectivity can be driven with an electrical signal representing the input vector as an analog voltage. The resulting signals can be converted from analog voltages to digital values by an MMU to yield a matrix-vector product.

Type: Grant

Filed: December 5, 2019

Date of Patent: December 26, 2023

Assignee: Micron Technology, Inc.

Inventor: Fa-Long Luo
Systems and methods of instructions to accelerate multiplication of sparse matrices using bitmasks that identify non-zero elements

Patent number: 11847185

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE further to store an NZ element for use in subsequent multiplications.
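
The bitmask matching described here can be sketched in software as follows. This is a simplified model, not the instruction's actual semantics: the names `nz_bitmask` and `sparse_mma` are hypothetical, and the broadcast of up to two NZ elements at a time and the PE grid are not modeled. The sketch only shows how ANDing a row mask with a column mask identifies the positions worth multiplying.

```python
def nz_bitmask(vec):
    """Bit i is set when vec[i] is non-zero."""
    mask = 0
    for i, v in enumerate(vec):
        if v != 0:
            mask |= 1 << i
    return mask


def sparse_mma(a, b, c):
    """Return C + A @ B, multiplying only where both operands are non-zero."""
    cols = [[b[k][j] for k in range(len(b))] for j in range(len(b[0]))]
    row_masks = [nz_bitmask(row) for row in a]
    col_masks = [nz_bitmask(col) for col in cols]
    out = [row[:] for row in c]
    for i, row in enumerate(a):
        for j, col in enumerate(cols):
            match = row_masks[i] & col_masks[j]  # positions non-zero in both
            k = 0
            while match:
                if match & 1:
                    out[i][j] += row[k] * col[k]
                match >>= 1
                k += 1
    return out


A = [[1, 0], [0, 2]]
B = [[3, 0], [0, 4]]
C = [[1, 1], [1, 1]]
D = sparse_mma(A, B, C)
```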

Type: Grant

Filed: September 24, 2021

Date of Patent: December 19, 2023

Assignee: Intel Corporation

Inventors: Dan Baum, Chen Koren, Elmoustapha Ould-Ahmed-Vall, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
Multi-service business platform system having entity resolution systems and methods

Patent number: 11847106

Abstract: The disclosure is directed to various ways of improving the functioning of computer systems, information networks, data stores, search engine systems and methods, and other advantages. Among other things, provided herein are methods, systems, components, processes, modules, blocks, circuits, sub-systems, articles, and other elements (collectively referred to in some cases as the “platform” or the “system”) that collectively enable, in one or more datastores (e.g., where each datastore may include one or more databases) and systems, the creation, development, maintenance, and use of a set of custom objects for use in a wide range of activities, including sales activities, marketing activities, service activities, content development activities, and others, as well as improved methods and systems for sales, marketing and services that make use of such entity resolution systems and methods as well as custom objects.

Type: Grant

Filed: May 12, 2021

Date of Patent: December 19, 2023

Assignee: HUBSPOT, INC.

Inventors: Hector Urdiales, Marco Lagi, Stephen J. Purcell, Stuart P. Layton, Bryan Ash, Jared Williams, Sophie Higgs, Robert McEneaney, Dylan Sellberg, Anna Perko
Memory computation circuit

Patent number: 11830543

Abstract: A memory circuit includes a first memory array including first memory cells wherein a plurality of first word lines is coupled with a plurality of rows of first memory cells in a first segment of the first memory array, and a plurality of second word lines is coupled with the plurality of rows of first memory cells in a second segment of the first memory array. The memory circuit also includes a read circuit configured to retrieve data from the first memory cells of the first memory array and a computation circuit configured to perform a matrix computation by combining first data retrieved from the first memory cells of the first segment with second data retrieved from the first memory cells of the second segment.

Type: Grant

Filed: June 23, 2022

Date of Patent: November 28, 2023

Assignee: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY, LTD.

Inventors: Yen-Huei Chen, Hidehiro Fujiwara, Hung-Jen Liao, Jonathan Tsung-Yung Chang
Energy-efficient analog-to-digital conversion in mixed signal circuitry

Patent number: 11811416

Abstract: An apparatus comprises at least one processor and at least one memory including instruction code configured to, with the at least one processor, cause the apparatus at least to perform a successive approximation analog-to-digital conversion of an analog input, representing a result of multiplication of first and second vectors, to a digital output by determining an upper bound on the result of multiplication of the first and second vectors, identifying, based at least in part on the determined upper bound, at least a portion of the successive approximation analog-to-digital conversion to be skipped, and skipping the identified portion of the successive approximation analog-to-digital conversion.
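
A minimal sketch of the skipping idea, under stated assumptions: a conventional successive-approximation loop resolves one bit per comparison from the most significant bit down, and a known upper bound on the input means the bits above that bound must be zero, so their comparisons can be skipped. The function name `sar_adc`, the `upper_bound` parameter, and the way the bound is mapped to a starting bit are illustrative choices; the abstract does not say how the bound is derived from the two vectors.

```python
def sar_adc(analog, n_bits, lsb, upper_bound=None):
    """Return (digital code, number of comparisons performed)."""
    start_bit = n_bits - 1
    if upper_bound is not None:
        max_code = int(upper_bound / lsb)
        # Bits above the highest set bit of max_code are known to be zero.
        start_bit = min(start_bit, max_code.bit_length() - 1)
    code = 0
    comparisons = 0
    for bit in range(start_bit, -1, -1):
        trial = code | (1 << bit)
        comparisons += 1
        if analog >= trial * lsb:   # comparator decision for this bit
            code = trial
    return code, comparisons


full = sar_adc(5, n_bits=8, lsb=1)                     # all 8 steps
bounded = sar_adc(5, n_bits=8, lsb=1, upper_bound=7)   # upper 5 steps skipped
```

Both calls produce the same code; the bounded call simply performs fewer comparisons.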

Type: Grant

Filed: December 14, 2021

Date of Patent: November 7, 2023

Assignee: International Business Machines Corporation

Inventors: Kyu-hyoun Kim, Mingu Kang, Ankur Agrawal, Monodeep Kar
Identifying checksum mechanisms using linear equations

Patent number: 11797644

Abstract: Certain aspects of the present disclosure provide techniques for detecting errors in account numbers. One example method generally includes receiving, from a user device, an entered number associated with a user and determining, based on a first portion of the entered number, an entity associated with the entered number. The method further includes obtaining, from an account number database, a plurality of account numbers associated with the entity and generating, from the plurality of account numbers, an account number matrix. The method further includes attempting to solve a multiplication equation of the account number matrix, wherein a solution of the multiplication equation is a vector of constants, upon determining a solution to the multiplication equation, determining whether the entered vector is a valid number for the entity and upon determining the entered vector is a valid number for the entity, storing the entered number in the account number database.

Type: Grant

Filed: May 11, 2021

Date of Patent: October 24, 2023

Assignee: INTUIT, INC.

Inventors: Yair Horesh, Yehezkel S. Resheff, Shimon Shahar, Noah Eyal Altman
Generalized acceleration of matrix multiply accumulate operations

Patent number: 11797302

Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
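
The three dot-product steps named in this abstract (generate partial products, align them by exponent, accumulate) can be sketched as below. This is an illustrative software model only: the 24-bit mantissa width, the helper `to_fixed`, and the use of a single integer accumulator in place of the "result queue" are assumptions, and low-order bits shifted out during alignment are simply truncated.

```python
import math

MANT_BITS = 24  # assumed mantissa width for this sketch


def to_fixed(x):
    """Split x into an integer mantissa and exponent: x ~= mant * 2**exp."""
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    return int(m * (1 << MANT_BITS)), e - MANT_BITS


def aligned_dot(a, b):
    """Dot product via integer partial products aligned to the largest exponent."""
    products = []
    for x, y in zip(a, b):
        if x == 0 or y == 0:
            continue  # a zero product contributes nothing
        mx, ex = to_fixed(x)
        my, ey = to_fixed(y)
        products.append((mx * my, ex + ey))  # partial product and its exponent
    if not products:
        return 0.0
    max_exp = max(e for _, e in products)
    acc = 0
    for p, e in products:
        acc += p >> (max_exp - e)  # align, then accumulate with one adder
    return math.ldexp(acc, max_exp)


value = aligned_dot([1.5, 2.0, -0.25], [4.0, 0.5, 8.0])
```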

Type: Grant

Filed: June 17, 2021

Date of Patent: October 24, 2023

Assignee: NVIDIA Corporation

Inventors: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
Generalized acceleration of matrix multiply accumulate operations

Patent number: 11797303

Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.

Type: Grant

Filed: June 17, 2021

Date of Patent: October 24, 2023

Assignee: NVIDIA Corporation

Inventors: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
Generalized acceleration of matrix multiply accumulate operations

Patent number: 11797301

Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.

Type: Grant

Filed: January 4, 2021

Date of Patent: October 24, 2023

Assignee: NVIDIA Corporation

Inventors: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
Systems and methods for modifying neural networks for binary processing applications

Patent number: 11790241

Abstract: In one embodiment, a method of simulating an operation of an artificial neural network on a binary neural network processor includes receiving a binary input vector for a layer including a probabilistic binary weight matrix and performing vector-matrix multiplication of the input vector with the probabilistic binary weight matrix, wherein the multiplication results are modified by simulated binary-neural-processing hardware noise, to generate a binary output vector, where the simulation is performed in the forward pass of a training algorithm for a neural network model for the binary-neural-processing hardware.
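
A toy version of such a noisy forward pass might look like the following. Everything specific here is an assumption made for illustration: values are encoded as +1/-1, `probs[i][j]` is taken to be the probability that weight (i, j) is +1, and the hardware noise is modeled as additive Gaussian noise before a sign threshold. The abstract does not specify any of these choices.

```python
import random


def noisy_binary_layer(x, probs, noise_std, rng):
    """Sample binary weights, multiply, add simulated noise, and binarize."""
    n_out = len(probs[0])
    # Sample each weight as +1 with probability probs[i][j], else -1.
    weights = [[1 if rng.random() < p else -1 for p in row] for row in probs]
    out = []
    for j in range(n_out):
        pre = sum(x[i] * weights[i][j] for i in range(len(x)))
        pre += rng.gauss(0.0, noise_std)   # assumed hardware-noise model
        out.append(1 if pre >= 0 else -1)  # binary output vector
    return out


rng = random.Random(0)
probs = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
y = noisy_binary_layer([1, 1, 1], probs, noise_std=0.0, rng=rng)
```

With deterministic probabilities and zero noise, as in the usage above, the output is fully determined; non-zero `noise_std` makes it stochastic.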

Type: Grant

Filed: September 9, 2020

Date of Patent: October 17, 2023

Assignee: QUALCOMM Incorporated

Inventors: Matthias Reisser, Saurabh Kedar Pitre, Xiaochun Zhu, Edward Harrison Teague, Zhongze Wang, Max Welling
Circuit system for weight modulation and image recognition of memristor array

Patent number: 11784659

Abstract: A circuit system for weight modulation and image recognition of a memristor array includes a personal computer (PC), a field-programmable gate array (FPGA) chip, a digital-to-analog conversion unit, a switch unit, a memristor array unit, an integration and signal amplification circuit, and an analog-to-digital converter. The circuit system selects a to-be-realized function such as array reading and writing, weight modulation or image recognition, converts a command or an RGB value of an image collected by the PC into a corresponding grayscale value, and sends the grayscale value to the FPGA chip. The FPGA chip controls and selects a to-be-modulated memristor array unit through the digital-to-analog conversion unit and the switch unit. An application program of the PC controls the FPGA chip in real time to realize array reading and writing, weight modulation, and image recognition, and then the FPGA chip displays a result on the PC in real time.

Type: Grant

Filed: February 18, 2022

Date of Patent: October 10, 2023

Assignee: Hebei University

Inventors: Xiaobing Yan, Ziliang Fang, Saibo Yin
Execution or write mask generation for data selection in a multi-threaded, self-scheduling reconfigurable computing fabric

Patent number: 11782710

Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an asynchronous packet network having a plurality of data transmission lines forming a data path transmitting operand data; a synchronous mesh communication network; a plurality of configurable circuits arranged in an array, each configurable circuit of the plurality of configurable circuits coupled to the asynchronous packet network and to the synchronous mesh communication network, each configurable circuit of the plurality of configurable circuits adapted to perform a plurality of computations; each configurable circuit of the plurality of configurable circuits comprising: a memory storing operand data; and an execution or write mask generator adapted to generate an execution mask or a write mask identifying valid bits or bytes transmitted on the data path or stored in the memory for a current or next computation.

Type: Grant

Filed: September 13, 2021

Date of Patent: October 10, 2023

Assignee: Micron Technology, Inc.

Inventor: Tony M. Brewer
Systems and methods for performing matrix compress and decompress instructions

Patent number: 11748103

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
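
The first compression option (packing non-zero elements and recording their positions in a header) can be sketched as follows. The function names and the header layout, a list of (row, column) tuples, are hypothetical; the actual instruction format is not described in the abstract.

```python
def compress(matrix):
    """Pack non-zero elements together; the header records their positions."""
    header, values = [], []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                header.append((r, c))
                values.append(v)
    return header, values


def decompress(header, values, rows, cols):
    """Rebuild the dense matrix from the header and the packed values."""
    out = [[0] * cols for _ in range(rows)]
    for (r, c), v in zip(header, values):
        out[r][c] = v
    return out


dense = [[0, 7, 0], [0, 0, 0], [3, 0, 9]]
header, packed = compress(dense)
restored = decompress(header, packed, rows=3, cols=3)
```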

Type: Grant

Filed: February 15, 2022

Date of Patent: September 5, 2023

Assignee: Intel Corporation

Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
Permuting in a matrix-vector processor

Patent number: 11748443

Abstract: A circuit comprises an input register configured to receive an input vector of elements, a control register configured to receive a control vector of elements, wherein each element of the control vector corresponds to a respective element of the input vector, and wherein each element specifies a permutation of a corresponding element of the input vector, and a permute execution circuit configured to generate an output vector of elements corresponding to a permutation of the input vector. Generating each element of the output vector comprises accessing, at the input register, a particular element of the input vector, accessing, at the control register, a particular element of the control vector corresponding to the particular element of the input vector, and outputting the particular element of the input vector as an element at a particular position of the output vector that is selected based on the particular element of the control vector.
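
In software terms, the behavior described is a scatter: element i of the input is written to the output position named by element i of the control vector. A minimal sketch, assuming the control vector holds each destination index exactly once (the function name `permute` is illustrative):

```python
def permute(input_vec, control_vec):
    """Place input_vec[i] at output position control_vec[i]."""
    out = [None] * len(input_vec)
    for value, dest in zip(input_vec, control_vec):
        out[dest] = value
    return out


shuffled = permute(["a", "b", "c", "d"], [2, 0, 3, 1])
```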

Type: Grant

Filed: March 22, 2021

Date of Patent: September 5, 2023

Assignee: Google LLC

Inventors: Dong Hyuk Woo, Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam, Jonathan Ross, Christopher Aaron Clark
Programmable coarse grained and sparse matrix compute hardware with advanced scheduling

Patent number: 11727527

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex compute operation.

Type: Grant

Filed: December 3, 2021

Date of Patent: August 15, 2023

Assignee: Intel Corporation

Inventors: Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao
Extensible multi-precision data pipeline for computing non-linear and arithmetic functions in artificial neural networks

Patent number: 11687336

Abstract: An extensible multi-precision data pipeline system, comprising a local buffer that stores an input local data set in a local storage format, an input tensor shaper coupled to the local buffer that reads the input local data set and converts the input local data set into an input tensor data set having a tensor format of vector width N by tensor length L, a cascaded pipeline coupled to the input tensor shaper that routes the input tensor data set through at least one function stage resulting in an output tensor data set, and an output tensor shaper coupled to the cascaded pipeline that converts the output tensor data set into an output local data set having the local storage format, wherein the output tensor shaper writes the output local data set to the local buffer.
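
The shaper-pipeline-shaper flow can be sketched in a few lines. The sketch assumes the local storage format is a flat list, pads the last vector with zeros when the data does not divide evenly by N, and applies each function stage element-wise; none of these details, nor the function names, come from the abstract, and multi-precision handling is not modeled.

```python
def to_tensor(local, n):
    """Input shaper: split flat data into L vectors of width n (zero-padded)."""
    padded = list(local) + [0.0] * (-len(local) % n)
    return [padded[i:i + n] for i in range(0, len(padded), n)]


def run_pipeline(tensor, stages):
    """Cascaded pipeline: route every vector through each function stage."""
    for stage in stages:
        tensor = [[stage(v) for v in vec] for vec in tensor]
    return tensor


def to_local(tensor, length):
    """Output shaper: flatten back to the local format and drop padding."""
    flat = [v for vec in tensor for v in vec]
    return flat[:length]


local_in = [-1.0, 2.0, -3.0, 4.0, 5.0]
stages = [lambda v: max(v, 0.0), lambda v: 2.0 * v]  # e.g. ReLU, then scale
tensor_in = to_tensor(local_in, n=2)
local_out = to_local(run_pipeline(tensor_in, stages), len(local_in))
```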

Type: Grant

Filed: May 8, 2020

Date of Patent: June 27, 2023

Assignee: Black Sesame Technologies Inc.

Inventors: Yi Wang, Zheng Qi, Hui Wang, Zheng Li
Method, product, and apparatus for a machine learning process using dynamic rearrangement of sparse data and corresponding weights

Patent number: 11651283

Abstract: An approach is described for a method, product, and apparatus for a machine learning process using dynamic rearrangement of sparse data and corresponding weights. This approach includes a method, product, and apparatus for dynamically rearranging input data to move sparse data to a location such that computations on the sparse data might be avoided when executing a machine learning processing job. For example, sparse data within each row of the input matrix can be moved to the end of each corresponding row. When the input data is folded to fit the array, that sparse data might be at least partially contained within a fold that comprises only sparse data and possibly filler data. In such an event, computations on the fold are unnecessary and are avoided. In some embodiments, the approach includes dynamically rearranging a weight matrix to maintain a correspondence between the input data and the weights.
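
The rearrangement can be sketched for a single input row. In this illustration, zeros are moved to the end of the row, the rows of the weight matrix are permuted identically so the result is unchanged, and any fold that contains only zeros is skipped. The function names, the one-weight-row-per-input-element layout, and the fold width are assumptions made for the sketch.

```python
def rearrange(x, w):
    """Move zeros in x to the end; permute the weight rows the same way."""
    order = sorted(range(len(x)), key=lambda i: x[i] == 0)  # stable sort
    return [x[i] for i in order], [w[i] for i in order]


def folded_matvec(x, w, fold_width):
    """Compute x @ w fold by fold, skipping folds that hold only zeros."""
    x, w = rearrange(x, w)
    out = [0] * len(w[0])
    folds_computed = 0
    for start in range(0, len(x), fold_width):
        chunk = x[start:start + fold_width]
        if all(v == 0 for v in chunk):
            continue  # nothing to compute for an all-zero fold
        folds_computed += 1
        for k, v in enumerate(chunk):
            for j, wv in enumerate(w[start + k]):
                out[j] += v * wv
    return out, folds_computed


x = [0, 3, 0, 0, 2, 0]
w = [[1, 0], [0, 1], [1, 1], [2, 2], [1, 0], [5, 5]]
y, folds = folded_matvec(x, w, fold_width=2)
```

Without the rearrangement, the same input would leave non-zero values in two of the three folds; after it, only one fold needs to be computed.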

Type: Grant

Filed: June 30, 2020

Date of Patent: May 16, 2023

Assignee: Cadence Design Systems, Inc.

Inventors: Yong Liu, Ngai Ngai William Hung, Michael Patrick Zimmer
Systems and methods to zero a tile register pair

Patent number: 11645077

Abstract: Embodiments detailed herein relate to systems and methods to zero a tile register pair. In one example, a processor includes decode circuitry to decode a matrix pair zeroing instruction having fields for an opcode and an identifier to identify a destination matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded matrix pair zeroing instruction to zero every element of a left matrix and a right matrix of the identified destination matrix.

Type: Grant

Filed: June 1, 2021

Date of Patent: May 9, 2023

Assignee: Intel Corporation

Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman, Eyal Hadas
Reducing processing requirements to correct for bias in ratings data having interdependencies among demographic statistics

Patent number: 11645665

Abstract: Example apparatus disclosed herein are to determine a plurality of weights based on a data structure having elements corresponding to pairings of ones of a plurality of demographic partition statistics and ones of a plurality of baseline demographic statistics obtained for a target population, the demographic partition statistics corresponding to a plurality of demographic partitions of a sample population, a first element of the data structure to combine a first one of the demographic partition statistics with a first one of the baseline demographic statistics of the target population based on a first value corresponding to a numerator term of an expression and a second value corresponding to a denominator term of the expression, the weights corresponding respectively to the demographic partitions of the sample population. Disclosed example apparatus are also to adjust the attribute data based on the weights to determine ratings data for the target population.

Type: Grant

Filed: August 10, 2020

Date of Patent: May 9, 2023

Assignee: THE NIELSEN COMPANY (US), LLC

Inventors: Michael Sheppard, Jonathan Sullivan, Alejandro Terrazas, Peter Lipa, Albert Ronald Perez
Scalable matrix computation circuit

Patent number: 11593455

Abstract: A scalable matrix computation circuit and methods for using the same are disclosed. In one embodiment, a matrix computation circuit includes a plurality of first operand memory configured to store a first set of input operands of the matrix computation circuit, a plurality of second operand memory configured to store a second set of input operands of the matrix computation circuit, where the first and second sets of input operands are programmable by a controller, a plurality of multiplier circuits arranged in a plurality of rows and plurality of columns, where each row receives a corresponding operand from the first set of operands, and each column receives a corresponding operand from the second set of operands, and each corresponding operand from each row is used multiple times by the multiplier circuits in that row to perform multiplications controlled by the controller, and a plurality of aggregator circuits configured to store charges produced by the plurality of multiplier circuits.
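
The row/column operand arrangement resembles an outer-product style matrix multiplication, which the sketch below models numerically. It is an abstraction only: the aggregators in the abstract store charges, whereas here they are plain numbers, and the function name `grid_matmul` and the step-by-step schedule are assumptions.

```python
def grid_matmul(a, b):
    """Multiply a (M x K) by b (K x N) on a simulated M x N multiplier grid."""
    m, n = len(a), len(b[0])
    aggregators = [[0] * n for _ in range(m)]  # one aggregator per grid cell
    for k in range(len(b)):                    # one step per inner index
        for i in range(m):
            row_operand = a[i][k]              # reused by every multiplier in row i
            for j in range(n):
                col_operand = b[k][j]          # operand delivered to column j
                aggregators[i][j] += row_operand * col_operand
    return aggregators


product = grid_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```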

Type: Grant

Filed: July 7, 2020

Date of Patent: February 28, 2023

Assignee: Ambient Scientific, Inc.

Inventor: Gajendra Prasad Singh
Resistive matrix computation circuit

Patent number: 11593456

Abstract: A resistive matrix computation circuit and methods for using the same are disclosed.

Type: Grant

Filed: July 7, 2020

Date of Patent: February 28, 2023

Assignee: Ambient Scientific, Inc.

Inventor: Gajendra Prasad Singh
