Multiplication Of Matrices Patents (Class 708/607)
  • Patent number: 12260214
    Abstract: A compute channel can have multiple computational circuit blocks coupled in series to form a pipeline. The compute channel can perform a computation on an input tensor to generate an output tensor based on an instruction. When the computation does not require all of the computational circuit blocks, the throughput of the compute channel can be increased by splitting the data elements of the input tensor into multiple input data streams. The multiple input data streams are provided to respective subsets of one or more computational circuit blocks in the pipeline using bypass circuitry of the computational circuit blocks, and the computation can be performed on the multiple input data streams in the respective subsets of one or more computational circuit blocks to generate multiple output data streams corresponding to the output tensor.
    Type: Grant
    Filed: September 30, 2022
    Date of Patent: March 25, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Paul Gilbert Meyer, Ron Diamant, Sundeep Amirineni, Sunil Kumar Bathula
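A minimal Python model of the stream-splitting idea, assuming each computational block is a plain function and the input tensor is a flat list. The names `run_pipeline` and `run_split` are hypothetical. When the computation needs only `len(ops)` of the `n_blocks` blocks, the pipeline holds `n_blocks // len(ops)` disjoint groups, so the input is dealt round-robin into that many streams and re-interleaved afterwards. This is a sketch of the dataflow, not Amazon's circuit.

```python
from typing import Callable, List

Stage = Callable[[int], int]

def run_pipeline(data: List[int], ops: List[Stage]) -> List[int]:
    """Baseline: every element flows through all ops in order (one stream)."""
    out = []
    for x in data:
        for op in ops:
            x = op(x)
        out.append(x)
    return out

def run_split(data: List[int], ops: List[Stage], n_blocks: int) -> List[int]:
    """Split the input into n_blocks // len(ops) streams, one per group of
    len(ops) blocks; any leftover blocks are modeled as bypassed."""
    lanes = max(1, n_blocks // len(ops))
    streams = [data[i::lanes] for i in range(lanes)]   # round-robin split
    results = [run_pipeline(s, ops) for s in streams]  # each lane uses its own subset
    # Re-interleave the output streams back into tensor order.
    out = [0] * len(data)
    for i, res in enumerate(results):
        out[i::lanes] = res
    return out
```

With `ops = [lambda x: x * 2, lambda x: x + 1]` and four blocks, two streams run side by side, so in this model each stream carries half the elements while the result matches the single-stream baseline.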
  • Patent number: 12229215
    Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
    Type: Grant
    Filed: October 16, 2023
    Date of Patent: February 18, 2025
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers
  • Patent number: 12223011
    Abstract: Techniques for data manipulation using integer matrix multiplication using pipelining are disclosed. A first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n are obtained for matrix multiplication within a processor. The first and second integer matrices employ a two's complement variable radix point data representation. The first and second integer matrices are distilled into (j×j) submatrices. A first variable radix point format and an initial value for an accumulator register are configured dynamically. A first variable radix point format is configured dynamically for the first integer matrix and a second variable radix point format is configured dynamically for the second integer matrix. Multiply-accumulate operations are executed in a pipelined fashion on the (j×j) submatrices of the first integer matrix and the second integer matrix, where a third variable radix point format is configured for the result.
    Type: Grant
    Filed: November 27, 2023
    Date of Patent: February 11, 2025
    Assignee: MIPS Holding, Inc.
    Inventor: David John Simpson
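A sketch of fixed-point ("variable radix point") integer matrix multiplication with j×j tiling, under stated assumptions: a two's complement value `q` with `f` fractional bits represents `q / 2**f`, so a product of operands with `fa` and `fb` fractional bits carries `fa + fb` fractional bits, and the accumulated result is shifted to the output format `fc`. The function names, the zero initial accumulator, and the truncating (floor) rescale are illustrative choices, not the patented method; the pipelining itself is not modeled.

```python
from typing import List

Matrix = List[List[int]]

def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize a real value to a two's complement integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))

def fixed_matmul(a: Matrix, fa: int, b: Matrix, fb: int, fc: int, j: int = 2) -> Matrix:
    """Multiply an m×k matrix A (fa fractional bits) by a k×n matrix B (fb bits),
    accumulating j×j tiles; the result is rescaled to fc fractional bits."""
    m, k, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]  # accumulators start at zero (an assumption)
    for i0 in range(0, m, j):
        for p0 in range(0, k, j):
            for j0 in range(0, n, j):
                # One j×j tile multiply-accumulate step.
                for i in range(i0, min(i0 + j, m)):
                    for p in range(p0, min(p0 + j, k)):
                        for c in range(j0, min(j0 + j, n)):
                            acc[i][c] += a[i][p] * b[p][c]
    shift = fa + fb - fc
    # Python's >> on ints is an arithmetic (floor) shift, like two's complement hardware.
    if shift >= 0:
        return [[v >> shift for v in row] for row in acc]
    return [[v << -shift for v in row] for row in acc]
```

For example, with A in a 4-fractional-bit format and B in a 3-fractional-bit format, a result in a 4-fractional-bit format needs a right shift of 4 + 3 - 4 = 3 bits.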
  • Patent number: 12175246
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Grant
    Filed: September 1, 2023
    Date of Patent: December 24, 2024
    Assignee: Intel Corporation
    Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
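The first compression option in the abstract (pack the non-zero elements together and record each one's matrix position in a header) can be sketched as below. The header here is a hypothetical list of `(row, col)` pairs, not the instruction's actual encoding, and the reduced-bit-width option is not modeled.

```python
from typing import List, Tuple

Matrix = List[List[int]]
Compressed = Tuple[int, int, List[Tuple[int, int]], List[int]]

def compress(m: Matrix) -> Compressed:
    """Pack non-zero elements together; the header records each one's (row, col)."""
    rows, cols = len(m), len(m[0])
    header: List[Tuple[int, int]] = []
    values: List[int] = []
    for r in range(rows):
        for c in range(cols):
            if m[r][c] != 0:
                header.append((r, c))
                values.append(m[r][c])
    return rows, cols, header, values

def decompress(z: Compressed) -> Matrix:
    """Scatter the packed values back to the positions named in the header."""
    rows, cols, header, values = z
    out = [[0] * cols for _ in range(rows)]
    for (r, c), v in zip(header, values):
        out[r][c] = v
    return out
```

The saving depends on sparsity: the header costs storage per non-zero element, so packing pays off only when most elements are zero.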
  • Patent number: 12164882
    Abstract: A memory circuit includes a selection circuit, a column of memory cells, and an adder tree. The selection circuit is configured to receive input data elements, each input data element including a number of bits equal to H, and output a selected set of kth bits of the H bits of the input data elements. Each memory cell of the column of memory cells includes a first storage unit configured to store a first weight data element and a first multiplier configured to generate a first product data element based on the first weight data element and a first kth bit of the selected set of kth bits. The adder tree is configured to generate a summation data element based on each of the first product data elements.
    Type: Grant
    Filed: March 16, 2021
    Date of Patent: December 10, 2024
    Assignee: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY, LTD.
    Inventors: Yu-Der Chih, Hidehiro Fujiwara, Yi-Chun Shih, Po-Hao Lee, Yen-Huei Chen, Chia-Fu Lee, Jonathan Tsung-Yung Chang
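The bit-serial scheme can be sketched for unsigned H-bit inputs: each pass selects the kth bit of every input, each cell multiplies its stored weight by that single bit, and an adder tree sums the products. Combining the H per-bit sums with a shift by k is an assumption about how the summation data elements are used downstream; `bit_serial_dot` is a hypothetical name.

```python
from typing import List

def bit_serial_dot(inputs: List[int], weights: List[int], h: int) -> int:
    """Dot product of unsigned H-bit inputs with weights, one input bit-plane at a time."""
    total = 0
    for k in range(h):
        # Selection step: the kth bit of every input element.
        bits = [(x >> k) & 1 for x in inputs]
        # Each memory cell multiplies its stored weight by a single bit.
        products = [w * b for w, b in zip(weights, bits)]
        # Adder tree: sum this bit-plane's products...
        partial = sum(products)
        # ...then weight the partial sum by the bit's significance 2**k.
        total += partial << k
    return total
```

Because each cell multiplies by only one bit, the per-cell multiplier reduces to gating the weight on or off.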
  • Patent number: 12153974
    Abstract: An arithmetic apparatus includes input line pairs and a multiply-accumulate device. A signal pair is input to the input line pairs within an input period. The multiply-accumulate device includes multiplication units, an accumulation unit, a charging unit, and an output unit. The multiplication units generate a positive weight charge and a negative weight charge. The accumulation unit accumulates the positive weight charge and the negative weight charge. The charging unit charges the accumulation unit after the input period. The output unit performs, after charging starts, threshold determination using a predetermined threshold value on a voltage of the accumulation unit, to thereby output a positive multiply-accumulate signal representing a sum of positive weight product values and a negative multiply-accumulate signal representing a sum of negative weight product values.
    Type: Grant
    Filed: March 12, 2020
    Date of Patent: November 26, 2024
    Assignee: Sony Group Corporation
    Inventor: Hiroshi Yoshida
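Arithmetically, the two outputs split a signed multiply-accumulate into a sum over positive weights and a sum over the magnitudes of negative weights. The sketch below shows only that decomposition with plain floats; the device does it with charges and threshold timing, and recovering the signed result by subtraction is an assumption about the downstream use. `split_mac` is a hypothetical name.

```python
from typing import List, Tuple

def split_mac(x: List[float], w: List[float]) -> Tuple[float, float]:
    """Return (positive MAC, negative MAC): the sum of products with positive
    weights and the sum of products with negative weights, as a magnitude."""
    pos = sum(xi * wi for xi, wi in zip(x, w) if wi > 0)
    neg = sum(xi * -wi for xi, wi in zip(x, w) if wi < 0)
    return pos, neg
```

For `x = [1.0, 2.0, 3.0]` and `w = [0.5, -1.0, 2.0]` the pair is `(6.5, 2.0)`, and the signed dot product is their difference, 4.5.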
  • Patent number: 12153899
    Abstract: An apparatus and method for complex matrix transpose and multiply.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: November 26, 2024
    Assignee: Intel Corporation
    Inventors: Menachem Adelman, Robert Valentine, Daniel Towner, Amit Gradstein, Mark Jay Charney
  • Patent number: 12141229
    Abstract: One embodiment sets forth a technique for performing one or more matrix multiplication operations based on a first matrix and a second matrix. The technique includes receiving data associated with the first matrix from a first traversal engine that accesses nonzero elements included in the first matrix via a first tree structure. The technique also includes performing one or more computations on the data associated with the first matrix and the data associated with the second matrix to produce a plurality of partial results. The technique further includes combining the plurality of partial results into one or more intermediate results and storing the one or more intermediate results in a first buffer memory.
    Type: Grant
    Filed: May 19, 2021
    Date of Patent: November 12, 2024
    Assignee: NVIDIA Corporation
    Inventors: Hanrui Wang, James Michael O'Connor, Donghyuk Lee
  • Patent number: 12130886
    Abstract: Methods and systems are disclosed to reduce the time and memory complexities associated with automatic differentiation of tensor models. The disclosed embodiment consists of a tensor contraction gradient calculator (TCGC) method, a tensor automatic differentiation (TAD) method and a TAD system. The disclosed embodiment eliminates the need to compute partial derivatives or Jacobians for computing tensor gradients of tensor contractions and tensor models. The disclosed embodiment computes tensor gradients of any arbitrary tensor model automatically with both memory and time complexities asymptotically equal to those of the evaluation of tensor models that are theoretically the lowest achievable complexities.
    Type: Grant
    Filed: January 24, 2024
    Date of Patent: October 29, 2024
    Inventor: Mohammad Solgi
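The underlying observation can be shown for the simplest contraction. For C[i][k] = Σ_j A[i][j]·B[j][k] and an upstream gradient G = ∂L/∂C, the gradient ∂L/∂A[i][j] = Σ_k G[i][k]·B[j][k] is itself a contraction with the same cost as the forward pass, so no Jacobian is materialized. This pure-Python sketch is a standard reverse-mode identity, not the patented TCGC/TAD method; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def contract(a: Matrix, b: Matrix) -> Matrix:
    """Forward contraction: C[i][k] = sum_j A[i][j] * B[j][k]."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))] for i in range(len(a))]

def grad_a(g: Matrix, b: Matrix) -> Matrix:
    """dL/dA[i][j] = sum_k G[i][k] * B[j][k]: another contraction, no Jacobian."""
    return [[sum(g[i][k] * b[j][k] for k in range(len(b[0])))
             for j in range(len(b))] for i in range(len(g))]
```

A finite-difference check on a small loss such as L = Σ G ⊙ C confirms that `grad_a` matches the numerical derivative.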
  • Patent number: 12093342
    Abstract: A dynamic bias analog vector-matrix multiplication operation circuit comprises: positive value weight columns (101-10N), constant columns (201-20M) and subtractors (301-30N), wherein the number of the subtractors is equal to the number of the positive value weight columns, the subtractors are correspondingly connected to the positive value weight columns on a one-to-one basis, and the number of the constant columns is less than the number of the positive value weight columns; minuend input ends of the subtractors are correspondingly connected to output ends of the positive value weight columns, subtrahend input ends of a plurality of subtractors are connected to the same constant column, and output ends thereof output operation results. Before a weight is written in a programmable semiconductor device, a constant positive value is added to each element in a weight array, the weight array is written in a positive value weight column, and the constant positive value is written in a constant column.
    Type: Grant
    Filed: April 3, 2019
    Date of Patent: September 17, 2024
    Assignee: BEIJING ZHICUN (WITIN) TECHNOLOGY CORPORATION LIMITED
    Inventor: Shaodi Wang
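The dynamic-bias trick is an identity: with a constant c large enough that every w + c ≥ 0, Σ_i x_i·w_ij = Σ_i x_i·(w_ij + c) − Σ_i x_i·c. The shared constant column supplies the second term, so one column can serve many subtractors. A numeric sketch with floats in place of analog currents; `biased_vmm` is a hypothetical name.

```python
from typing import List

def biased_vmm(x: List[float], w: List[List[float]], c: float) -> List[float]:
    """Vector-matrix product with signed weights w[i][j], computed from all-positive
    columns (w + c) minus one shared constant column holding c."""
    n_in, n_out = len(w), len(w[0])
    # Programmable cells can only hold non-negative values.
    assert all(w[i][j] + c >= 0 for i in range(n_in) for j in range(n_out))
    # Positive-value weight columns: every stored value is shifted up by c.
    pos_cols = [sum(x[i] * (w[i][j] + c) for i in range(n_in)) for j in range(n_out)]
    # Constant column: one column of c, shared by all subtractors.
    const_col = sum(x[i] * c for i in range(n_in))
    # Each subtractor: minuend from its weight column, subtrahend from the constant column.
    return [p - const_col for p in pos_cols]
```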
  • Patent number: 12072953
    Abstract: Techniques are described herein for performing efficient matrix multiplication in architectures with scratchpad memories or associative caches using asymmetric allocation of space for the different matrices. The system receives a left matrix and a right matrix. In an embodiment, the system allocates, in a scratchpad memory, asymmetric memory space for tiles for each of the two matrices as well as a dot product matrix. The system proceeds with then performing dot product matrix multiplication involving the tiles of the left and the right matrices, storing resulting dot product values in corresponding allocated dot product matrix tiles. The system then proceeds to write the stored dot product values from the scratchpad memory into main memory.
    Type: Grant
    Filed: June 16, 2021
    Date of Patent: August 27, 2024
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventors: Gaurav Chadha, Sam Idicula, Sandeep Agrawal, Nipun Agarwal
  • Patent number: 12067401
    Abstract: Systems, apparatuses, and methods for implementing a low power parallel matrix multiply pipeline are disclosed. In one embodiment, a system includes at least first and second vector register files coupled to a matrix multiply pipeline. The matrix multiply pipeline comprises a plurality of dot product units. The dot product units are configured to calculate dot or outer products for first and second sets of operands retrieved from the first vector register file. The results of the dot or outer product operations are written back to the second vector register file. The second vector register file provides the results from the previous dot or outer product operations as inputs to subsequent dot or outer product operations. The dot product units receive the results from previous phases of the matrix multiply operation and accumulate these previous dot or outer product results with the current dot or outer product results.
    Type: Grant
    Filed: December 27, 2017
    Date of Patent: August 20, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jiasheng Chen, Yunxiao Zou, Michael J. Mantor, Allen Rush
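The phased accumulation can be modeled as C = Σ_p (column p of A) ⊗ (row p of B): each phase reads the previous results, adds one outer product, and writes them back. A sketch with a Python list standing in for the second vector register file; `matmul_outer` is a hypothetical name, and the parallel dot-product units are not modeled.

```python
from typing import List

Matrix = List[List[float]]

def matmul_outer(a: Matrix, b: Matrix) -> Matrix:
    """C = A @ B as k phases; phase p adds the outer product of column p of A
    and row p of B to the results of the previous phase."""
    m, k, n = len(a), len(b), len(b[0])
    acc = [[0.0] * n for _ in range(m)]  # stands in for the result register file
    for p in range(k):
        col = [a[i][p] for i in range(m)]
        row = b[p]
        # Read previous results, add this phase's outer product, write back.
        acc = [[acc[i][j] + col[i] * row[j] for j in range(n)] for i in range(m)]
    return acc
```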
  • Patent number: 12056489
    Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described.
    Type: Grant
    Filed: May 5, 2023
    Date of Patent: August 6, 2024
    Assignee: Intel Corporation
    Inventors: Naveen Mellempudi, Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Patent number: 12045308
    Abstract: Detailed are embodiments related to bit matrix multiplication in a processor. For example, in some embodiments a processor is described comprising: decode circuitry to decode an instruction having fields for an opcode, an identifier of a first source bit matrix, an identifier of a second source bit matrix, an identifier of a destination bit matrix, and an immediate; and execution circuitry to execute the decoded instruction to perform a multiplication of a matrix of S-bit elements of the identified first source bit matrix with S-bit elements of the identified second source bit matrix, wherein the multiplication and accumulation operations are selected by the operation selector, and to store a result of the matrix multiplication into the identified destination bit matrix, wherein S indicates a plural bit size.
    Type: Grant
    Filed: December 16, 2022
    Date of Patent: July 23, 2024
    Assignee: Intel Corporation
    Inventors: Dmitry Y. Babokin, Kshitij A. Doshi, Vadim Sukhomlinov
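A sketch of selectable bit-matrix multiplication over S-bit elements, assuming the operation selector picks a (multiply, accumulate) pair of bitwise operators. The selector encoding in `OPS` is hypothetical (the real immediate's layout is not given here): AND/XOR gives GF(2)-style arithmetic per bit lane, AND/OR gives Boolean matrix multiplication.

```python
import operator
from typing import Callable, Dict, List, Tuple

Matrix = List[List[int]]
BitOp = Callable[[int, int], int]

# Hypothetical selector encoding, for illustration only.
OPS: Dict[int, Tuple[BitOp, BitOp]] = {
    0: (operator.and_, operator.xor),  # GF(2)-style: AND multiply, XOR accumulate
    1: (operator.and_, operator.or_),  # Boolean: AND multiply, OR accumulate
}

def bit_matmul(a: Matrix, b: Matrix, s: int, selector: int) -> Matrix:
    """Multiply matrices of S-bit elements using the selected bitwise operators."""
    mul, acc_op = OPS[selector]
    mask = (1 << s) - 1  # keep every element S bits wide
    m, k, n = len(a), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc = acc_op(acc, mul(a[i][p], b[p][j]) & mask)
            out[i][j] = acc & mask
    return out
```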
  • Patent number: 12038326
    Abstract: Aspects of the present disclosure include methods for spectrally resolving light from fluorophores having overlapping fluorescence spectra in a sample. Methods according to certain embodiments include detecting light with a light detection system from a sample having a plurality of fluorophores having overlapping fluorescence spectra and spectrally resolving light from each fluorophore in the sample. In some embodiments, methods include estimating the abundance of one or more of the fluorophores in the sample, such as on a particle. In certain instances, methods include identifying the particle in the sample based on the abundance of each fluorophore and sorting the particle. Methods according to some embodiments include spectrally resolving the light from each fluorophore by calculating a spectral unmixing matrix for the fluorescence spectra of each fluorophore. Systems and integrated circuit devices (e.g., a field programmable gate array) for practicing the subject methods are also provided.
    Type: Grant
    Filed: October 27, 2022
    Date of Patent: July 16, 2024
    Assignee: BECTON, DICKINSON AND COMPANY
    Inventors: Peter Mage, Keegan Owsley
  • Patent number: 12001508
    Abstract: A plurality of chiplets may be used to multiply two matrices A and B. Matrix A may be decomposed into horizontal stripes and matrix B may be decomposed into vertical stripes. Each of the horizontal stripes may be multiplied by each of the vertical stripes to form the output matrix C. Specifically, horizontal stripes may be stored in a stationary, distributed manner across the chiplets, while the vertical stripes (or sub-vertical stripes) may be passed between respective pairs of the chiplets until each of the vertical stripes (or sub-vertical stripes) of matrix B has been received and processed by each of the chiplets. The vertical stripes may be passed along one or more paths that interconnect the chiplets. Similar techniques can be applied to an arrangement in which the vertical stripes are stationary and the horizontal stripes (or sub-horizontal stripes) are passed between respective pairs of the chiplets.
    Type: Grant
    Filed: October 23, 2023
    Date of Patent: June 4, 2024
    Assignee: Persimmons, Inc.
    Inventor: James Michael Bodwin
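A single-process simulation of the stationary-stripe scheme, assuming a ring of chiplets: chiplet p keeps horizontal stripe p of A, starts with vertical stripe p of B, computes its block of C, then passes that B stripe to chiplet p + 1. After one step per chiplet, every stripe of B has visited every chiplet. The ring topology and the names `matmul` and `ring_matmul` are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product used as each chiplet's local compute."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def ring_matmul(a: Matrix, b: Matrix, chiplets: int) -> Matrix:
    """A's horizontal stripes stay put; B's vertical stripes circulate in a ring."""
    m, n = len(a), len(b[0])
    rh, cw = -(-m // chiplets), -(-n // chiplets)  # stripe height and width (ceiling)
    a_stripes = [a[p * rh:(p + 1) * rh] for p in range(chiplets)]
    # Chiplet q starts with vertical stripe q of B, tagged with its index.
    held = [(q, [row[q * cw:(q + 1) * cw] for row in b]) for q in range(chiplets)]
    c: Matrix = [[0.0] * n for _ in range(m)]
    for _ in range(chiplets):
        for p in range(chiplets):
            q, stripe = held[p]
            block = matmul(a_stripes[p], stripe)  # local compute on chiplet p
            for i, row in enumerate(block):
                c[p * rh + i][q * cw:q * cw + len(row)] = row
        # Each chiplet passes its B stripe to its neighbor p + 1.
        held = [held[(p - 1) % chiplets] for p in range(chiplets)]
    return c
```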
  • Patent number: 11989257
    Abstract: An apparatus includes a processor and a memory to store instructions. The instructions, when executed by the processor, cause the processor to perform threading of a first matrix along a first dimension of the first matrix and a second dimension of the first matrix. The threading represents block sizes of the first matrix to assign to process threads of a multiplication algorithm to determine a third matrix that represents a product of the first matrix and a second matrix. The block sizes include a first block size along the first dimension and a second block size along the second dimension. The second matrix shares the second dimension with the first matrix. The instructions, when executed by the processor, cause the processor to provide data to the multiplication algorithm, which represents the first block size and the second block size.
    Type: Grant
    Filed: October 29, 2020
    Date of Patent: May 21, 2024
    Assignee: Hewlett Packard Enterprise Development LP
    Inventor: Aaron M. Collier
  • Patent number: 11941078
    Abstract: Performing set operations using sparse matrix operations offered by a multi-core processing unit (such as a graphics processing unit). The set operation is converted into operand matrices and sparse matrix operations, forgoing the use of hash tables. The input set is converted into a matrix, a matrix operation corresponding to the set operation is identified, and one or more operands of the set operation are also represented within a matrix. The matrix operation is then performed on these matrices to obtain an output matrix, which is then converted to an output set.
    Type: Grant
    Filed: September 30, 2022
    Date of Patent: March 26, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Ritwik Das
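One plausible mapping, shown as a sketch: a set becomes a sparse 0/1 vector stored as sorted non-zero indices, intersection becomes the element-wise (Hadamard) product, and union becomes the non-zero pattern of the element-wise sum. The abstract does not say which matrix operations are used, so this mapping and the function names are assumptions. Both operations are two-pointer merges over sorted indices, so no hash table is needed.

```python
from typing import Iterable, List

def to_indicator(items: Iterable[int]) -> List[int]:
    """Sparse 0/1 vector stored as its sorted, de-duplicated non-zero indices."""
    out: List[int] = []
    for i in sorted(items):
        if not out or out[-1] != i:
            out.append(i)
    return out

def hadamard(a: List[int], b: List[int]) -> List[int]:
    """Element-wise product of two 0/1 vectors: the set intersection."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def union(a: List[int], b: List[int]) -> List[int]:
    """Non-zero pattern of the element-wise sum: the set union."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out
```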
  • Patent number: 11934481
    Abstract: Embodiments of the present invention disclose a matrix multiplier, and relate to the field of data computing technologies, so as to divide two matrices into blocks for computation. The matrix multiplier includes: a first memory, a second memory, an operation circuit, and a controller, where the operation circuit, the first memory, and the second memory may perform data communication by using a bus; and the controller is configured to control, according to a preset program or instruction, a first matrix and a second matrix to be divided into blocks, and control the operation circuit to perform a multiplication operation on corresponding blocks in the first memory and the second memory based on block division results of the controller. The matrix multiplier may be configured to perform a multiplication operation on two matrices.
    Type: Grant
    Filed: April 20, 2022
    Date of Patent: March 19, 2024
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Hu Liu, Heng Liao, Jiajin Tu, Honghui Yuan, Hou Fun Lam, Fan Zhu
  • Patent number: 11928177
    Abstract: Methods and apparatus for performing video processing matrix operations within a memory fabric. Various embodiments of the present disclosure are directed to converting a memory array into a matrix fabric for discrete cosine transform (DCT) matrix transformations and performing DCT matrix operations therein. Exemplary embodiments described herein perform DCT matrix-matrix multiplication operations within a memory device that includes a matrix fabric and matrix multiplication unit (MMU). In one embodiment, matrix-matrix multiplication operations are obtained using separate matrix-vector products. In one exemplary embodiment, the matrix fabric uses a “crossbar” construction of resistive elements. Each resistive element stores a level of impedance that represents the corresponding matrix coefficient value. The crossbar connectivity can be driven with an electrical signal representing the input vector as an analog voltage.
    Type: Grant
    Filed: September 19, 2022
    Date of Patent: March 12, 2024
    Assignee: Micron Technology, Inc.
    Inventor: Fa-Long Luo
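The "matrix-matrix multiplication from separate matrix-vector products" idea can be sketched for a 2-D DCT, Y = D·X·Dᵀ: first D times each column of X, then D times each row of the intermediate result. Each `matvec` call stands in for one crossbar read (conductances times input voltages). The orthonormal DCT-II scaling and all function names are assumptions made for the example.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II matrix D, so that D @ x is the 1-D DCT of x."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Stand-in for one crossbar read: outputs = stored coefficients @ inputs."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def dct2(x: Matrix) -> Matrix:
    """2-D DCT Y = D X D^T built only from matrix-vector products."""
    n = len(x)
    d = dct_matrix(n)
    cols = [matvec(d, [x[r][c] for r in range(n)]) for c in range(n)]  # columns of D X
    z = [[cols[c][r] for c in range(n)] for r in range(n)]             # D X as rows
    return [matvec(d, z_row) for z_row in z]                            # rows of (D X) D^T
```

For a 4×4 block of ones, all energy lands in the DC coefficient `y[0][0]`, which equals 4.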
  • Patent number: 11922131
    Abstract: A method for performing vector-matrix multiplication may include converting a digital input vector comprising a plurality of binary-encoded values into a plurality of analog signals using a plurality of one-bit digital to analog converters (DACs); sequentially performing, using an analog vector matrix multiplier and based on bit-order, vector-matrix multiplication operations using a weighting matrix for the plurality of analog signals to generate analog outputs of the analog vector matrix multiplier; sequentially performing an analog-to-digital (ADC) operation on the analog outputs of the analog vector matrix multiplier to generate binary partial output vectors; and combining the binary partial output vectors to generate a result of the vector-matrix multiplication.
    Type: Grant
    Filed: November 7, 2020
    Date of Patent: March 5, 2024
    Assignee: Applied Materials, Inc.
    Inventors: Xiaofeng Zhang, She-Hwa Yen
  • Patent number: 11922021
    Abstract: Data employed in computations is processed so that during computations more of the data can be fit into or maintained in a smaller but higher speed memory than an original source of the data. More specifically, a sensitivity value is determined for various items of the data which reflect the number of bits in the data items that are not garbage bits, and only information in the data items that are indicated by the sensitivity value to not be garbage bits are necessarily effectively retained. At least the information that is not garbage bits and the corresponding associated sensitivity are packed together. The results of computations that are performed using the data items as at least one of the operands for the computation are associated with a sensitivity that is derived from the individual sensitivities of the operands used in the computation.
    Type: Grant
    Filed: December 19, 2022
    Date of Patent: March 5, 2024
    Assignee: INTELLECTUAL PROPERTY SYSTEMS, LLC
    Inventors: Juan Guillermo Gonzalez, Santiago Andres Fonseca, Rafael Camilo Nunez
  • Patent number: 11900577
    Abstract: There is provided with a processing apparatus. A data holder holds at least some of data of a plurality of channels in a target layer among a plurality of layers. Each of a plurality of processors performs, in parallel, a product-sum operation using the data of one channel of the target layer and a coefficient corresponding to the target layer. A selector selects whether to perform first processing or second processing on the basis of information specifying processing in the target layer. The first processing includes inputting the data of one channel of the target layer into one of the plurality of processors. The second processing includes inputting the data of one channel of the target layer to the plurality of processors in parallel.
    Type: Grant
    Filed: June 22, 2021
    Date of Patent: February 13, 2024
    Assignee: CANON KABUSHIKI KAISHA
    Inventors: Tsewei Chen, Masami Kato, Shiori Wakino
  • Patent number: 11899745
    Abstract: Disclosed herein includes a system, a method, and a device for processing and converting data using matrix operations. Circuitry can partition an input of a first data format across a plurality of lookup tables each residing in a respective memory. The circuitry can access weight information from a load store memory, and the partitioned input on a per column basis from the plurality of lookup tables. The circuitry can perform a number of multiply-accumulate (MAC) operations per cycle between the weight information from the load store memory and the partitioned input read on a per column basis from the plurality of lookup tables. The number of MAC operations performed per cycle can correspond to a total number of columns of the plurality of lookup tables. The circuitry can generate, responsive to the MAC operations on the partitioned input, a plurality of outputs in a second data format.
    Type: Grant
    Filed: August 19, 2020
    Date of Patent: February 13, 2024
    Assignee: Meta Platforms Technologies, LLC
    Inventors: Alagappan Valliappan, Ganesh Venkatesh, Pierce I-Jen Chuang
  • Patent number: 11886378
    Abstract: A processor includes an array of resistive processing units connected between row and column lines with a resistive element. A first single instruction, multiple data processing unit (SIMD) is connected to the row lines. A second SIMD is connected to the column lines. A first instruction issuer is connected to the first SIMD to issue instructions to the first SIMD, and a second instruction issuer is connected to the second SIMD to issue instructions to the second SIMD such that the processor is programmable and configurable for specific operations depending on an issued instruction set.
    Type: Grant
    Filed: December 28, 2020
    Date of Patent: January 30, 2024
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Tayfun Gokmen
  • Patent number: 11880426
    Abstract: Techniques for data manipulation using integer matrix multiplication using pipelining are disclosed. A first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n are obtained for matrix multiplication within a processor. The first and second integer matrices employ a two's complement variable radix point data representation. The first and second integer matrices are distilled into (j×j) submatrices. A first variable radix point format and an initial value for an accumulator register are configured dynamically. A first variable radix point format is configured dynamically for the first integer matrix and a second variable radix point format is configured dynamically for the second integer matrix. Multiply-accumulate operations are executed in a pipelined fashion on the (j×j) submatrices of the first integer matrix and the second integer matrix, where a third variable radix point format is configured for the result.
    Type: Grant
    Filed: July 31, 2022
    Date of Patent: January 23, 2024
    Inventor: David John Simpson
  • Patent number: 11860970
    Abstract: A method for performing a matrix multiplication operation is provided. The method includes: obtaining a matrix B1, a matrix A2, and an index matrix, wherein the index matrix comprises indexes, in a matrix A1, of elements in the matrix A2; generating m matrices B2 based on the index matrix and the matrix B1, wherein the m matrices B2 are all matrices with t rows and n columns, and each row of each matrix B2 is a row indicated in the matrix B1 by a corresponding element in the index matrix; and generating a matrix C based on the matrix A2 and the m matrices B2, wherein the matrix C is a product of the matrix A1 and the matrix B1.
    Type: Grant
    Filed: June 15, 2022
    Date of Patent: January 2, 2024
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Leijun He, Bin Xu, Kaixing Wang
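A sketch of the index-matrix scheme, assuming A1 (m×k) has at most t non-zeros per row: A2 (m×t) holds those values, the index matrix holds their column positions in A1, and shorter rows are padded with value 0 (a padding convention assumed here). For each row, the corresponding B2 gathers t rows of B1, and that row of C is the row of A2 times B2. `indexed_matmul` is a hypothetical name.

```python
from typing import List

Matrix = List[List[float]]

def indexed_matmul(a2: Matrix, index: List[List[int]], b1: Matrix) -> Matrix:
    """C = A1 @ B1, where A1 is given compactly as A2 (m×t non-zero values)
    plus an index matrix holding each value's column position in A1."""
    n = len(b1[0])
    c: Matrix = []
    for vals, idx in zip(a2, index):
        # B2 for this row: t rows gathered from B1 at the indexed positions.
        b2 = [b1[k] for k in idx]
        # Row of C = (row of A2) @ B2, a t-term product instead of a k-term one.
        c.append([sum(v * b2[r][j] for r, v in enumerate(vals)) for j in range(n)])
    return c
```

For example, A1 = [[0, 2, 0, 1], [3, 0, 0, 0]] with t = 2 becomes A2 = [[2, 1], [3, 0]] and index = [[1, 3], [0, 0]], where the second row is padded.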
  • Patent number: 11853385
    Abstract: Methods and apparatus for performing diversity matrix operations within a memory fabric. Various embodiments of the present disclosure are directed to converting a memory array into a matrix fabric for spatial diversity-related matrix transformations and performing matrix operations therein. Exemplary embodiments described herein perform MIMO-related matrix transformations (e.g., precoding, beamforming, or data recovery matrix operations) within a memory device that includes a matrix fabric and matrix multiplication unit (MMU). In one variant, the matrix fabric uses a “crossbar” construction of resistive elements. Each resistive element stores a level of impedance that represents the corresponding matrix coefficient value. The crossbar connectivity can be driven with an electrical signal representing the input vector as an analog voltage. The resulting signals can be converted from analog voltages to digital values by an MMU to yield a matrix-vector product.
    Type: Grant
    Filed: December 5, 2019
    Date of Patent: December 26, 2023
    Assignee: Micron Technology, Inc.
    Inventor: Fa-Long Luo
  • Patent number: 11853717
    Abstract: Embodiments of the present disclosure include systems and methods for accelerating processing based on sparsity for neural network hardware processors. An input manager determines a pair of non-zero values from a pair of data streams in a plurality of pairs of data streams and retrieves the pair of non-zero values from the pair of data streams. A multiplier performs a multiplication operation on the pair of non-zero values and generates a product of the pair of non-zero values. An accumulator manager receives the product of the pair of non-zero values from the multiplier and sends the product of the pair of non-zero values to a corresponding accumulator in a plurality of accumulators.
    Type: Grant
    Filed: January 14, 2021
    Date of Patent: December 26, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Karthikeyan Avudaiyappan, Jeffrey Andrews
  • Patent number: 11853386
    Abstract: The invention relates to a method for rapidly calculating a three-dimensional polarimetric dimension, including: determining that an incident light field is a coherence matrix of a partially coherent Schell-model beam, and decomposing the coherence matrix into a form of multiplying an incident electric field by a coherence structure matrix of the incident light field; obtaining an electric field near a focal field after the incident electric field passes through a tight focusing system according to the vector diffraction theory, and describing a second-order correlation characteristic of a partially coherent vector beam near a tightly focused field by using a coherence matrix; obtaining a tightly focused polarization matrix based on the tightly focused coherence matrix; and rotating the tightly focused polarization matrix into an intrinsic coordinate frame of the tightly focused polarization matrix, and calculating a three-dimensional polarimetric dimension of the partially coherent Schell-model beam in the t
    Type: Grant
    Filed: February 11, 2022
    Date of Patent: December 26, 2023
    Assignee: SOOCHOW UNIVERSITY
    Inventors: Yahong Chen, Chencheng Yan, Fei Wang, Yangjian Cai
  • Patent number: 11847185
    Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and to execute the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.
    Type: Grant
    Filed: September 24, 2021
    Date of Patent: December 19, 2023
    Assignee: Intel Corporation
    Inventors: Dan Baum, Chen Koren, Elmoustapha Ould-Ahmed-Vall, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
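The bitmask matching step can be sketched sequentially: build an NZ bitmask for a row of the first matrix and a column of the second, AND them, and multiply-accumulate only at the set bits into the corresponding element of the third matrix. The broadcast of two NZ elements at a time to a PE grid is not modeled, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def nz_bitmask(v: List[float]) -> int:
    """Bit i is set when v[i] is non-zero."""
    mask = 0
    for i, x in enumerate(v):
        if x != 0:
            mask |= 1 << i
    return mask

def sparse_mac(c: float, a_row: List[float], b_col: List[float]) -> float:
    """c + sum of a_row[i] * b_col[i] over positions where both are non-zero."""
    match = nz_bitmask(a_row) & nz_bitmask(b_col)
    while match:
        i = (match & -match).bit_length() - 1  # index of the lowest set bit
        c += a_row[i] * b_col[i]
        match &= match - 1                     # clear that bit
    return c

def sparse_matmul_acc(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Return C + A @ B, touching only matching non-zero pairs."""
    cols = [list(col) for col in zip(*b)]
    return [[sparse_mac(c[i][j], a[i], cols[j]) for j in range(len(cols))]
            for i in range(len(a))]
```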
  • Patent number: 11847106
    Abstract: The disclosure is directed to various ways of improving the functioning of computer systems, information networks, data stores, search engine systems and methods, and other advantages. Among other things, provided herein are methods, systems, components, processes, modules, blocks, circuits, sub-systems, articles, and other elements (collectively referred to in some cases as the “platform” or the “system”) that collectively enable, in one or more datastores (e.g., where each datastore may include one or more databases) and systems, the creation, development, maintenance, and use of a set of custom objects for use in a wide range of activities, including sales activities, marketing activities, service activities, content development activities, and others, as well as improved methods and systems for sales, marketing and services that make use of such entity resolution systems and methods as well as custom objects.
    Type: Grant
    Filed: May 12, 2021
    Date of Patent: December 19, 2023
    Assignee: HUBSPOT, INC.
    Inventors: Hector Urdiales, Marco Lagi, Stephen J. Purcell, Stuart P. Layton, Bryan Ash, Jared Williams, Sophie Higgs, Robert McEneaney, Dylan Sellberg, Anna Perko
  • Patent number: 11830543
    Abstract: A memory circuit includes a first memory array including first memory cells wherein a plurality of first word lines is coupled with a plurality of rows of first memory cells in a first segment of the first memory array, and a plurality of second word lines is coupled with the plurality of rows of first memory cells in a second segment of the first memory array. The memory circuit also includes a read circuit configured to retrieve data from the first memory cells of the first memory array and a computation circuit configured to perform a matrix computation by combining first data retrieved from the first memory cells of the first segment with second data retrieved from the first memory cells of the second segment.
    Type: Grant
    Filed: June 23, 2022
    Date of Patent: November 28, 2023
    Assignee: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY, LTD.
    Inventors: Yen-Huei Chen, Hidehiro Fujiwara, Hung-Jen Liao, Jonathan Tsung-Yung Chang
  • Patent number: 11811416
    Abstract: An apparatus comprises at least one processor and at least one memory including instruction code configured to, with the at least one processor, cause the apparatus at least to perform a successive approximation analog-to-digital conversion of an analog input, representing a result of multiplication of first and second vectors, to a digital output by determining an upper bound on the result of multiplication of the first and second vectors, identifying, based at least in part on the determined upper bound, at least a portion of the successive approximation analog-to-digital conversion to be skipped, and skipping the identified portion of the successive approximation analog-to-digital conversion.
    Type: Grant
    Filed: December 14, 2021
    Date of Patent: November 7, 2023
    Assignee: International Business Machines Corporation
    Inventors: Kyu-hyoun Kim, Mingu Kang, Ankur Agrawal, Monodeep Kar
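The skipping step described above can be illustrated with an idealized successive approximation loop: if the analog result is known not to exceed an upper bound, every code bit above the bound's highest set bit must be 0, so those comparator cycles can be dropped. This is a sketch under stated assumptions: an ideal unipolar ADC with full scale `vref`, an input that never exceeds `upper_bound`, and a bound simply passed in as a parameter (how the bound is derived from the two vectors is not modeled). The function name `sar_convert` is hypothetical.

```python
from typing import Optional, Tuple

def sar_convert(vin: float, vref: float, nbits: int,
                upper_bound: Optional[float] = None) -> Tuple[int, int]:
    """Ideal SAR conversion; returns (code, number of comparator decisions)."""
    lsb = vref / (1 << nbits)
    start_bit = nbits - 1
    if upper_bound is not None:
        # Codes above this value are impossible, so the MSBs above its
        # highest set bit are known to be 0 and their cycles can be skipped.
        max_code = min(max(int(upper_bound / lsb), 0), (1 << nbits) - 1)
        start_bit = max_code.bit_length() - 1
    code, decisions = 0, 0
    for bit in range(start_bit, -1, -1):
        trial = code | (1 << bit)
        decisions += 1
        if vin >= trial * lsb:  # comparator: keep the bit if vin reaches the DAC level
            code = trial
    return code, decisions
```

With `vref=1.0` and 8 bits, converting `0.1` takes 8 decisions and yields code 25; supplying `upper_bound=0.12` yields the same code in 5 decisions, because the three MSBs are known to be 0.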
  • Patent number: 11797302
    Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
    Type: Grant
    Filed: June 17, 2021
    Date of Patent: October 24, 2023
    Assignee: NVIDIA Corporation
    Inventors: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
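The three dot-product steps listed in the abstract above (form partial products, align them by exponent, accumulate with an adder) can be sketched numerically as follows. This is a software illustration only, assuming a hypothetical 24-bit integer mantissa per operand (`FRAC_BITS`) and truncating right shifts for alignment; it does not reflect NVIDIA's datapath widths, rounding behavior, or result queue.

```python
import math
from typing import List, Sequence, Tuple

FRAC_BITS = 24  # hypothetical mantissa width kept for each operand

def split(x: float) -> Tuple[int, int]:
    """Return (integer mantissa, exponent) with x ~= mant * 2**(exp - FRAC_BITS)."""
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1, or m == 0
    return int(round(m * (1 << FRAC_BITS))), e

def aligned_dot(a: Sequence[float], b: Sequence[float]) -> float:
    # 1. Partial products: integer mantissa product plus summed exponents.
    partials: List[Tuple[int, int]] = []
    for x, y in zip(a, b):
        (ma, ea), (mb, eb) = split(x), split(y)
        if ma and mb:  # zero products contribute nothing and are dropped
            partials.append((ma * mb, ea + eb))
    if not partials:
        return 0.0
    # 2. Align every partial product to the largest exponent (right shift).
    max_exp = max(e for _, e in partials)
    aligned = [p >> (max_exp - e) for p, e in partials]
    # 3. Accumulate with a single integer sum, then rescale once.
    return math.ldexp(sum(aligned), max_exp - 2 * FRAC_BITS)
```

For instance, `aligned_dot([1.5, 2.0, -0.25], [4.0, 0.5, 8.0])` returns exactly `5.0`; aligning once and summing integers is what lets a single adder tree handle products with different exponents.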
  • Patent number: 11797644
    Abstract: Certain aspects of the present disclosure provide techniques for detecting errors in account numbers. One example method generally includes receiving, from a user device, an entered number associated with a user and determining, based on a first portion of the entered number, an entity associated with the entered number. The method further includes obtaining, from an account number database, a plurality of account numbers associated with the entity and generating, from the plurality of account numbers, an account number matrix. The method further includes attempting to solve a multiplication equation of the account number matrix, wherein a solution of the multiplication equation is a vector of constants; upon determining a solution to the multiplication equation, determining whether the entered vector is a valid number for the entity; and upon determining the entered vector is a valid number for the entity, storing the entered number in the account number database.
    Type: Grant
    Filed: May 11, 2021
    Date of Patent: October 24, 2023
    Assignee: INTUIT, INC.
    Inventors: Yair Horesh, Yehezkel S. Resheff, Shimon Shahar, Noah Eyal Altman
  • Patent number: 11797301
    Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
    Type: Grant
    Filed: January 4, 2021
    Date of Patent: October 24, 2023
    Assignee: NVIDIA Corporation
    Inventors: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
  • Patent number: 11797303
    Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
    Type: Grant
    Filed: June 17, 2021
    Date of Patent: October 24, 2023
    Assignee: NVIDIA Corporation
    Inventors: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
  • Patent number: 11790241
    Abstract: In one embodiment, a method of simulating an operation of an artificial neural network on a binary neural network processor includes receiving a binary input vector for a layer including a probabilistic binary weight matrix and performing vector-matrix multiplication of the input vector with the probabilistic binary weight matrix, wherein the multiplication results are modified by simulated binary-neural-processing hardware noise, to generate a binary output vector, where the simulation is performed in the forward pass of a training algorithm for a neural network model for the binary-neural-processing hardware.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: October 17, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Matthias Reisser, Saurabh Kedar Pitre, Xiaochun Zhu, Edward Harrison Teague, Zhongze Wang, Max Welling
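A single simulated layer of the kind described above might look like the sketch below. It assumes several things the abstract does not specify: weights are sampled as +1 with probability `p` and -1 otherwise, the hardware noise is additive Gaussian noise on each pre-activation, and the output is binarized with a sign function. Gradient computation for training is not modeled, and the function names are hypothetical.

```python
import random
from typing import List, Sequence

def sample_binary_weights(probs: Sequence[Sequence[float]],
                          rng: random.Random) -> List[List[int]]:
    """Draw each weight as +1 with probability p, else -1."""
    return [[1 if rng.random() < p else -1 for p in row] for row in probs]

def noisy_binary_forward(x: Sequence[int], probs: Sequence[Sequence[float]],
                         noise_std: float, rng: random.Random) -> List[int]:
    """One simulated layer: x is a +/-1 vector, probs has shape len(x) x n_out."""
    w = sample_binary_weights(probs, rng)
    out = []
    for j in range(len(probs[0])):
        acc = sum(x[i] * w[i][j] for i in range(len(x)))  # binary vector-matrix product
        acc += rng.gauss(0.0, noise_std)                   # assumed additive hardware noise
        out.append(1 if acc >= 0 else -1)                  # binarize back to +/-1
    return out
```

Passing a seeded `random.Random` keeps runs reproducible; with `noise_std=0.0` the layer reduces to a noiseless binary vector-matrix product over the sampled weights.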
  • Patent number: 11784659
    Abstract: A circuit system for weight modulation and image recognition of a memristor array includes a personal computer (PC), a field-programmable gate array (FPGA) chip, a digital-to-analog conversion unit, a switch unit, a memristor array unit, an integration and signal amplification circuit, and an analog-to-digital converter. The circuit system selects a to-be-realized function such as array reading and writing, weight modulation or image recognition, converts a command or an RGB value of an image collected by the PC into a corresponding grayscale value, and sends the grayscale value to the FPGA chip. The FPGA chip controls and selects a to-be-modulated memristor array unit through the digital-to-analog conversion unit and the switch unit. An application program of the PC controls the FPGA chip in real time to realize array reading and writing, weight modulation, and image recognition, and then the FPGA chip displays a result on the PC in real time.
    Type: Grant
    Filed: February 18, 2022
    Date of Patent: October 10, 2023
    Assignee: Hebei University
    Inventors: Xiaobing Yan, Ziliang Fang, Saibo Yin
  • Patent number: 11782710
    Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an asynchronous packet network having a plurality of data transmission lines forming a data path transmitting operand data; a synchronous mesh communication network; a plurality of configurable circuits arranged in an array, each configurable circuit of the plurality of configurable circuits coupled to the asynchronous packet network and to the synchronous mesh communication network, each configurable circuit of the plurality of configurable circuits adapted to perform a plurality of computations; each configurable circuit of the plurality of configurable circuits comprising: a memory storing operand data; and an execution or write mask generator adapted to generate an execution mask or a write mask identifying valid bits or bytes transmitted on the data path or stored in the memory for a current or next computation.
    Type: Grant
    Filed: September 13, 2021
    Date of Patent: October 10, 2023
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11748443
    Abstract: A circuit comprises an input register configured to receive an input vector of elements, a control register configured to receive a control vector of elements, wherein each element of the control vector corresponds to a respective element of the input vector, and wherein each element specifies a permutation of a corresponding element of the input vector, and a permute execution circuit configured to generate an output vector of elements corresponding to a permutation of the input vector. Generating each element of the output vector comprises accessing, at the input register, a particular element of the input vector, accessing, at the control register, a particular element of the control vector corresponding to the particular element of the input vector, and outputting the particular element of the input vector as an element at a particular position of the output vector that is selected based on the particular element of the control vector.
    Type: Grant
    Filed: March 22, 2021
    Date of Patent: September 5, 2023
    Assignee: Google LLC
    Inventors: Dong Hyuk Woo, Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam, Jonathan Ross, Christopher Aaron Clark
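The permute behavior in the abstract above, where each control element selects the output position for the input element at the same index, can be modeled in a few lines. This is a behavioral sketch only; it assumes the control vector is a full permutation of `0..n-1` (the abstract does not say how duplicates or out-of-range values are handled), and the name `permute` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def permute(inputs: Sequence[T], control: Sequence[int]) -> List[T]:
    """Place inputs[i] at output position control[i]."""
    n = len(inputs)
    if len(control) != n or sorted(control) != list(range(n)):
        raise ValueError("control vector must be a permutation of 0..n-1")
    out = list(inputs)  # every slot is overwritten below
    for i, value in enumerate(inputs):
        out[control[i]] = value  # the control element selects the output slot
    return out
```

For example, `permute(["a", "b", "c", "d"], [2, 0, 3, 1])` returns `["b", "d", "a", "c"]`.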
  • Patent number: 11748103
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Grant
    Filed: February 15, 2022
    Date of Patent: September 5, 2023
    Assignee: Intel Corporation
    Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
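The first compress option in the abstract above (pack the non-zero elements together and record each one's matrix position in a header) can be sketched as a round trip in Python. This covers only the packing variant, not the reduced-bit-width variant, and the header layout as a list of `(row, col)` pairs is an assumption; the names `CompressedMatrix`, `compress`, and `decompress` are hypothetical.

```python
from typing import List, NamedTuple, Tuple

class CompressedMatrix(NamedTuple):
    rows: int
    cols: int
    header: List[Tuple[int, int]]  # (row, col) of each packed non-zero element
    values: List[float]            # non-zero elements packed contiguously

def compress(matrix: List[List[float]]) -> CompressedMatrix:
    header, values = [], []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                header.append((r, c))
                values.append(v)
    return CompressedMatrix(len(matrix), len(matrix[0]) if matrix else 0, header, values)

def decompress(cm: CompressedMatrix) -> List[List[float]]:
    out = [[0] * cm.cols for _ in range(cm.rows)]  # start from an all-zero matrix
    for (r, c), v in zip(cm.header, cm.values):
        out[r][c] = v
    return out
```

Compressing `[[0, 5, 0], [0, 0, 0], [7, 0, 1]]` packs the values `[5, 7, 1]` with header `[(0, 1), (2, 0), (2, 2)]`, and decompressing restores the original matrix.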
  • Patent number: 11727527
    Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex compute operation.
    Type: Grant
    Filed: December 3, 2021
    Date of Patent: August 15, 2023
    Assignee: Intel Corporation
    Inventors: Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao
  • Patent number: 11687336
    Abstract: An extensible multi-precision data pipeline system, comprising a local buffer that stores an input local data set in a local storage format, an input tensor shaper coupled to the local buffer that reads the input local data set and converts the input local data set into an input tensor data set having a tensor format of vector width N by tensor length L, a cascaded pipeline coupled to the input tensor shaper that routes the input tensor data set through at least one function stage resulting in an output tensor data set, an output tensor shaper coupled to the cascaded pipeline that converts the output tensor data set into an output local data set having the local storage format and wherein the output tensor shaper writes the output local data set to the local buffer.
    Type: Grant
    Filed: May 8, 2020
    Date of Patent: June 27, 2023
    Assignee: Black Sesame Technologies Inc.
    Inventors: Yi Wang, Zheng Qi, Hui Wang, Zheng Li
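The data flow in the abstract above (local buffer, input shaper to an N-wide by L-long tensor, cascaded function stages, output shaper back to the local layout) can be mimicked with plain lists. This is a sketch under assumptions: the local storage format is taken to be a flat list, the last vector is zero-padded, and each stage is an arbitrary per-vector function; multi-precision handling is not modeled, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Stage = Callable[[Vector], Vector]

def shape_in(local: Sequence[float], n: int, pad: float = 0.0) -> List[Vector]:
    """Reshape flat local data into L vectors of width n, padding the last one."""
    length = -(-len(local) // n)  # tensor length L = ceil(len / n)
    padded = list(local) + [pad] * (length * n - len(local))
    return [padded[i * n:(i + 1) * n] for i in range(length)]

def run_pipeline(tensor: List[Vector], stages: Sequence[Stage]) -> List[Vector]:
    """Route every vector through the cascaded function stages in order."""
    for stage in stages:
        tensor = [stage(vec) for vec in tensor]
    return tensor

def shape_out(tensor: List[Vector], original_len: int) -> List[float]:
    """Flatten back to the local layout and drop the padding."""
    return [v for vec in tensor for v in vec][:original_len]
```

For example, shaping `[1, 2, 3, 4, 5]` with `n=2` gives three vectors, and running a "double" stage followed by an "add one" stage before reshaping yields `[3, 5, 7, 9, 11]`.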
  • Patent number: 11651283
    Abstract: An approach is described for a method, product, and apparatus for a machine learning process using dynamic rearrangement of sparse data and corresponding weights. This approach includes a method, product, and apparatus for dynamically rearranging input data to move sparse data to a location such that computations on the sparse data might be avoided when executing a machine learning processing job. For example, sparse data within each row of the input matrix can be moved to the end of each corresponding row. When the input data is folded to fit the array, that sparse data might be at least partially contained within a fold that comprises only sparse data and possibly filler data. In such an event, computations on the fold are unnecessary and are avoided. In some embodiments, the approach includes dynamically rearranging a weight matrix to maintain a correspondence between the input data and the weights.
    Type: Grant
    Filed: June 30, 2020
    Date of Patent: May 16, 2023
    Assignee: Cadence Design Systems, Inc.
    Inventors: Yong Liu, Ngai Ngai William Hung, Michael Patrick Zimmer
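The rearrangement described above can be illustrated on a small row-times-weights product: zeros in each input row are moved to the end, the weight rows are reordered to match, the row is cut into array-width folds, and any fold holding only zeros is skipped. This is an illustrative sketch, assuming "sparse data" means zero values, a stable reordering, and a fold width given as `fold`; the function name `rearranged_matmul` is hypothetical and the sketch does not model the actual array hardware.

```python
from typing import List, Sequence, Tuple

def rearranged_matmul(x_rows: Sequence[Sequence[float]], w: Sequence[Sequence[float]],
                      fold: int) -> Tuple[List[List[float]], int]:
    """Return (x_rows @ w, number of folds skipped because they were all zero)."""
    n_out = len(w[0])
    results, skipped = [], 0
    for x in x_rows:
        # Move zeros to the end of the row (stable), remembering the new order.
        perm = [i for i, v in enumerate(x) if v != 0] + [i for i, v in enumerate(x) if v == 0]
        xr = [x[i] for i in perm]
        wr = [w[i] for i in perm]  # reorder weight rows so each still meets its input
        y = [0] * n_out
        for start in range(0, len(xr), fold):
            chunk = xr[start:start + fold]
            if all(v == 0 for v in chunk):
                skipped += 1  # fold holds only sparse data, so no computation is issued
                continue
            for offset, v in enumerate(chunk):
                for j in range(n_out):
                    y[j] += v * wr[start + offset][j]
        results.append(y)
    return results, skipped
```

With input row `[0, 2, 0, 3]` and a fold width of 2, the unrearranged folds `[0, 2]` and `[0, 3]` both need computation, while the rearranged row `[2, 3, 0, 0]` leaves the second fold entirely zero, so it is skipped without changing the result.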
  • Patent number: 11645665
    Abstract: Example apparatus disclosed herein are to determine a plurality of weights based on a data structure having elements corresponding to pairings of ones of a plurality of demographic partition statistics and ones of a plurality of baseline demographic statistics obtained for a target population, the demographic partition statistics corresponding to a plurality of demographic partitions of a sample population, a first element of the data structure to combine a first one of the demographic partition statistics with a first one of the baseline demographic statistics of the target population based on a first value corresponding to a numerator term of an expression and a second value corresponding to a denominator term of the expression, the weights corresponding respectively to the demographic partitions of the sample population. Disclosed example apparatus are also to adjust the attribute data based on the weights to determine ratings data for the target population.
    Type: Grant
    Filed: August 10, 2020
    Date of Patent: May 9, 2023
    Assignee: THE NIELSEN COMPANY (US), LLC
    Inventors: Michael Sheppard, Jonathan Sullivan, Alejandro Terrazas, Peter Lipa, Albert Ronald Perez
  • Patent number: 11645077
    Abstract: Embodiments detailed herein relate to systems and methods to zero a tile register pair. In one example, a processor includes decode circuitry to decode a matrix pair zeroing instruction having fields for an opcode and an identifier to identify a destination matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded matrix pair zeroing instruction to zero every element of a left matrix and a right matrix of the identified destination matrix.
    Type: Grant
    Filed: June 1, 2021
    Date of Patent: May 9, 2023
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman, Eyal Hadas
  • Patent number: 11593455
    Abstract: A scalable matrix computation circuit and methods for using the same are disclosed. In one embodiment, a matrix computation circuit includes a plurality of first operand memories configured to store a first set of input operands of the matrix computation circuit, a plurality of second operand memories configured to store a second set of input operands of the matrix computation circuit, where the first and second sets of input operands are programmable by the controller, a plurality of multiplier circuits arranged in a plurality of rows and plurality of columns, where each row receives a corresponding operand from the first set of operands, and each column receives a corresponding operand from the second set of operands, and each corresponding operand from each row is used multiple times by the multiplier circuits in that row to perform multiplications controlled by the controller, and a plurality of aggregator circuits configured to store charges produced by the plurality of multiplier circuits.
    Type: Grant
    Filed: July 7, 2020
    Date of Patent: February 28, 2023
    Assignee: Ambient Scientific, Inc.
    Inventor: Gajendra Prasad Singh
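The row/column operand arrangement described above, where each row's operand is reused by every multiplier in that row and aggregators accumulate the products, behaves like an outer-product accumulation that builds up a matrix product one step at a time. The digital sketch below assumes one operand load per step and uses integer sums in place of stored charge; the analog charge domain and the controller are not modeled, and the name `outer_product_mac` is hypothetical.

```python
from typing import List, Sequence

def outer_product_mac(a: Sequence[Sequence[float]],
                      b: Sequence[Sequence[float]]) -> List[List[float]]:
    """Compute a @ b on a rows x cols multiplier grid, one operand step at a time."""
    rows, depth, cols = len(a), len(b), len(b[0])
    aggregators = [[0] * cols for _ in range(rows)]  # stands in for the stored charge
    for k in range(depth):
        row_operands = [a[r][k] for r in range(rows)]  # one first-set operand per row
        col_operands = b[k]                            # one second-set operand per column
        for r in range(rows):
            for c in range(cols):
                # Each row operand is reused by every multiplier in that row.
                aggregators[r][c] += row_operands[r] * col_operands[c]
    return aggregators
```

For example, `outer_product_mac([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]` after two operand steps.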
  • Patent number: 11593456
    Abstract: A resistive matrix computation circuit and methods for using the same are disclosed.
    Type: Grant
    Filed: July 7, 2020
    Date of Patent: February 28, 2023
    Assignee: Ambient Scientific, Inc.
    Inventor: Gajendra Prasad Singh