Patent number: 11989259
Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply unit. Each weight shift register is connected to only one chain and each cell is connected to only one weight shift register. A weight matrix register per cell is configured to store a weight input received from a weight shift register. A multiply unit is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.
Type: Grant
Filed: November 10, 2022
Date of Patent: May 21, 2024
Assignee: Google LLC
Inventors: Andrew Everett Phelps, Norman Paul Jouppi

Patent number: 11989258
Abstract: Methods, systems, and apparatus for performing a matrix multiplication using a hardware circuit are described. An example method begins by obtaining an input activation value and a weight input value in a first floating point format. The input activation value and the weight input value are multiplied to generate a product value in a second floating point format that has higher precision than the first floating point format. A partial sum value is obtained in a third floating point format that has a higher precision than the first floating point format. The partial sum value and the product value are combined to generate an updated partial sum value that has the third floating point format.
Type: Grant
Filed: November 9, 2020
Date of Patent: May 21, 2024
Assignee: Google LLC
Inventors: Andrew Everett Phelps, Norman Paul Jouppi
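The multiply-then-accumulate flow in the abstract of patent 11989258 can be illustrated with a small software sketch. It is not the patented circuit: the choice of IEEE 754 half precision as the "first" format and single precision as both the "second" and "third" formats is an assumption, and the helper names `to_f16`, `to_f32`, `multiply_accumulate`, and `dot` are hypothetical.

```python
import struct


def to_f16(x):
    """Round a Python float to IEEE 754 half precision (assumed 'first' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x):
    """Round a Python float to IEEE 754 single precision (assumed 'second' and 'third' format)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def multiply_accumulate(partial_sum, activation, weight):
    """Multiply two low-precision inputs and add the product to a higher-precision sum."""
    # Inputs arrive in the low-precision format.
    a = to_f16(activation)
    w = to_f16(weight)
    # The product is kept in the higher-precision format.
    product = to_f32(a * w)
    # The updated partial sum stays in the higher-precision format.
    return to_f32(partial_sum + product)


def dot(activations, weights):
    """Dot product built from repeated multiply-accumulate steps."""
    acc = 0.0
    for a, w in zip(activations, weights):
        acc = multiply_accumulate(acc, a, w)
    return acc
```

Keeping the running sum in the wider format matters once the sum grows: half precision cannot represent 2049, so a half-precision accumulator holding 2048 would not change when 1 is added, while the single-precision accumulator above does.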

Patent number: 11989257
Abstract: An apparatus includes a processor and a memory to store instructions. The instructions, when executed by the processor, cause the processor to perform threading of a first matrix along a first dimension of the first matrix and a second dimension of the matrix. The threading represents block sizes of the first matrix to assign to process threads of a multiplication algorithm to determine a third matrix that represents a product of the first matrix and a second matrix. The block sizes include a first block size along the first dimension and a second block size along the second dimension. The second matrix shares the second dimension with the first matrix. The instructions, when executed by the processor, cause the processor to provide data to the multiplication algorithm, which represents the first block size and the second block size.
Type: Grant
Filed: October 29, 2020
Date of Patent: May 21, 2024
Assignee: Hewlett Packard Enterprise Development LP
Inventor: Aaron M. Collier

Patent number: 11983631
Abstract: A computer determines a solution to a nonlinear optimization problem. A conjugate gradient (CG) iteration is performed with a first order derivative vector and a second order derivative matrix to update a CG residual vector, an H-conjugate vector, and a residual weight vector. A CG solution vector is updated using a previous CG solution vector, the H-conjugate vector, and the residual weight vector. An eigenvector of the second order derivative matrix having a smallest eigenvalue is computed. A basis matrix is defined that includes a cubic regularization (CR) solution vector, a CR residual vector, the CG solution vector, the CG residual vector, and the eigenvector. A CR iteration is performed to update the CR solution vector. The CR residual vector is updated using the first order derivative vector, the second order derivative matrix, and the updated CR solution vector. The process is repeated until a stop criterion is satisfied.
Type: Grant
Filed: November 16, 2023
Date of Patent: May 14, 2024
Assignee: SAS INSTITUTE INC.
Inventors: Wenwen Zhou, Joshua David Griffin, Riadh Omheni, Seyedalireza Yektamaram, Yan Xu

Patent number: 11966857
Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables, and stream post processing result of the per-element tanh and/or sigmoid operation back to the OCM after the tanh and/or sigmoid operation is complete.
Type: Grant
Filed: April 6, 2021
Date of Patent: April 23, 2024
Assignee: Marvell Asia Pte Ltd
Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
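The per-element lookup-table evaluation of tanh and sigmoid described in the abstract of patent 11966857 can be sketched in software as follows. The table size, the input range, the nearest-entry lookup, and the names `build_lut`, `lut_apply`, and `post_process` are all assumptions made for illustration; the abstract does not specify how the tables are indexed.

```python
import math


def build_lut(fn, lo, hi, entries):
    """Sample fn at evenly spaced points in [lo, hi] to form a lookup table."""
    step = (hi - lo) / (entries - 1)
    return [fn(lo + i * step) for i in range(entries)]


def lut_apply(table, lo, hi, x):
    """Approximate fn(x) by the nearest table entry, clamping x to [lo, hi]."""
    clamped = min(max(x, lo), hi)
    position = (clamped - lo) / (hi - lo) * (len(table) - 1)
    return table[round(position)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Assumed configuration: 257 entries over [-4, 4] for tanh and [-8, 8] for sigmoid.
TANH_LUT = build_lut(math.tanh, -4.0, 4.0, 257)
SIGMOID_LUT = build_lut(sigmoid, -8.0, 8.0, 257)


def post_process(data, table, lo, hi):
    """Apply the table to every element, mirroring the per-element operation."""
    return [lut_apply(table, lo, hi, value) for value in data]
```

For example, `post_process([-1.0, 0.0, 1.0], TANH_LUT, -4.0, 4.0)` returns values close to `math.tanh` of each input; a denser table or interpolation between entries would reduce the approximation error at the cost of more storage or arithmetic.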

Patent number: 11954582
Abstract: Disclosed is a neural network accelerator including a first bit operator generating a first multiplication result by performing multiplication on first feature bits of input feature data and first weight bits of weight data, a second bit operator generating a second multiplication result by performing multiplication on second feature bits of the input feature data and second weight bits of the weight data, an adder generating an addition result by performing addition based on the first multiplication result and the second multiplication result, a shifter shifting a number of digits of the addition result depending on a shift value to generate a shifted addition result, and an accumulator generating output feature data based on the shifted addition result.
Type: Grant
Filed: December 21, 2022
Date of Patent: April 9, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Sungju Ryu, Hyungjun Kim, Jae-Joon Kim
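The multiply, add, shift, accumulate sequence in the abstract of patent 11954582 can be illustrated with unsigned integers. The sketch below assumes 4-bit operands split into 2-bit halves and groups partial products that share the same shift value; these widths, the grouping, and the names `split` and `multiply_4bit` are illustrative assumptions rather than details taken from the patent.

```python
def split(value, width=2):
    """Split an unsigned value into (low bits, high bits) of the given width."""
    mask = (1 << width) - 1
    return value & mask, (value >> width) & mask


def multiply_4bit(feature, weight):
    """Multiply two unsigned 4-bit values from 2-bit partial products."""
    f_lo, f_hi = split(feature)
    w_lo, w_hi = split(weight)

    # Each group holds two (feature bits, weight bits) pairs that share one shift.
    # Pairs of zeros pad the groups that need only one partial product.
    groups = [
        ((f_lo, w_lo), (0, 0), 0),
        ((f_hi, w_lo), (f_lo, w_hi), 2),
        ((f_hi, w_hi), (0, 0), 4),
    ]

    accumulator = 0
    for (f1, w1), (f2, w2), shift in groups:
        first = f1 * w1            # first bit operator
        second = f2 * w2           # second bit operator
        added = first + second     # adder
        shifted = added << shift   # shifter
        accumulator += shifted     # accumulator
    return accumulator
```

Adding the two partial products before shifting means one shifter serves both bit operators, which is the ordering the abstract describes.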

Patent number: 11947929
Abstract: An arithmetic device includes a comparison unit comparing voltage generated with charge stored in a storage unit with a threshold, and outputting an output signal at a timing when the voltage exceeds the threshold, and a timing extension unit extending an interval between timings at each of which the output signal is output.
Type: Grant
Filed: July 4, 2019
Date of Patent: April 2, 2024
Assignee: SONY CORPORATION
Inventor: Hiroyuki Yamagishi

Patent number: 11941078
Abstract: Performing set operations using sparse matrix operations offered by a multicore processing unit (such as a graphics processing unit). The set operation is converted into operand matrices, and sparse matrix operations, foregoing the use of hash tables. The input set is converted into a matrix, a matrix operation corresponding to the set operation is identified, and one or more operands of the set operation are also represented within a matrix. The matrix operation is then performed on these matrices to obtain an output matrix, which is then converted to an output set.
Type: Grant
Filed: September 30, 2022
Date of Patent: March 26, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventor: Ritwik Das
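One way to picture the idea in the abstract of patent 11941078 is to encode each set as a sparse indicator row and map set operations onto element-wise matrix operations. The sketch below is a plain-Python illustration, not the patented method: it assumes set elements are non-negative integers used directly as column indices, stores only the sorted indices of the nonzero entries, and uses hypothetical names (`to_sparse_row`, `hadamard`, `saturating_add`).

```python
def to_sparse_row(items):
    """Encode a collection as sorted column indices of a 0/1 indicator row."""
    columns = []
    for column in sorted(items):
        if not columns or columns[-1] != column:
            columns.append(column)
    return columns


def hadamard(a, b):
    """Element-wise product of two indicator rows, equivalent to set intersection."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def saturating_add(a, b):
    """Element-wise sum clipped to 1, equivalent to set union."""
    i = j = 0
    out = []
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i] < b[j]):
            out.append(a[i])
            i += 1
        elif i == len(a) or b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:
            out.append(a[i])
            i += 1
            j += 1
    return out
```

Both operations walk the sorted index lists with a merge and never build a hash table, which is the property the abstract emphasizes; on a GPU the same operations would typically be delegated to a sparse linear algebra library.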

Patent number: 11934481
Abstract: Embodiments of the present invention disclose a matrix multiplier, and relate to the field of data computing technologies, so as to divide two matrices into blocks for computation. The matrix multiplier includes: a first memory, a second memory, an operation circuit, and a controller, where the operation circuit, the first memory, and the second memory may perform data communication by using a bus; and the controller is configured to control, according to a preset program or instruction, a first matrix and a second matrix to be divided into blocks, and control the operation circuit to perform a multiplication operation on corresponding blocks in the first memory and the second memory based on block division results of the controller. The matrix multiplier may be configured to perform a multiplication operation on two matrices.
Type: Grant
Filed: April 20, 2022
Date of Patent: March 19, 2024
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Hu Liu, Heng Liao, Jiajin Tu, Honghui Yuan, Hou Fun Lam, Fan Zhu
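The block division described in the abstract of patent 11934481 corresponds, in software terms, to blocked (tiled) matrix multiplication. The following is a minimal sketch under that reading; the function name `block_matmul` and the single square block size are assumptions, and the hardware memories and controller are not modeled.

```python
def block_matmul(a, b, block):
    """Multiply matrices a (n x k) and b (k x m) one block at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # The outer three loops pick a pair of corresponding blocks.
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for k0 in range(0, k, block):
                # The inner loops multiply the two blocks and accumulate into c.
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        partial = 0
                        for kk in range(k0, min(k0 + block, k)):
                            partial += a[i][kk] * b[kk][j]
                        c[i][j] += partial
    return c
```

The result is independent of the block size; the block size only changes which operands are needed at the same time, which is what makes the scheme attractive when operand storage is limited.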

Patent number: 11934308
Abstract: Techniques for data manipulation using processor cluster address generation are disclosed. One or more processor clusters capable of executing software-initiated work requests are accessed. A plurality of dimensions from a tensor is flattened into a single dimension. A work request address field is parsed, where the address field contains unique address space descriptors for each of the plurality of dimensions, along with a common address space descriptor. A direct memory access (DMA) engine coupled to the one or more processor clusters is configured. Addresses are generated based on the unique address space descriptors and the common address space descriptor. The plurality of dimensions can be summed to generate a single address. Memory is accessed using two or more of the addresses that were generated. The addresses are used to enable DMA access.
Type: Grant
Filed: September 29, 2020
Date of Patent: March 19, 2024
Inventors: David John Simpson, Stephen Curtis Johnson, Richard Douglas Trauben
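The address generation in the abstract of patent 11934308 can be pictured as summing per-dimension offsets onto a shared base. In the sketch below, each per-dimension descriptor is assumed to be a `(count, stride_in_bytes)` pair and the common descriptor is assumed to be a base address; the abstract does not define the descriptor contents, so this layout and the name `generate_addresses` are hypothetical.

```python
from itertools import product


def generate_addresses(base, dims):
    """Yield one flat address per tensor element.

    base: assumed common descriptor (a base address).
    dims: assumed per-dimension descriptors, each a (count, stride_in_bytes) pair.
    """
    index_ranges = [range(count) for count, _ in dims]
    for index in product(*index_ranges):
        # The per-dimension offsets are summed into a single address.
        offset = sum(i * stride for i, (_, stride) in zip(index, dims))
        yield base + offset
```

For a 2 x 3 tensor of 4-byte elements stored row-major at `0x1000`, `list(generate_addresses(0x1000, [(2, 12), (3, 4)]))` walks the six element addresses in order, from `0x1000` to `0x1014`.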

Patent number: 11934965
Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables, and stream post processing result of the per-element tanh and/or sigmoid operation back to the OCM after the tanh and/or sigmoid operation is complete.
Type: Grant
Filed: April 6, 2021
Date of Patent: March 19, 2024
Assignee: Marvell Asia Pte Ltd
Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen

Patent number: 11928442
Abstract: A method related to posit tensor processing can include receiving, by a plurality of multiply-accumulator (MAC) units coupled to one another, a plurality of universal number (unum) or posit bit strings organized in a matrix and to be used as operands in a plurality of respective recursive operations performed using the plurality of MAC units and performing, using the MAC units, the plurality of respective recursive operations. Iterations of the respective recursive operations are performed using at least one bit string that is a same bit string as was used in a preceding iteration of the respective recursive operations. The method can further include prior to receiving the plurality of unum or posit bit strings, performing an operation to organize the plurality of unum or posit bit strings to achieve a threshold bandwidth ratio, a threshold latency, or both during performance of the plurality of respective recursive operations.
Type: Grant
Filed: January 3, 2022
Date of Patent: March 12, 2024
Assignee: Micron Technology, Inc.
Inventor: Vijay S. Ramesh

Patent number: 11928177
Abstract: Methods and apparatus for performing video processing matrix operations within a memory fabric. Various embodiments of the present disclosure are directed to converting a memory array into a matrix fabric for discrete cosine transform (DCT) matrix transformations and performing DCT matrix operations therein. Exemplary embodiments described herein perform DCT matrix-matrix multiplication operations within a memory device that includes a matrix fabric and matrix multiplication unit (MMU). In one embodiment, matrix-matrix multiplication operations are obtained using separate matrix-vector products. In one exemplary embodiment, the matrix fabric uses a “crossbar” construction of resistive elements. Each resistive element stores a level of impedance that represents the corresponding matrix coefficient value. The crossbar connectivity can be driven with an electrical signal representing the input vector as an analog voltage.
Type: Grant
Filed: September 19, 2022
Date of Patent: March 12, 2024
Assignee: Micron Technology, Inc.
Inventor: Fa-Long Luo
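The statement in the abstract of patent 11928177 that matrix-matrix multiplication is obtained from separate matrix-vector products can be shown numerically. The sketch below builds an orthonormal DCT-II matrix (a common convention, assumed here because the abstract does not name a DCT variant) and multiplies it with an input matrix one column at a time; the analog crossbar is not modeled, and the function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II coefficient matrix (assumed variant)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matvec(matrix, vector):
    """One matrix-vector product, the operation a single crossbar pass would supply."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def matmul_by_columns(matrix, x):
    """Compute matrix @ x as one matrix-vector product per column of x."""
    columns = [matvec(matrix, [row[j] for row in x]) for j in range(len(x[0]))]
    # Transpose the list of result columns back into row-major order.
    return [[columns[j][i] for j in range(len(columns))]
            for i in range(len(matrix))]
```

With a 4 x 2 input whose columns are constant, for instance `[[1, 2]] * 4`, only the first (DC) row of the result is nonzero, which is a quick sanity check on the coefficient matrix.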

Patent number: 11921813
Abstract: Embodiments relate to a computing system for solving differential equations. The system is configured to receive problem packages corresponding to problems to be solved, each comprising at least a differential equation and a domain, and to select a solver of a plurality of solvers, based upon availability of each of the plurality of solvers. A dispatch computer selects a solver by monitoring the plurality of solvers, and responsive to a solver becoming available, determines if a received problem package having at least a threshold priority level can be solved by the solver. Otherwise, the dispatch computer generates a plurality of solver scenarios each reflecting a permutation of received problem packages assigned to solvers estimated to become available within a threshold period of time, and assigns the problem packages in accordance with a solver scenario having a highest utilization score.
Type: Grant
Filed: August 10, 2020
Date of Patent: March 5, 2024
Assignee: VORTICITY INC.
Inventor: Chirath Neranjena Thouppuarachchi

Patent number: 11921814
Abstract: Methods and devices, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of nonzero elements of the portion of the matrix; and identifying memory addresses of the nonzero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the nonzero elements in the first section into one or more vector registers.
Type: Grant
Filed: June 14, 2022
Date of Patent: March 5, 2024
Assignee: Alibaba Group Holding Limited
Inventors: Guoyang Chen, Yu Pu, Yongzhi Zhang, Weifeng Zhang, Yuan Xie

Patent number: 11921848
Abstract: The disclosed embodiments relate to a system that characterizes susceptibility of an inferential model to follow signal degradation. During operation, the system receives a set of time-series signals associated with sensors in a monitored system during normal fault-free operation. Next, the system trains the inferential model using the set of time-series signals. The system then characterizes susceptibility of the inferential model to follow signal degradation. During this process, the system adds degradation to a signal in the set of time-series signals to produce a degraded signal. Next, the system uses the inferential model to perform prognostic-surveillance operations on the set of time-series signals with the degraded signal. Finally, the system characterizes susceptibility of the inferential model to follow degradation in the signal based on results of the prognostic-surveillance operations.
Type: Grant
Filed: November 2, 2020
Date of Patent: March 5, 2024
Assignee: Oracle International Corporation
Inventors: Zexi Chen, Kenny C. Gross, Ashin George, Guang C. Wang

Patent number: 11914670
Abstract: Methods and systems for compressing a matrix are described. The matrix, having a plurality of rows formed by a respective plurality of vectors, is partitioned into a plurality of submatrices, each submatrix containing subvectors from a respective group of one or more contiguous columns of the matrix. For each given submatrix, the subvectors are clustered into a plurality of clusters. For each given cluster, a centroid and a variance are computed and stored, based on the subvectors belonging to the given cluster. A mapping relating each vector to a respective cluster in each submatrix is stored. The stored centroids, stored variances and stored mapping form a set of compressed data for reconstruction of the matrix.
Type: Grant
Filed: September 8, 2020
Date of Patent: February 27, 2024
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Krtin Kumar, Mehdi Rezagholizadeh, Peyman Passban

Patent number: 11915101
Abstract: In one aspect, a method includes identifying (i) a computational problem that is a candidate for a quantum computation, and (ii) one or more numerical algorithms for solving the candidate computational problem; providing input task data identifying (i) the candidate computational problem, and (ii) the one or more numerical algorithms, to a numerical quantum experimentation system, wherein the numerical quantum experimentation system comprises multiple universal numerics workers, a universal numerics worker, of the multiple universal numerics workers being configured to solve the candidate computational problem using the one or more numerical algorithms; receiving, from the numerical quantum experimentation system, data representing results of the one or more numerical algorithms to solve the candidate computational problem; and determining whether the received data indicates that a quantum computation applied to the candidate computational problem has a greater efficacy at a solution than a classical computat
Type: Grant
Filed: November 12, 2021
Date of Patent: February 27, 2024
Assignee: Google LLC
Inventor: Vasil S. Denchev

Patent number: 11907686
Abstract: The present disclosure provides computing apparatuses, methods and software for generating random numbers. Data is received from an instrument characterising macromolecules in a sample, the data including measurement event information relating to measurements of individual macromolecules recorded over time. For each measurement event in a sequence of measurement events in the data, an event timing representative of the duration of event or the time passing between consecutive events is determined. This is compared with a comparator value to generate a binary output, and a bit value is determined based on the binary output. Data representative of a random number is generated by assembling a vector of bit values determined from the event timings in sequence. The determined sequence of event timings for the sequence of measurement events represents a source of entropy extracted by the comparison step to generate the random number.
Type: Grant
Filed: August 11, 2023
Date of Patent: February 20, 2024
Assignee: Veiovia Limited
Inventors: Darren Hurley-Smith, Alastair Droop, Remy Lyon, Roxana Iuliana Teodor
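The comparison step in the abstract of patent 11907686 reduces each event timing to one bit and assembles the bits in sequence. A minimal sketch, assuming the comparator is a fixed threshold supplied by the caller and that a timing above it maps to 1 (neither choice is specified in the abstract; the function names are hypothetical):

```python
def bits_from_timings(timings, comparator):
    """Compare each event timing with the comparator value and emit one bit per event."""
    return [1 if timing > comparator else 0 for timing in timings]


def bits_to_int(bits):
    """Assemble the bit vector, most significant bit first, into an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value
```

With hypothetical timings `[0.8, 0.1, 0.5, 0.9]` and a comparator of `0.4`, the bit vector is `[1, 0, 1, 1]`, which assembles to 11. The quality of the output depends entirely on the unpredictability of the measured timings; the code itself adds no entropy.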

Patent number: 11907832
Abstract: A method includes: providing input information in an electronic format; converting the electronic input information into an optical input vector; optically transforming the optical input vector into an optical output vector based on an optical matrix multiplication; converting the optical output vector into an electronic format; and electronically applying a nonlinear transformation to the electronically converted optical output vector to provide output information in an electronic format. For example, a set of input values are encoded on respective optical signals. For each of at least two subsets of optical signals, a copying module splits the subset into multiple copies of the optical signals. For each copy of a first subset of optical signals, a corresponding multiplication module multiplies the optical signals of the first subset by matrix element values using optical amplitude modulation. A summation module produces an electrical signal representing a sum of the results of the multiplication modules.
Type: Grant
Filed: April 20, 2020
Date of Patent: February 20, 2024
Assignee: Lightelligence PTE. Ltd.
Inventors: Yichen Shen, Huaiyu Meng, Li Jing, Rumen Dangovski, Peng Xie, Matthew Khoury, Cheng-Kuan Lu, Ronald Gagnon, Maurice Steinman, Jianhua Wu, Arash Hosseinzadeh