Matrix Array Patents (Class 708/520)
-
Patent number: 12236338
Abstract: A combined function specified by an instruction is performed. The combined function includes a plurality of operations performed as part of one invocation of the combined function. The performing the combined function includes performing a matrix multiplication of a first tensor and a second tensor to obtain one or more intermediate results. The second tensor includes an adjusted weight tensor created using a multiplier. Values of a bias tensor are added to the one or more intermediate results to obtain one or more results for the combined function. The one or more results are at least a part of an output tensor.
Type: Grant
Filed: June 17, 2021
Date of Patent: February 25, 2025
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Cedric Lichtenau, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Sunil K. Shukla, Swagath Venkataramani
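As a rough illustration of the combined function summarized in the abstract of patent 12236338, the sketch below multiplies an input by weights that were first scaled by a multiplier, then adds a bias. The function names and the per-column scaling are assumptions made for this example; the abstract does not say how the adjusted weight tensor is built, and this is not the patented implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def fused_matmul_bias(x, weights, multiplier, bias):
    # Assumption: the adjusted weights are the weights scaled per output column.
    adjusted = [[w * m for w, m in zip(row, multiplier)] for row in weights]
    intermediate = matmul(x, adjusted)
    # Add the bias to every row of the intermediate result.
    return [[v + b for v, b in zip(row, bias)] for row in intermediate]

x = [[1.0, 2.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
result = fused_matmul_bias(x, weights, multiplier=[2.0, 3.0], bias=[0.5, -1.0])
```

Here `result` holds the single output row produced by one call, mirroring the idea of several operations performed in one invocation.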
-
Patent number: 12204606
Abstract: In some examples, a system can store a first array, which is a one-dimensional array of values (e.g., matrix values), in memory. The system can also store a second array in the memory, where the second array is a one-dimensional array of pointers that point to positions of a subset of the values in the first array. The subset of values can be a first entry of each row or column of a matrix. The system can then provide the second array as input to a program routine, which can perform a matrix operation. To do so, the program routine can access the first array and the second array in memory, select a set of values for the matrix from the first array by using the pointers, execute the matrix operation using the selected set of values, and output the result.
Type: Grant
Filed: August 2, 2024
Date of Patent: January 21, 2025
Assignee: SAS INSTITUTE INC.
Inventor: Alexander Vladimirovich Andrianov
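The storage scheme in the abstract of patent 12204606 can be sketched with a flat list of values and a second list marking where each row begins. Modeling the pointers as integer offsets, and the names `row_slices` and `matvec`, are assumptions for this example only.

```python
def row_slices(values, row_starts):
    """Yield each row, using row_starts[i] as the offset of row i's first entry."""
    ends = row_starts[1:] + [len(values)]
    for start, end in zip(row_starts, ends):
        yield values[start:end]

def matvec(values, row_starts, vector):
    """Matrix-vector product computed from the flat storage."""
    return [sum(a * b for a, b in zip(row, vector))
            for row in row_slices(values, row_starts)]

values = [1, 2, 3, 4, 5, 6]   # a 2x3 matrix stored row by row
row_starts = [0, 3]           # offsets of the first entry of each row
product = matvec(values, row_starts, [1, 0, -1])
```

The routine only receives the two flat arrays, so the matrix never has to be materialized as a nested structure.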
-
Patent number: 12175374
Abstract: A computing system trains a classification model using distributed training data. A first worker index and a second worker index are received from a controller device and together uniquely identify a segment of a lower triangular matrix. The first and second worker indices have values from one to a predefined block size value. In response to receipt of a first computation request from the controller device, a first kernel matrix block is computed at each computing device based on the first worker index and the second worker index. In response to receipt of a second computation request from the controller device, an objective function value is computed for each observation vector included in an accessed training data subset. The computed objective function value is sent to the controller device. Model parameters for a trained classification model are output.
Type: Grant
Filed: April 15, 2024
Date of Patent: December 24, 2024
Assignee: SAS Institute Inc.
Inventors: Yingjian Wang, Xinmin Wu
-
Patent number: 12141226
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field such that it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel such that specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks such that specialized Matrix Processor circuits can improve performance. But a neural network is more than a collection of matrix operations; it is a set of specifically coordinated matrix operations with complex data dependencies. Without proper coordination, Matrix Processor circuits may end up idle or spending large amounts of time loading in different weight matrix data.
Type: Grant
Filed: April 5, 2019
Date of Patent: November 12, 2024
Assignee: Expedera, Inc.
Inventors: Siyad Chih-Hua Ma, Shang-Tse Chuang, Sharad Vasantrao Chole
-
Patent number: 12086205
Abstract: Matrix multiply units can take advantage of input sparsity by zero gating ALUs, which saves power consumption, but compute throughput does not increase. To improve compute throughput from sparsity, processing resources in a matrix accelerator can skip computation with zero involved in input or output. If zeros in input can be skipped, the processing units can focus calculations on generating meaningful non-zero output.
Type: Grant
Filed: March 24, 2021
Date of Patent: September 10, 2024
Assignee: Intel Corporation
Inventors: Chunhui Mei, Hong Jiang, Jiasheng Chen, Yongsheng Liu, Yan Li
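The difference between gating and skipping described in the abstract of patent 12086205 can be shown in software with a dot product that counts how many multiplies are actually issued. This is only a software analogy with a hypothetical function name, not a model of the accelerator hardware.

```python
def sparse_dot(a, b):
    """Return (dot product, number of multiplies actually issued)."""
    total, multiplies = 0, 0
    for x, y in zip(a, b):
        if x == 0 or y == 0:
            continue  # the work is skipped entirely, not merely gated
        total += x * y
        multiplies += 1
    return total, multiplies

total, multiplies = sparse_dot([0, 2, 0, 4], [5, 0, 7, 3])
```

Only one of the four element pairs has two non-zero operands, so a unit that skips zeros would spend one multiply slot instead of four.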
-
Patent number: 12045616
Abstract: In some examples, a circuit includes an interface configured to couple to a memory that includes a set of outputs to provide a set of data from the memory. The circuit further includes a rotator coupled to the interface that includes a first set of multiplexors that each include a set of inputs coupled to the set of outputs of the interface and an output. The circuit further includes a storage circuit coupled to the rotator that includes a register file coupled to the outputs of the first set of multiplexors and an alignment network. The alignment network includes a second set of multiplexors that each include a set of inputs coupled to the register file and an output.
Type: Grant
Filed: March 8, 2021
Date of Patent: July 23, 2024
Assignee: Texas Instruments Incorporated
Inventors: Jonathan (Son) Hung Tran, Joseph Raymond Michael Zbiciak
-
Patent number: 12001484
Abstract: Methods and systems for low-latency multi-constraint ranking of content items. One of the methods includes receiving a request to rank a plurality of content items for presentation to a user to maximize a primary objective subject to a plurality of constraints; initializing a dual variable vector; updating the dual variable vector, comprising: determining an overall objective score for the dual variable vector; identifying a plurality of candidate dual variable vectors that includes one or more neighboring node dual variable vectors; determining respective overall objective scores for each of the one or more candidate dual variable vectors; identifying the candidate with the best overall objective score; and determining whether to update the dual variable vector based on whether the identified candidate has a better overall objective score than the dual variable vector; and determining a final ranking for the content items based on the dual variable vector.
Type: Grant
Filed: February 16, 2021
Date of Patent: June 4, 2024
Assignee: DeepMind Technologies Limited
Inventors: Timothy Arthur Mann, Ivan Lobov, Anton Zhernov, Krishnamurthy Dvijotham, Xiaohong Gong, Dan-Andrei Calian
-
Patent number: 11995149
Abstract: A processing system includes a first set and a second set of general-purpose registers (GPRs) and memory access circuitry that fetches nonzero values of a sparse matrix into consecutive slots in the first set. The memory access circuitry also fetches values of an expanded matrix into consecutive slots in the second set of GPRs. The expanded matrix is formed based on values of a vector and locations of the nonzero values in the sparse matrix. The processing system also includes a set of multipliers that concurrently perform multiplication of the nonzero values in slots of the first set of GPRs with the values of the vector in corresponding slots of the second set. Reduced sum circuitry accumulates results from the set of multipliers for rows of the sparse matrix.
Type: Grant
Filed: December 17, 2020
Date of Patent: May 28, 2024
Assignee: Advanced Micro Devices, Inc.
Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor
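The data flow in the abstract of patent 11995149 resembles a sparse matrix-vector product in which the vector is first expanded to line up with the non-zero values. The sketch below uses a coordinate layout and the hypothetical name `spmv_expanded`; both are assumptions, since the abstract does not specify a storage format.

```python
def spmv_expanded(nonzeros, col_index, row_of, vector, n_rows):
    # expanded[k] is the vector value that lines up with nonzeros[k]
    expanded = [vector[c] for c in col_index]
    # Element-wise multiplies (performed concurrently in the described hardware)
    products = [a * b for a, b in zip(nonzeros, expanded)]
    # Accumulate the products row by row
    result = [0] * n_rows
    for r, p in zip(row_of, products):
        result[r] += p
    return result

# Sparse matrix [[5, 0, 0], [0, 0, 2], [1, 0, 3]] in coordinate form
nonzeros = [5, 2, 1, 3]
col_index = [0, 2, 0, 2]
row_of = [0, 1, 2, 2]
y = spmv_expanded(nonzeros, col_index, row_of, [1, 2, 3], n_rows=3)
```

Because both operand lists are dense and aligned slot for slot, the multiply step needs no index lookups of its own.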
-
Patent number: 11921814
Abstract: Methods and devices, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
Type: Grant
Filed: June 14, 2022
Date of Patent: March 5, 2024
Assignee: Alibaba Group Holding Limited
Inventors: Guoyang Chen, Yu Pu, Yongzhi Zhang, Weifeng Zhang, Yuan Xie
-
Patent number: 11914670
Abstract: Methods and systems for compressing a matrix are described. The matrix, having a plurality of rows formed by a respective plurality of vectors, is partitioned into a plurality of submatrices, each submatrix containing sub-vectors from a respective group of one or more contiguous columns of the matrix. For each given submatrix, the sub-vectors are clustered into a plurality of clusters. For each given cluster, a centroid and a variance are computed and stored, based on the sub-vectors belonging to the given cluster. A mapping relating each vector to a respective cluster in each submatrix is stored. The stored centroids, stored variances and stored mapping form a set of compressed data for reconstruction of the matrix.
Type: Grant
Filed: September 8, 2020
Date of Patent: February 27, 2024
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Krtin Kumar, Mehdi Rezagholizadeh, Peyman Passban
-
Patent number: 11907713
Abstract: Systems, methods, and apparatuses relating to a sign modification field for fused operations in a configurable spatial accelerator are described.
Type: Grant
Filed: December 28, 2019
Date of Patent: February 20, 2024
Assignee: Intel Corporation
Inventors: Kermin E. Chofleming, Chuanjun Zhang, Daniel Towner, Simon C. Steely, Jr., Benjamin Keen
-
Patent number: 11907719
Abstract: The present disclosure describes a digital signal processing (DSP) block that includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.
Type: Grant
Filed: June 26, 2020
Date of Patent: February 20, 2024
Assignee: Intel Corporation
Inventors: Martin Langhammer, Dongdong Chen, Jason R. Bergendahl
-
Patent number: 11899744
Abstract: A neural network apparatus for performing a matrix multiplication operation includes a memory having at least one program stored therein and a processor to perform one or more operations by executing the at least one program. The processor can determine whether to divide an initial weight in one of a column direction and a row direction according to whether a reshape operation and a transpose operation are performed before or after a matrix multiplication operation and generate division weights by dividing the initial weight by a head count in the determined direction. Also, the processor can generate intermediate feature maps by performing a matrix multiplication operation between the input feature map and the division weights and generate a final feature map based on the intermediate feature maps.
Type: Grant
Filed: April 17, 2020
Date of Patent: February 13, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Songyi Han, Hyunsun Park
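One way to picture the weight division in the abstract of patent 11899744 is to split the weight matrix into equal column groups, one per head, multiply each group separately, and join the results. The sketch covers only the column direction, and the concatenation used to form the final feature map is an assumption; all function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_columns(weight, head_count):
    """Divide the weight into head_count equal groups of contiguous columns."""
    width = len(weight[0]) // head_count
    return [[row[h * width:(h + 1) * width] for row in weight]
            for h in range(head_count)]

def multi_head_matmul(x, weight, head_count):
    # One intermediate feature map per division weight
    parts = [matmul(x, w) for w in split_columns(weight, head_count)]
    # Assumption: the final feature map is the row-wise concatenation of the parts
    return [sum((p[i] for p in parts), []) for i in range(len(x))]

x = [[1, 2], [3, 4]]
weight = [[1, 0, 2, 1], [0, 1, 1, 3]]
final = multi_head_matmul(x, weight, head_count=2)
```

Under these assumptions the per-head products contain the same numbers as one full multiplication, which is why the split can replace a later reshape of the output.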
-
Patent number: 11893079
Abstract: Implementations of the present disclosure relate to a method, system and program product for determining a causality between a plurality of variables.
Type: Grant
Filed: September 29, 2021
Date of Patent: February 6, 2024
Assignee: NEC CORPORATION
Inventors: Lu Feng, Chunchen Liu, Wenjuan Wei
-
Patent number: 11853387
Abstract: A data sparse projection method includes: randomly initializing a high-dimensional sparse two-dimensional matrix (S1); fixing the high-dimensional sparse two-dimensional matrix, and calculating an optimal output variable by using the high-dimensional sparse two-dimensional matrix (S2); fixing the optimal output variable, and calculating an optimal high-dimensional sparse two-dimensional matrix by using the optimal output variable (S3); and cyclically fixing the high-dimensional sparse two-dimensional matrix and the output variable until the optimal output variable is no longer increased when the high-dimensional sparse two-dimensional matrix is fixed (S4).
Type: Grant
Filed: April 12, 2023
Date of Patent: December 26, 2023
Assignee: THE CHINESE UNIVERSITY OF HONG KONG, SHENZHEN
Inventors: Chonglin Gu, Changyi Ma, Wenye Li, Shuguang Cui
-
Patent number: 11836751
Abstract: A method for measuring relatedness between prediction tasks includes receiving data for a first prediction task. The method further includes measuring the relatedness of the first prediction task to at least one previous prediction task as a difference between divergence of conditional probabilities of the tasks. The method can be advantageously applied in artificial intelligence or continual learning systems.
Type: Grant
Filed: March 3, 2020
Date of Patent: December 5, 2023
Assignee: NEC CORPORATION
Inventors: Shujian Yu, Ammar Shaker
-
Patent number: 11836371
Abstract: A storage system memory or memory domain with N memory controllers is organized into N-1 same-size partitions per memory controller or N partitions per memory controller with one partition reserved as spare capacity. The unreserved partitions are assigned to mirror pairs of members such that a first triangular submatrix of a representative matrix of indexed memory controllers and indexed partitions is a transpose of a second triangular submatrix of the representative matrix. The resulting distribution of members is balanced such that additional loading on remaining memory controllers when one of the memory controllers becomes inaccessible is evenly distributed.
Type: Grant
Filed: July 8, 2022
Date of Patent: December 5, 2023
Assignee: Dell Products L.P.
Inventors: Kuolin Hua, Adnan Sahin
-
Patent number: 11823303
Abstract: A data processing method and apparatus are disclosed. In various embodiments, R groups of proposal region sequences are obtained. Each group of proposal region sequence includes a plurality of proposal regions. In those embodiments, a VRPAC instruction is invoked to calculate an area of each proposal region in each group of proposal region sequence. For a jth group of proposal region sequence in the R groups of proposal region sequences, a VIOU instruction and a VAADD instruction are invoked to determine j suppression matrices of the jth group of proposal region sequence and determine a suppression vector of the jth group of proposal region sequence based on the j suppression matrices. In those embodiments, an unsuppressed proposal region is determined based on a suppression vector of each group of proposal region sequence.
Type: Grant
Filed: July 19, 2020
Date of Patent: November 21, 2023
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Luping Cui, Jiajin Tu, Hu Liu, Honghui Yuan, Heng Liao, Hou Fun Lam, Bing Li
-
Patent number: 11797643
Abstract: Embodiments of apparatus and method for matrix multiplication using processing-in-memory (PIM) are disclosed. In an example, an apparatus for matrix multiplication includes an array of tiles that each include one or more PIM blocks. A PIM block may include a hybrid-mode PIM block that may be configured into a digital mode or an analog mode. The PIM block configured into digital mode may perform operations associated with depth-wise (DW) convolution. On the other hand, a PIM block configured into analog mode may perform operations associated with point-wise (PW) convolution. A controller may be used to configure the PIM block into either digital mode or analog mode, depending on the computations.
Type: Grant
Filed: November 9, 2020
Date of Patent: October 24, 2023
Assignee: NEONEXUS PTE. LTD.
Inventor: Qilin Zheng
-
Patent number: 11734386
Abstract: A matrix processing method performed by a graphics processing unit (GPU) includes: determining a plurality of non-zero elements in a to-be-processed matrix at a processor in the GPU; generating a distribution matrix of the to-be-processed matrix at the processor, where the distribution matrix comprises identities for indicating positions of the plurality of non-zero elements in the to-be-processed matrix; obtaining a target matrix from another matrix by using the distribution matrix at a logic circuit in the processor, where the target matrix comprises a plurality of target elements from the another matrix; and performing matrix processing on the plurality of non-zero elements and the target matrix to obtain an operation result at the processor.
Type: Grant
Filed: December 23, 2021
Date of Patent: August 22, 2023
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Zhenjiang Dong, Chio In Ieong, Hu Liu, Hai Chen
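The steps in the abstract of patent 11734386 can be approximated for a single sparse row: record where the non-zero elements sit, use those positions to pull the matching rows out of the other matrix, and multiply only what remains. Representing the distribution matrix as a list of positions, and the function names, are assumptions made for this sketch.

```python
def nonzero_positions(row):
    """Positions of the non-zero elements (stands in for the distribution matrix)."""
    return [k for k, v in enumerate(row) if v != 0]

def sparse_row_times_matrix(row, other):
    positions = nonzero_positions(row)
    values = [row[k] for k in positions]
    # Target matrix: only the rows of `other` that meet a non-zero element
    target = [other[k] for k in positions]
    n_cols = len(other[0])
    return [sum(v * t[j] for v, t in zip(values, target)) for j in range(n_cols)]

row = [0, 3, 0, 2]
other = [[1, 1], [2, 0], [5, 5], [0, 4]]
out = sparse_row_times_matrix(row, other)
```

Half of the rows of `other` are never read in this example, which is the saving the gather step is meant to provide.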
-
Patent number: 11734387
Abstract: Techniques regarding an iterative energy-scaled variational quantum eigensolver process are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a read-out component that determines a ground state energy value of a quantum Hamiltonian by employing a variational quantum eigensolver (VQE) algorithm, wherein the VQE algorithm utilizes a symmetry that emerges at an energy scale of the quantum Hamiltonian.
Type: Grant
Filed: March 3, 2022
Date of Patent: August 22, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Antonio Mezzacapo, Richard Chen, Marco Pistoia
-
Patent number: 11734383
Abstract: A computing device and related products are provided. The computing device is configured to perform machine learning calculations. The computing device includes an operation unit, a controller unit, and a storage unit. The storage unit includes a data input/output (I/O) unit, a register, and a cache. The technical solution provided by the present disclosure has the advantages of fast calculation speed and energy saving.
Type: Grant
Filed: July 29, 2020
Date of Patent: August 22, 2023
Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
Inventors: Tianshi Chen, Xiao Zhang, Shaoli Liu, Yunji Chen
-
Patent number: 11687616
Abstract: An arithmetic processing apparatus includes a memory and a processor. The processor is coupled to the memory and configured to determine an individual not to be evolved to an individual of a second generation from among a plurality of individuals in a first generation based on a predetermined reference for calculation completion of fitness calculation for each of the plurality of individuals, the second generation being a generation next to the first generation, and determine to cause the determined individual to evolve to an individual of a generation next or subsequent to the second generation.
Type: Grant
Filed: November 6, 2020
Date of Patent: June 27, 2023
Assignee: FUJITSU LIMITED
Inventors: Yukito Tsunoda, Teruo Ishihara
-
Patent number: 11609762
Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
Type: Grant
Filed: August 10, 2021
Date of Patent: March 21, 2023
Assignee: Intel Corporation
Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
-
Patent number: 11556852
Abstract: A computer-implemented method for determining a set of target items to be annotated for training a machine learning application. The method comprises providing a training data set with a set of data samples and an auto-encoder with a classifier. The auto-encoder comprises an embedding model that maps the set of data samples to a set of compressed feature vectors. The set of compressed feature vectors define a compressed feature matrix. Further provided are: a definition of a graph associated to the compressed feature matrix, applying a clustering-algorithm to identify node clusters of the graph and applying a centrality algorithm to identify central nodes of the node clusters, retrieving from an annotator node labels for the central nodes, propagating the annotated node labels to other nodes of the graph and performing a training of the embedding model and the classifier with the annotated and the propagated node labels.
Type: Grant
Filed: March 6, 2020
Date of Patent: January 17, 2023
Assignee: International Business Machines Corporation
Inventors: Peter Willem Jan Staar, Michele Dolfi, Christoph Auer, Leonidas Georgopoulos, Ralf Kaestner, Alexander Velizhev, Dal Noguer Hidalgo, Rita Kuznetsova, Konstantinos Bekas
-
Patent number: 11550872
Abstract: Quantum computing systems and methods are provided. In one example, a quantum computing system includes a quantum system having one or more quantum system qubits and one or more ancilla qubits. The quantum computing system includes one or more quantum gates implemented by the quantum computing system. The quantum gate(s) are operable to configure the one or more ancilla qubits into a known state. The quantum computing system includes a quantum measurement circuit operable to perform a plurality of measurements on the one or more quantum system qubits using the one or more ancilla qubits. The quantum computing system includes one or more processors operable to determine a reduced density matrix for a subset of the quantum system based on a set of the plurality of measurements that include a number of repeated measurements performed using the quantum measurement circuit.
Type: Grant
Filed: October 15, 2020
Date of Patent: January 10, 2023
Assignee: GOOGLE LLC
Inventor: Zhang Jiang
-
Patent number: 11520855
Abstract: A computer-implemented method is presented for performing matrix sketching by employing an analog crossbar architecture. The method includes low rank updating a first matrix for a first period of time, copying the first matrix into a dynamic correction computing device, switching to a second matrix to low rank update the second matrix for a second period of time, as the second matrix is low rank updated, feeding the first matrix with first stochastic pulses to reset the first matrix back to a first matrix symmetry point, copying the second matrix into the dynamic correction computing device, switching back to the first matrix to low rank update the first matrix for a third period of time, and as the first matrix is low rank updated, feeding the second matrix with second stochastic pulses to reset the second matrix back to a second matrix symmetry point.
Type: Grant
Filed: May 15, 2020
Date of Patent: December 6, 2022
Assignees: INTERNATIONAL BUSINESS MACHINES CORPORATION, RAMOT AT TEL-AVIV UNIVERSITY, LTD.
Inventors: Lior Horesh, Oguzhan Murat Onen, Haim Avron, Tayfun Gokmen, Vasileios Kalantzis, Shashanka Ubaru
-
Patent number: 11520854
Abstract: A first group of elements is element-wise multiplied with a second group of elements using a plurality of multipliers belonging to a matrix multiplication hardware unit. Results of the plurality of multipliers are added together using a hierarchical tree of adders belonging to the matrix multiplication hardware unit and a final result of the hierarchical tree of adders or any of a plurality of intermediate results of the hierarchical tree of adders is selectively provided for use in determining an output result matrix.
Type: Grant
Filed: October 29, 2019
Date of Patent: December 6, 2022
Assignee: Meta Platforms, Inc.
Inventors: Yuchen Hao, Krishnakumar Narayanan Nair, Ehsan Khish Ardestani Zadeh, Rakesh Komuravelli, Abdulkadir Utku Diril, Thomas Mark Ulrich
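The selectable adder tree in the abstract of patent 11520854 can be mimicked by keeping the output of every tree level and letting the caller pick one. The sketch assumes the number of elements is a power of two, and the function names and the `level` parameter are hypothetical.

```python
def adder_tree_levels(products):
    """Return the outputs of every level of a pairwise adder tree.

    Assumes len(products) is a power of two.
    """
    levels = [list(products)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels

def multiply_and_reduce(a, b, level):
    """Element-wise multiply, then return the sums available at the chosen level."""
    products = [x * y for x, y in zip(a, b)]
    return adder_tree_levels(products)[level]

partial = multiply_and_reduce([1, 2, 3, 4], [5, 6, 7, 8], level=1)
final = multiply_and_reduce([1, 2, 3, 4], [5, 6, 7, 8], level=2)
```

Reading out an intermediate level yields several shorter dot products from the same multipliers, while the last level yields one full-length dot product.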
-
Patent number: 11442709
Abstract: A method for compiling and executing a nested loop includes initializing a nested loop controller with an outer loop count value and an inner loop count value. The nested loop controller includes a predicate FIFO. The method also includes coalescing the nested loop and, during execution of the coalesced nested loop, causing the nested loop controller to populate the predicate FIFO and executing a get predicate instruction having an offset value, where the get predicate returns a value from the predicate FIFO specified by the offset value. The method further includes predicating an outer loop instruction on the returned value from the predicate FIFO.
Type: Grant
Filed: August 3, 2020
Date of Patent: September 13, 2022
Assignee: Texas Instruments Incorporated
Inventors: Kai Chirca, Timothy D. Anderson, Todd T. Hahn, Alan L. Davis
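A software analogy may help with the abstract of patent 11442709: a two-level loop is flattened into one loop, a small FIFO records whether each iteration ends an inner pass, and the outer-loop work runs only when the value read from the FIFO is true. The FIFO depth, the meaning of the offset (0 is the newest entry here), and every name below are assumptions, not details taken from the patent.

```python
from collections import deque

def get_predicate(fifo, offset):
    """Hypothetical 'get predicate' read: offset 0 returns the newest entry."""
    return fifo[-1 - offset]

def run_coalesced(outer_count, inner_count, fifo_depth=4):
    predicate_fifo = deque(maxlen=fifo_depth)
    trace = []
    for flat in range(outer_count * inner_count):
        outer, inner = divmod(flat, inner_count)
        # Controller side: push True on the last inner iteration of each outer pass
        predicate_fifo.append(inner == inner_count - 1)
        trace.append(("inner", outer, inner))
        # Outer-loop instruction, predicated on the value read from the FIFO
        if get_predicate(predicate_fifo, 0):
            trace.append(("outer", outer))
    return trace

trace = run_coalesced(outer_count=2, inner_count=2)
```

The single flat loop performs the same inner and outer work, in the same order, as the original nested form.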
-
Patent number: 11435941
Abstract: In one example, an apparatus comprises: a memory array having an array of memory elements arranged in rows and columns, each memory element being configured to store a data element; and a memory access circuit configured to: perform a row write operation to store a first group of data elements at a first row of the array of memory elements; perform a column read operation at a first column of the array of memory elements to obtain a second group of data elements; and perform a column write operation to store a third group of data elements at the first column of the array of memory elements to replace the second group of data elements.
Type: Grant
Filed: June 24, 2020
Date of Patent: September 6, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Kun Xu, Paul Gilbert Meyer, Ron Diamant
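The three access operations named in the abstract of patent 11435941 are easy to model in software. The class and method names below are hypothetical, and the sketch says nothing about how the circuit itself is built.

```python
class MemoryArray:
    """Toy model of a memory array that supports row and column access."""

    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]

    def write_row(self, r, data):
        self.cells[r] = list(data)

    def read_column(self, c):
        return [row[c] for row in self.cells]

    def write_column(self, c, data):
        # Overwrite one element in each row
        for row, value in zip(self.cells, data):
            row[c] = value

mem = MemoryArray(2, 3)
mem.write_row(0, [1, 2, 3])
mem.write_row(1, [4, 5, 6])
first_column = mem.read_column(0)   # data written by rows, read back by column
mem.write_column(0, [7, 8])         # replaces the elements that were just read
```

Writing by rows and reading by columns is the access pattern of a transpose, which is one reason such an array is useful near matrix hardware.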
-
Patent number: 11410070
Abstract: A quantum computing device comprises at least one quantum register including a plurality of logical qubits. A compression engine is coupled to each logical qubit of the plurality of logical qubits. Each compression engine is configured to compress syndrome data. A decompression engine is coupled to each compression engine. Each decompression engine is configured to receive compressed syndrome data, decompress the received compressed syndrome data, and route the decompressed syndrome data to a decoder block.
Type: Grant
Filed: November 18, 2019
Date of Patent: August 9, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Poulami Das, Nicolas Guillaume Delfosse, Christopher Anand Pattison, Srilatha Manne, Douglas Carmean, Krysta Marie Svore, Helmut Gottfried Katzgraber
-
Patent number: 11409840
Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that are dynamically mapped to mutually exclusive subsets of the rows and columns of the processor element arrays based on dimensions of matrices that provide the parameter values to the processor element arrays. In some cases, the processor element arrays are vector arithmetic logic unit (ALU) processors and the memory interfaces are direct memory access (DMA) engines. The rows of the processor element arrays in the subsets are mutually exclusive to the rows in the other subsets and the columns of the processor element arrays in the subsets are mutually exclusive to the columns in the other subsets. The matrices can be symmetric or asymmetric, e.g., one of the matrices can be a vector having a single column.
Type: Grant
Filed: September 25, 2020
Date of Patent: August 9, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari
-
Patent number: 11392849
Abstract: Systems and methods that facilitate motion formalism utilizing quantum computing, to compute matrix operators in terms of commutators between qubit operators and measurements on the quantum hardware, wherein the commutators are computed utilizing symbolic calculus. Embodiments reduce computational cost of generalized eigenvalue synthesis relying on symbolic calculus and parallelization. Embodiments disclosed herein can also develop estimators of excited-states properties, considering constants of motion (e.g. spin) and non-constants of motions (e.g. dipoles, density matrices).
Type: Grant
Filed: September 18, 2020
Date of Patent: July 19, 2022
Assignees: INTERNATIONAL BUSINESS MACHINES CORPORATION, JSR CORPORATION
Inventors: Mario Motta, Pauline Ollitrault, Stephen Wood, Panagiotis Barkoutsos, Joseph Latone, Ivano Tavernelli, Gavin Jones, Edward Pyzer-Knapp, Yuya Onishi
-
Patent number: 11392379
Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate a signed fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.
Type: Grant
Filed: September 27, 2017
Date of Patent: July 19, 2022
Assignee: Intel Corporation
Inventors: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
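The arithmetic in the abstract of patent 11392379 can be illustrated with Python integers: each pair of signed 16-bit elements is multiplied into a product of up to 32 bits, and the upper half is kept after rounding. The 16-bit width, the round-half-up rule, and the exact bits retained are assumptions for this sketch; the instruction's real definition may differ.

```python
BITS = 16  # assumed element width

def mul_round_high(a, b, bits=BITS):
    """Multiply two signed fixed-size values and keep the rounded upper half."""
    product = a * b                          # double-sized product
    rounded = product + (1 << (bits - 1))    # add half of the discarded range
    return rounded >> bits                   # arithmetic shift keeps the sign

def vector_mul_round_high(src1, src2):
    """Apply the operation to each corresponding pair of elements."""
    return [mul_round_high(x, y) for x, y in zip(src1, src2)]

results = vector_mul_round_high([16384, 3, -3], [16384, 16384, 16384])
```

With these assumptions a product whose discarded part amounts to three quarters of a unit rounds away from zero, in either sign.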
-
Patent number: 11379185
Abstract: A matrix multiplication device and an operation method thereof are provided. The matrix multiplication device includes a plurality of unit circuits. Each of the unit circuits includes a multiplying-adding circuit, a first register, and a second register. A first input terminal and a second input terminal of the multiplying-adding circuit are respectively coupled to a corresponding first input line and a corresponding second input line. An input terminal and an output terminal of the first register are respectively coupled to an output terminal and a third input terminal of the multiplying-adding circuit. The second register is coupled to the first register to receive and temporarily store a multiplication accumulation result. Wherein, the second registers of the unit circuits output the multiplication accumulation results in a column direction in a first output mode, and output the multiplication accumulation results in a row direction in a second output mode.
Type: Grant
Filed: September 21, 2020
Date of Patent: July 5, 2022
Assignee: NEUCHIPS CORPORATION
Inventors: Jian-Wen Chen, Chiung-Liang Lin
-
Patent number: 11334355
Abstract: Technology for providing data to a processing unit is disclosed. A computer processor may be divided into a master processing unit and consumer processing units. The master processing unit at least partially decodes a machine instruction and determines whether data is needed to execute the machine instruction. The master processing unit sends a request to memory for the data. The request may indicate that the data is to be sent from the memory to a consumer processing unit. The data sent by the memory in response to the request may be stored in local read storage that is close to the consumer processing unit for fast access. The master processing unit may also provide the machine instruction to the consumer processing unit. The consumer processing unit may access the data from the local read storage and execute the machine instruction based on the accessed data.
Type: Grant
Filed: May 4, 2017
Date of Patent: May 17, 2022
Assignee: Futurewei Technologies, Inc.
Inventors: Alan Gatherer, Sushma Wokhlu, Peter Yan, Ywhpyng Harn, Ashish Rai Shrivastava, Tong Sun, Lee Dobson McFearin
-
Patent number: 11294986Abstract: Techniques regarding an iterative energy-scaled variational quantum eigensolver process are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a read-out component that determines a ground state energy value of a quantum Hamiltonian by employing a variational quantum eigensolver (VQE) algorithm, wherein the VQE algorithm utilizes a symmetry that emerges at an energy scale of the quantum Hamiltonian.Type: GrantFiled: November 22, 2019Date of Patent: April 5, 2022Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Antonio Mezzacapo, Richard Chen, Marco Pistoia
-
Patent number: 11269973Abstract: Repeating patterns are identified in a matrix. Based on the identification of the repeating patterns, instructions are generated, which are executable by processing cores of a dot product engine to allocate analog multiplication crossbars of the dot product engine to perform multiplication of the matrix with a vector.Type: GrantFiled: April 28, 2020Date of Patent: March 8, 2022Assignee: Hewlett Packard Enterprise Development LPInventors: Mashood Abdulla Kodavanji, Soumitra Chatterjee, Chinmay Ghosh, Mohan Parthasarathy
-
Patent number: 11263512Abstract: A novel and useful neural network (NN) processing core adapted to implement artificial neural networks (ANNs) and incorporating strictly separate control and data planes. The NN processor is constructed from self-contained computational units organized in a hierarchical architecture. The homogeneity enables simpler management and control of similar computational units, aggregated in multiple levels of hierarchy. Computational units are designed with as little overhead as possible, where additional features and capabilities are aggregated at higher levels in the hierarchy. On-chip memory provides storage for content inherently required for basic operation at a particular hierarchy and is coupled with the computational resources in an optimal ratio. Lean control provides just enough signaling to manage only the operations required at a particular hierarchical level. Dynamic resource assignment agility is provided which can be adjusted as required depending on resource availability and capacity of the device.Type: GrantFiled: April 3, 2018Date of Patent: March 1, 2022Inventors: Avi Baum, Or Danon, Hadar Zeitlin, Daniel Ciubotariu, Rami Feig
-
Patent number: 11263018Abstract: A vector processor is disclosed. The vector processor includes a plurality of register files provided to each of a plurality of single instruction multiple data (SIMD) lanes, storing each of a plurality of pieces of data, and respectively outputting input data to be used in a current cycle among the plurality of pieces of data, a shuffle unit for receiving a plurality of pieces of input data outputted from the plurality of register files, and performing shuffling such that the received plurality of pieces of input data respectively correspond to the plurality of SIMD lanes and outputting the same; and a command execution unit for performing a parallel operation by receiving input data outputted from the shuffle unit.Type: GrantFiled: October 23, 2017Date of Patent: March 1, 2022Assignee: SAMSUNG ELECTRONICS CO., LTD.Inventors: Ki-seok Kwon, Jae-un Park, Dong-kwan Suh, Kang-jin Yoon
-
Patent number: 11256508Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array, a null vector count (N), and a selected dimension. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. N null stream vectors are inserted into the stream of vectors for the selected dimension without fetching respective null data from the memory.Type: GrantFiled: May 23, 2019Date of Patent: February 22, 2022Assignee: Texas Instruments IncorporatedInventors: Asheesh Bhardwaj, William Franklin Leven, Son Hung Tran, Timothy David Anderson
-
Patent number: 11249761Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.Type: GrantFiled: July 20, 2020Date of Patent: February 15, 2022Assignee: Intel CorporationInventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
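The first compression option in this abstract (packing the non-zero elements together and recording each one's matrix position in a header) can be illustrated in software. The following is only a minimal sketch: the function names `compress` and `decompress`, the dictionary header layout, and the use of flat row-major positions are all assumptions made for illustration, not the instruction format or hardware behavior claimed by the patent.

```python
def compress(matrix):
    """Pack non-zero elements and record their flat positions in a header.

    The header layout (a dict with rows, cols, positions) is hypothetical.
    """
    cols = len(matrix[0])
    positions, values = [], []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                positions.append(r * cols + c)  # row-major flat index
                values.append(v)
    header = {"rows": len(matrix), "cols": cols, "positions": positions}
    return header, values


def decompress(header, values):
    """Rebuild the dense matrix from the header and the packed values."""
    rows, cols = header["rows"], header["cols"]
    matrix = [[0] * cols for _ in range(rows)]
    for pos, v in zip(header["positions"], values):
        matrix[pos // cols][pos % cols] = v
    return matrix
```

A round trip through `compress` and `decompress` returns the original matrix, and the packed value list shrinks in proportion to the number of zero elements.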
-
Patent number: 11249759Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for two selected dimensions of the array. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. When either selected dimension in the stream of vectors exceeds a respective specified width, the streaming engine inserts null elements into each portion of a respective vector for the selected dimension that exceeds the specified width in the stream of vectors. Stream vectors that are completely null are formed by the streaming engine without accessing the system memory for respective data.Type: GrantFiled: May 23, 2019Date of Patent: February 15, 2022Assignee: Texas Instruments IncorporatedInventors: William Franklin Leven, Asheesh Bhardwaj, Son Hung Tran, Timothy David Anderson
-
Patent number: 11231929Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for a selected dimension of the array. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. When the selected dimension in the stream of vectors exceeds the specified width, the streaming engine inserts null elements into each portion of a respective vector for the selected dimension that exceeds the specified width in the stream of vectors. Stream vectors that are completely null are formed by the streaming engine without accessing the system memory for respective data.Type: GrantFiled: May 23, 2019Date of Patent: January 25, 2022Assignee: Texas Instruments IncorporatedInventors: Son Hung Tran, Shyam Jagannathan, Timothy David Anderson
-
Patent number: 11232175Abstract: Implementations of the present disclosure relate to a method, system and program product for determining a causality between a plurality of variables.Type: GrantFiled: March 28, 2019Date of Patent: January 25, 2022Assignee: NEC CORPORATIONInventors: Lu Feng, Chunchen Liu, Wenjuan Wei
-
Patent number: 11216533Abstract: A grouping means 11 that extracts basis vectors from a set of basis vectors for a lattice having a predetermined relationship with a matrix used to generate a public key, and that groups the basis vectors such that a predetermined condition is satisfied. A sampling means 12 that samples, for at least one group, the same number of arbitrary values as the number of a plurality of basis vectors included in that group, in parallel for the individual basis vectors, onto a lattice constituted by the plurality of basis vectors, the arbitrary values serving as random numbers following a discrete Gaussian distribution. The predetermined condition is that each of the basis vectors included in a group is orthogonal to the other basis vectors included in the same group and is also orthogonal to Gram-Schmidt basis vectors, which are vectors obtained by orthogonalizing the other basis vectors by Gram-Schmidt orthogonalization.Type: GrantFiled: May 12, 2017Date of Patent: January 4, 2022Assignee: NEC CORPORATIONInventors: Yuki Tanaka, Kazuhiko Minematsu
-
Patent number: 11188328Abstract: Aspects include a compute array of a processor with mixed-precision numerical linear algebra support. A first precision and a first shape of a first input matrix and a second precision and a second shape of a second input matrix to the compute array are determined. A number of rank updates of a result matrix to store in an accumulator register having a predetermined size are determined, where the number of rank updates is based on the first precision and the first shape of the first input matrix, the second precision and the second shape of the second input matrix, and the predetermined size of the accumulator register. A plurality of linear algebra operations is repeated in parallel within the compute array to update the result matrix in the accumulator register based on the first input matrix, the second input matrix, and the number of rank updates.Type: GrantFiled: December 12, 2019Date of Patent: November 30, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jose E. Moreira, Brett Olsson, Brian W. Thompto, Silvia Melitta Mueller, Andreas Wagner
-
Patent number: 11182126Abstract: Computationally efficient mixed precision floating point waveform generation takes advantage of the high-speed generation of waveforms with single-precision floating point numbers while reducing the generally unacceptable loss of precision of pure single-precision floating point to generate any waveform that repeats in 2π. This approach computes a reference phase in double precision as the modulus of the phase with 2π and then computes offsets to that value in single precision. The double precision reference phase is recomputed as needed depending on how quickly the phase grows and how large a machine epsilon is desired.Type: GrantFiled: June 25, 2019Date of Patent: November 23, 2021Assignee: Raytheon CompanyInventors: Ender Barillas, Brian Filarsky
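The idea of a double-precision reference phase with single-precision offsets can be sketched for a sine wave, a waveform whose period is 2π. This is an illustrative approximation only: the function names, the fixed `refresh_every` interval, and the emulation of single precision by rounding through `struct` are assumptions, not details taken from the patent.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sine_mixed(freq_hz, sample_rate_hz, n_samples, refresh_every=256):
    """Generate sine samples using a double-precision reference phase
    plus single-precision offsets (hypothetical refresh policy)."""
    two_pi = 2.0 * math.pi
    step64 = two_pi * freq_hz / sample_rate_hz  # phase step, double precision
    step32 = to_f32(step64)                     # same step, single precision
    out = []
    ref32 = 0.0
    for n in range(n_samples):
        k = n % refresh_every
        if k == 0:
            # Recompute the reference phase in double precision, modulo 2*pi,
            # so the single-precision part never has to represent a large phase.
            ref32 = to_f32(math.fmod(n * step64, two_pi))
        offset32 = to_f32(k * step32)           # small offset, single precision
        phase32 = to_f32(ref32 + offset32)
        out.append(math.sin(phase32))
    return out
```

In this sketch the refresh interval trades speed for accuracy: a shorter interval keeps the single-precision offset small, which corresponds loosely to the abstract's remark that the reference is recomputed depending on how quickly the phase grows.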
-
Patent number: 11182458Abstract: Embodiments of the present invention are directed to a new instruction set extension and a method for providing 3D lane predication for matrix operations. In a non-limiting embodiment of the invention, a first input matrix having m rows and k columns and a second input matrix having k rows and n columns are received by a compute array of a processor. A three-dimensional predicate mask having an M-bit row mask, an N-bit column mask, and a K-bit rank mask is generated. A result matrix of up to m rows, up to n columns, and up to k rank updates is determined based on the first input matrix, the second input matrix, and the predicate mask.Type: GrantFiled: December 12, 2019Date of Patent: November 23, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Brett Olsson, Brian W. Thompto, Jose E. Moreira, Silvia Melitta Mueller, Andreas Wagner
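A software analogue of the three-dimensional predicate mask may help make the abstract concrete. The sketch below assumes that masked-off rows and columns are left at zero and that masked-off rank positions are simply skipped in the accumulation; the function name `predicated_matmul` and the representation of masks as lists of 0/1 values are illustrative choices, not the instruction set extension itself.

```python
def predicated_matmul(a, b, row_mask, col_mask, rank_mask):
    """Multiply an m x k matrix by a k x n matrix under row, column, and rank masks.

    Assumption: result entries whose row or column is masked off stay zero.
    """
    m, k, n = len(a), len(b), len(b[0])
    result = [[0] * n for _ in range(m)]
    for i in range(m):
        if not row_mask[i]:
            continue                      # row lane disabled
        for j in range(n):
            if not col_mask[j]:
                continue                  # column lane disabled
            for p in range(k):
                if rank_mask[p]:          # only enabled rank updates contribute
                    result[i][j] += a[i][p] * b[p][j]
    return result
```

For example, with `a = [[1, 2], [3, 4]]`, `b = [[5, 6], [7, 8]]`, a row mask of `[1, 0]`, a column mask of `[1, 1]`, and a rank mask of `[0, 1]`, only the second rank update of the first row is applied.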
-
Patent number: 11169800Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.Type: GrantFiled: October 18, 2019Date of Patent: November 9, 2021Assignee: Intel CorporationInventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov
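The two-operation split described in this abstract follows from the identity (a + bi)(c + di) = (ac - bd) + (ad + bc)i: each operation contributes one term to the real component and one term to the imaginary component. The sketch below is a plain software illustration; which term is treated as "first" in each component is an assumption, and the function name `complex_mul_two_step` is hypothetical.

```python
def complex_mul_two_step(x, y):
    """Multiply complex numbers given as (real, imag) pairs in two steps."""
    a, b = x
    c, d = y
    # Operation 1: first term of the real part and first term of the imaginary part.
    re, im = a * c, a * d
    # Operation 2: second term of each component, combined with the first terms.
    re, im = re - b * d, im + b * c
    return re, im
```

As a check, `complex_mul_two_step((1, 2), (3, 4))` gives `(-5, 10)`, matching (1 + 2i)(3 + 4i) = -5 + 10i.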