Matrix Array Patents (Class 708/520)
  • Patent number: 12001484
    Abstract: Methods and systems for low-latency multi-constraint ranking of content items. One of the methods includes receiving a request to rank a plurality of content items for presentation to a user to maximize a primary objective subject to a plurality of constraints; initializing a dual variable vector; updating the dual variable vector, comprising: determining an overall objective score for the dual variable vector; identifying a plurality of candidate dual variable vectors that includes one or more neighboring node dual variable vectors; determining respective overall objective scores for each of the one or more candidate dual variable vectors; identifying the candidate with the best overall objective score; and determining whether to update the dual variable vector based on whether the identified candidate has a better overall objective score than the dual variable vector; and determining a final ranking for the content items based on the dual variable vector.
    Type: Grant
    Filed: February 16, 2021
    Date of Patent: June 4, 2024
    Assignee: DeepMind Technologies Limited
    Inventors: Timothy Arthur Mann, Ivan Lobov, Anton Zhernov, Krishnamurthy Dvijotham, Xiaohong Gong, Dan-Andrei Calian
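    A minimal sketch of the dual-variable local search in the abstract above. It assumes several things the abstract does not specify: a Lagrangian-style item score (primary value plus dual-weighted constraint features), a fixed-size top slate, an "overall objective score" equal to slate value minus a penalty on unmet constraint targets, and neighbors formed by moving one dual coordinate up or down by a fixed step. All names, the penalty, and the step size are hypothetical.

```python
from typing import List, Sequence, Tuple

Duals = Tuple[float, ...]


def ranking_for(duals: Duals, primary: Sequence[float],
                constraints: Sequence[Sequence[float]]) -> List[int]:
    """Rank items by primary value plus dual-weighted constraint features."""
    scores = [p + sum(lam * feats[i] for lam, feats in zip(duals, constraints))
              for i, p in enumerate(primary)]
    return sorted(range(len(primary)), key=lambda i: -scores[i])


def overall_objective(duals, primary, constraints, targets, slate_size, penalty):
    """Primary value of the top slate minus a penalty for unmet constraint targets."""
    top = ranking_for(duals, primary, constraints)[:slate_size]
    value = sum(primary[i] for i in top)
    shortfall = sum(max(0.0, t - sum(feats[i] for i in top))
                    for feats, t in zip(constraints, targets))
    return value - penalty * shortfall


def neighbors(duals: Duals, step: float) -> List[Duals]:
    """Candidate duals: one coordinate moved by +/- step, clipped at zero."""
    out = []
    for k in range(len(duals)):
        for delta in (step, -step):
            cand = list(duals)
            cand[k] = max(0.0, cand[k] + delta)
            if tuple(cand) != duals:
                out.append(tuple(cand))
    return out


def local_search(primary, constraints, targets, slate_size,
                 step=4.0, penalty=10.0, max_iters=100):
    duals: Duals = tuple(0.0 for _ in constraints)
    best = overall_objective(duals, primary, constraints, targets, slate_size, penalty)
    for _ in range(max_iters):
        scored = [(overall_objective(c, primary, constraints, targets, slate_size, penalty), c)
                  for c in neighbors(duals, step)]
        if not scored:
            break
        cand_score, cand = max(scored, key=lambda t: t[0])
        if cand_score <= best:  # no neighbor beats the current duals: stop
            break
        duals, best = cand, cand_score
    return duals, ranking_for(duals, primary, constraints)
```

    For example, with primary values [5, 4, 3, 1], one constraint requiring at least one item flagged by [0, 0, 0, 1] in a slate of two, the search moves the single dual from 0.0 to 4.0 and promotes item 3 into the slate.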
  • Patent number: 11995149
    Abstract: A processing system includes a first set and a second set of general-purpose registers (GPRs) and memory access circuitry that fetches nonzero values of a sparse matrix into consecutive slots in the first set. The memory access circuitry also fetches values of an expanded matrix into consecutive slots in the second set of GPRs. The expanded matrix is formed based on values of a vector and locations of the nonzero values in the sparse matrix. The processing system also includes a set of multipliers that concurrently perform multiplication of the nonzero values in slots of the first set of GPRs with the values of the vector in corresponding slots of the second set. Reduced sum circuitry accumulates results from the set of multipliers for rows of the sparse matrix.
    Type: Grant
    Filed: December 17, 2020
    Date of Patent: May 28, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor
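    The data flow in the abstract above can be sketched in software: nonzero values are packed consecutively (as in CSR storage, an assumption here), an "expanded" vector is built by gathering the vector entry that matches each nonzero's column, the two are multiplied lane by lane, and a per-row reduction produces the output. The CSR layout and function names are illustrative, not the patent's register design.

```python
from typing import List, Sequence, Tuple


def to_csr(dense: Sequence[Sequence[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Pack nonzeros consecutively and record their columns and row boundaries."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for c, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(c)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv_expanded(values, col_idx, row_ptr, x):
    # "Expanded" operand: the vector entry aligned with each packed nonzero.
    expanded = [x[c] for c in col_idx]
    # Lane-wise multiplies (done concurrently in hardware).
    products = [v * e for v, e in zip(values, expanded)]
    # Reduced sum per sparse-matrix row.
    return [sum(products[row_ptr[r]:row_ptr[r + 1]]) for r in range(len(row_ptr) - 1)]
```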
  • Patent number: 11921814
    Abstract: Methods and devices, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
    Type: Grant
    Filed: June 14, 2022
    Date of Patent: March 5, 2024
    Assignee: Alibaba Group Holding Limited
    Inventors: Guoyang Chen, Yu Pu, Yongzhi Zhang, Weifeng Zhang, Yuan Xie
  • Patent number: 11914670
    Abstract: Methods and systems for compressing a matrix are described. The matrix, having a plurality of rows formed by a respective plurality of vectors, is partitioned into a plurality of submatrices, each submatrix containing sub-vectors from a respective group of one or more contiguous columns of the matrix. For each given submatrix, the sub-vectors are clustered into a plurality of clusters. For each given cluster, a centroid and a variance are computed and stored, based on the sub-vectors belonging to the given cluster. A mapping relating each vector to a respective cluster in each submatrix is stored. The stored centroids, stored variances and stored mapping form a set of compressed data for reconstruction of the matrix.
    Type: Grant
    Filed: September 8, 2020
    Date of Patent: February 27, 2024
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Krtin Kumar, Mehdi Rezagholizadeh, Peyman Passban
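    A rough sketch of the compression scheme described above, similar in spirit to product quantization: split the columns into contiguous groups, cluster each group's sub-vectors with a few rounds of k-means (a swapped-in clustering choice with deterministic first-distinct initialization), and store centroids, per-dimension variances, and the row-to-cluster mapping. The reconstruction here uses centroids only; how the patent uses the stored variances is not modeled.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 10) -> Tuple[List[Vector], List[int]]:
    centroids: List[Vector] = []
    for p in points:  # deterministic init: first k distinct sub-vectors
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(len(centroids)), key=lambda j: sqdist(p, centroids[j]))
                  for p in points]
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def compress(matrix: List[Vector], group: int, k: int):
    """Per column group: (centroids, per-dimension variances, row -> cluster mapping)."""
    compressed = []
    for start in range(0, len(matrix[0]), group):
        subs = [row[start:start + group] for row in matrix]
        centroids, assign = kmeans(subs, k)
        variances = []
        for j, c in enumerate(centroids):
            members = [s for s, a in zip(subs, assign) if a == j]
            variances.append([sum((m[d] - c[d]) ** 2 for m in members) / max(len(members), 1)
                              for d in range(len(c))])
        compressed.append((centroids, variances, assign))
    return compressed


def reconstruct(compressed) -> List[Vector]:
    # Centroid-only reconstruction; stored variances are ignored in this sketch.
    n_rows = len(compressed[0][2])
    return [sum((centroids[assign[r]] for centroids, _, assign in compressed), [])
            for r in range(n_rows)]
```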
  • Patent number: 11907719
    Abstract: The present disclosure describes a digital signal processing (DSP) block that includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.
    Type: Grant
    Filed: June 26, 2020
    Date of Patent: February 20, 2024
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Dongdong Chen, Jason R. Bergendahl
  • Patent number: 11907713
    Abstract: Systems, methods, and apparatuses relating to a sign modification field for fused operations in a configurable spatial accelerator are described.
    Type: Grant
    Filed: December 28, 2019
    Date of Patent: February 20, 2024
    Assignee: Intel Corporation
    Inventors: Kermin E. Chofleming, Chuanjun Zhang, Daniel Towner, Simon C. Steely, Jr., Benjamin Keen
  • Patent number: 11899744
    Abstract: A neural network apparatus for performing a matrix multiplication operation includes a memory having at least one program stored therein and a processor to perform one or more operations by executing the at least one program. The processor can determine whether to divide an initial weight in one of a column direction and a row direction according to whether a reshape operation and a transpose operation are performed before or after a matrix multiplication operation and generate division weights by dividing the initial weight by a head count in the determined direction. Also, the processor can generate intermediate feature maps by performing a matrix multiplication operation between the input feature map and the division weights and generate a final feature map based on the intermediate feature maps.
    Type: Grant
    Filed: April 17, 2020
    Date of Patent: February 13, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Songyi Han, Hyunsun Park
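    The identity behind dividing a weight by the head count can be checked directly: splitting W by columns and concatenating the per-head products, or splitting W by rows (with X split along the matching columns) and summing the partial products, both reproduce X times W. This sketch does not model the patent's rule for choosing the direction from where the reshape and transpose occur; the function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_split_matmul(x: Matrix, w: Matrix, heads: int) -> Matrix:
    """Split W by columns into division weights; concatenate intermediate maps."""
    n = len(w[0])
    assert n % heads == 0, "column count must be divisible by head count"
    size = n // heads
    parts = [matmul(x, [row[h * size:(h + 1) * size] for row in w]) for h in range(heads)]
    return [sum((part[r] for part in parts), []) for r in range(len(x))]


def row_split_matmul(x: Matrix, w: Matrix, heads: int) -> Matrix:
    """Split W by rows (and X by matching columns); add intermediate maps."""
    k = len(w)
    assert k % heads == 0, "row count must be divisible by head count"
    size = k // heads
    parts = [matmul([row[h * size:(h + 1) * size] for row in x], w[h * size:(h + 1) * size])
             for h in range(heads)]
    return [[sum(part[r][c] for part in parts) for c in range(len(w[0]))]
            for r in range(len(x))]
```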
  • Patent number: 11893079
    Abstract: Implementations of the present disclosure relate to a method, system and program product for determining a causality between a plurality of variables.
    Type: Grant
    Filed: September 29, 2021
    Date of Patent: February 6, 2024
    Assignee: NEC CORPORATION
    Inventors: Lu Feng, Chunchen Liu, Wenjuan Wei
  • Patent number: 11853387
    Abstract: A data sparse projection method, includes: randomly initializing a high-dimensional sparse two-dimensional matrix (S1); fixing the high-dimensional sparse two-dimensional matrix, and calculating an optimal output variable by using the high-dimensional sparse two-dimensional matrix (S2); fixing the optimal output variable, and calculating an optimal high-dimensional sparse two-dimensional matrix by using the optimal output variable (S3); and cyclically fixing the high-dimensional sparse two-dimensional matrix and the output variable until the optimal output variable is no longer increased when the high-dimensional sparse two-dimensional matrix is fixed (S4).
    Type: Grant
    Filed: April 12, 2023
    Date of Patent: December 26, 2023
    Assignee: THE CHINESE UNIVERSITY OF HONG KONG, SHENZHEN
    Inventors: Chonglin Gu, Changyi Ma, Wenye Li, Shuguang Cui
  • Patent number: 11836371
    Abstract: A storage system memory or memory domain with N memory controllers is organized into N-1 same-size partitions per memory controller or N partitions per memory controller with one partition reserved as spare capacity. The unreserved partitions are assigned to mirror pairs of members such that a first triangular submatrix of a representative matrix of indexed memory controllers and indexed partitions is a transpose of a second triangular submatrix of the representative matrix. The resulting distribution of members is balanced such that additional loading on remaining memory controllers when one of the memory controllers becomes inaccessible is evenly distributed.
    Type: Grant
    Filed: July 8, 2022
    Date of Patent: December 5, 2023
    Assignee: Dell Products L.P.
    Inventors: Kuolin Hua, Adnan Sahin
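    One hypothetical layout consistent with the abstract above, for the N-1 partitions per controller case: for each controller pair (a, b) with a < b, place one member at controller a, partition b-1 (upper triangle) and its mirror at controller b, partition a (lower triangle), so the lower triangle equals the transpose of the upper triangle up to a one-column shift. Each controller then mirrors exactly one partition with every other controller, so a single failure adds exactly one partition of load to each survivor. The construction and names are illustrative, not taken from the patent.

```python
from typing import Dict, List


def mirror_layout(n: int) -> List[List[int]]:
    """layout[controller][partition] = mirror-pair id, with n - 1 partitions each."""
    layout = [[-1] * (n - 1) for _ in range(n)]
    pair = 0
    for a in range(n):
        for b in range(a + 1, n):
            layout[a][b - 1] = pair  # upper-triangular member
            layout[b][a] = pair      # lower-triangular mirror
            pair += 1
    return layout


def failover_load(layout: List[List[int]], failed: int) -> List[int]:
    """Extra partitions each surviving controller serves when `failed` is lost."""
    owners: Dict[int, List[int]] = {}
    for controller, row in enumerate(layout):
        for pid in row:
            owners.setdefault(pid, []).append(controller)
    load = [0] * len(layout)
    for c1, c2 in owners.values():
        if failed in (c1, c2):
            load[c2 if c1 == failed else c1] += 1
    return load
```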
  • Patent number: 11836751
    Abstract: A method for measuring relatedness between prediction tasks includes receiving data for a first prediction task. The method further includes measuring the relatedness of the first prediction task to at least one previous prediction task as a difference between divergence of conditional probabilities of the tasks. The method can be advantageously applied in artificial intelligence or continual learning systems.
    Type: Grant
    Filed: March 3, 2020
    Date of Patent: December 5, 2023
    Assignee: NEC CORPORATION
    Inventors: Shujian Yu, Ammar Shaker
  • Patent number: 11823303
    Abstract: A data processing method and apparatus are disclosed. In various embodiments, R groups of proposal region sequences are obtained. Each group of proposal region sequence includes a plurality of proposal regions. In those embodiments, a VRPAC instruction is invoked to calculate an area of each proposal region in each group of proposal region sequence. For a jth group of proposal region sequence in the R groups of proposal region sequences, a VIOU instruction and a VAADD instruction are invoked to determine j suppression matrices of the jth group of proposal region sequence and determine a suppression vector of the jth group of proposal region sequence based on the j suppression matrices. In those embodiments, an unsuppressed proposal region is determined based on a suppression vector of each group of proposal region sequence.
    Type: Grant
    Filed: July 19, 2020
    Date of Patent: November 21, 2023
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Luping Cui, Jiajin Tu, Hu Liu, Honghui Yuan, Heng Liao, Hou Fun Lam, Bing Li
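    The computation the VRPAC, VIOU, and VAADD instructions accelerate resembles greedy non-maximum suppression. The sketch below, for a single group of proposal regions already sorted by descending score, computes each region's area, builds an IoU-based suppression matrix, and derives a suppression vector in which a region suppressed by a higher-ranked kept region cannot suppress others. The box format (x1, y1, x2, y2), the 0.5 threshold, and the single-matrix formulation are assumptions; the patent's per-group "j suppression matrices" are not modeled.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], threshold: float = 0.5) -> List[int]:
    """Indices of unsuppressed boxes; `boxes` must be sorted by descending score."""
    n = len(boxes)
    # sup[i][j]: higher-ranked box i would suppress lower-ranked box j.
    sup = [[j > i and iou(boxes[i], boxes[j]) > threshold for j in range(n)]
           for i in range(n)]
    suppressed = [False] * n  # the suppression vector
    for i in range(n):
        if suppressed[i]:
            continue
        for j in range(n):
            if sup[i][j]:
                suppressed[j] = True
    return [i for i in range(n) if not suppressed[i]]
```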
  • Patent number: 11797643
    Abstract: Embodiments of apparatus and method for matrix multiplication using processing-in-memory (PIM) are disclosed. In an example, an apparatus for matrix multiplication includes an array of tiles that each include one or more PIM blocks. A PIM block may include a hybrid-mode PIM block that may be configured into a digital mode or an analog mode. The PIM block configured into digital mode may perform operations associated with depth-wise (DW) convolution. On the other hand, a PIM block configured into analog mode may perform operations associated with point-wise (PW) convolution. A controller may be used to configure the PIM block into either digital mode or analog mode, depending on the computations.
    Type: Grant
    Filed: November 9, 2020
    Date of Patent: October 24, 2023
    Assignee: NEONEXUS PTE. LTD.
    Inventor: Qilin Zheng
  • Patent number: 11734386
    Abstract: A matrix processing method performed by a graphics processing unit (GPU) includes: determining a plurality of non-zero elements in a to-be-processed matrix at a processor in the GPU; generating a distribution matrix of the to-be-processed matrix at the processor, where the distribution matrix comprises identities for indicating positions of the plurality of non-zero elements in the to-be-processed matrix; obtaining a target matrix from another matrix by using the distribution matrix at a logic circuit in the processor, where the target matrix comprises a plurality of target elements from the another matrix; and performing matrix processing on the plurality of non-zero elements and the target matrix to obtain an operation result at the processor.
    Type: Grant
    Filed: December 23, 2021
    Date of Patent: August 22, 2023
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Zhenjiang Dong, Chio In Ieong, Hu Liu, Hai Chen
  • Patent number: 11734387
    Abstract: Techniques regarding an iterative energy-scaled variational quantum eigensolver process are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, that can execute the computer executable components stored in the memory. The computer executable components can comprise a read-out component that determines a ground state energy value of a quantum Hamiltonian by employing a variational quantum eigensolver (VQE) algorithm, wherein the VQE algorithm utilizes a symmetry that emerges at an energy scale of the quantum Hamiltonian.
    Type: Grant
    Filed: March 3, 2022
    Date of Patent: August 22, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Antonio Mezzacapo, Richard Chen, Marco Pistoia
  • Patent number: 11734383
    Abstract: A computing device and related products are provided. The computing device is configured to perform machine learning calculations. The computing device includes an operation unit, a controller unit, and a storage unit. The storage unit includes a data input/output (I/O) unit, a register, and a cache. The technical solution provided by the present disclosure offers fast calculation and energy savings.
    Type: Grant
    Filed: July 29, 2020
    Date of Patent: August 22, 2023
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Tianshi Chen, Xiao Zhang, Shaoli Liu, Yunji Chen
  • Patent number: 11687616
    Abstract: An arithmetic processing apparatus includes a memory and a processor. The processor is coupled to the memory and is configured to determine, from among a plurality of individuals in a first generation, an individual not to be evolved to an individual of a second generation, based on a predetermined reference for completion of the fitness calculation for each of the plurality of individuals, the second generation being the generation next to the first generation; and to determine that the determined individual is to evolve to an individual of a generation after the second generation.
    Type: Grant
    Filed: November 6, 2020
    Date of Patent: June 27, 2023
    Assignee: FUJITSU LIMITED
    Inventors: Yukito Tsunoda, Teruo Ishihara
  • Patent number: 11609762
    Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
    Type: Grant
    Filed: August 10, 2021
    Date of Patent: March 21, 2023
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Patent number: 11556852
    Abstract: A computer-implemented method for determining a set of target items to be annotated for training a machine learning application. The method comprises providing a training data set with a set of data samples and an auto-encoder with a classifier. The auto-encoder comprises an embedding model that maps the set of data samples to a set of compressed feature vectors. The set of compressed feature vectors define a compressed feature matrix. Further provided are: a definition of a graph associated to the compressed feature matrix, applying a clustering-algorithm to identify node clusters of the graph and applying a centrality algorithm to identify central nodes of the node clusters, retrieving from an annotator node labels for the central nodes, propagating the annotated node labels to other nodes of the graph and performing a training of the embedding model and the classifier with the annotated and the propagated node labels.
    Type: Grant
    Filed: March 6, 2020
    Date of Patent: January 17, 2023
    Assignee: International Business Machines Corporation
    Inventors: Peter Willem Jan Staar, Michele Dolfi, Christoph Auer, Leonidas Georgopoulos, Ralf Kaestner, Alexander Velizhev, Dal Noguer Hidalgo, Rita Kuznetsova, Konstantinos Bekas
  • Patent number: 11550872
    Abstract: Quantum computing systems and methods are provided. In one example, a quantum computing system includes a quantum system having one or more quantum system qubits and one or more ancilla qubits. The quantum computing system includes one or more quantum gates implemented by the quantum computing system. The quantum gate(s) are operable to configure the one or more ancilla qubits into a known state. The quantum computing system includes a quantum measurement circuit operable to perform a plurality of measurements on the one or more quantum system qubits using the one or more ancilla qubits. The quantum computing system includes one or more processors operable to determine a reduced density matrix for a subset of the quantum system based on a set of the plurality of measurements that include a number of repeated measurements performed using the quantum measurement circuit.
    Type: Grant
    Filed: October 15, 2020
    Date of Patent: January 10, 2023
    Assignee: GOOGLE LLC
    Inventor: Zhang Jiang
  • Patent number: 11520854
    Abstract: A first group of elements is element-wise multiplied with a second group of elements using a plurality of multipliers belonging to a matrix multiplication hardware unit. Results of the plurality of multipliers are added together using a hierarchical tree of adders belonging to the matrix multiplication hardware unit and a final result of the hierarchical tree of adders or any of a plurality of intermediate results of the hierarchical tree of adders is selectively provided for use in determining an output result matrix.
    Type: Grant
    Filed: October 29, 2019
    Date of Patent: December 6, 2022
    Assignee: Meta Platforms, Inc.
    Inventors: Yuchen Hao, Krishnakumar Narayanan Nair, Ehsan Khish Ardestani Zadeh, Rakesh Komuravelli, Abdulkadir Utku Diril, Thomas Mark Ulrich
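    A software model of the multiplier-plus-adder-tree idea above: multiply two groups element-wise, then reduce pairwise level by level, keeping every level so that either the final sum or an intermediate level (for example, several shorter partial dot products) can be selected. Zero-padding odd-length levels is an assumption of this sketch, not a detail from the abstract.

```python
from typing import List, Sequence


def adder_tree(a: Sequence[int], b: Sequence[int]) -> List[List[int]]:
    """Return every level of a binary adder tree over the element-wise products."""
    level = [x * y for x, y in zip(a, b)]  # one multiplier per element pair
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [0]  # pad odd levels with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        levels.append(level)
    return levels
```

    Here `adder_tree(a, b)[-1][0]` is the full dot product, while `adder_tree(a, b)[1]` holds pairwise partial sums that could serve as independent length-2 dot products.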
  • Patent number: 11520855
    Abstract: A computer-implemented method is presented for performing matrix sketching by employing an analog crossbar architecture. The method includes low rank updating a first matrix for a first period of time, copying the first matrix into a dynamic correction computing device, switching to a second matrix to low rank update the second matrix for a second period of time, as the second matrix is low rank updated, feeding the first matrix with first stochastic pulses to reset the first matrix back to a first matrix symmetry point, copying the second matrix into the dynamic correction computing device, switching back to the first matrix to low rank update the first matrix for a third period of time, and as the first matrix is low rank updated, feeding the second matrix with second stochastic pulses to reset the second matrix back to a second matrix symmetry point.
    Type: Grant
    Filed: May 15, 2020
    Date of Patent: December 6, 2022
    Assignees: INTERNATIONAL BUSINESS MACHINES CORPORATION, RAMOT AT TEL-AVIV UNIVERSITY, LTD.
    Inventors: Lior Horesh, Oguzhan Murat Onen, Haim Avron, Tayfun Gokmen, Vasileios Kalantzis, Shashanka Ubaru
  • Patent number: 11442709
    Abstract: A method for compiling and executing a nested loop includes initializing a nested loop controller with an outer loop count value and an inner loop count value. The nested loop controller includes a predicate FIFO. The method also includes coalescing the nested loop and, during execution of the coalesced nested loop, causing the nested loop controller to populate the predicate FIFO and executing a get predicate instruction having an offset value, where the get predicate returns a value from the predicate FIFO specified by the offset value. The method further includes predicating an outer loop instruction on the returned value from the predicate FIFO.
    Type: Grant
    Filed: August 3, 2020
    Date of Patent: September 13, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: Kai Chirca, Timothy D. Anderson, Todd T. Hahn, Alan L. Davis
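    A behavioral sketch of coalescing a nested loop with a predicate FIFO, as described above: the two loops become one loop of outer x inner iterations, a hypothetical controller pushes a predicate that is true on the last inner iteration, and the outer-loop instruction executes only when the popped predicate is true. The offset operand of the get-predicate instruction is not modeled (every read takes the oldest entry), and all names are illustrative.

```python
from collections import deque
from typing import Deque, List, Tuple


def nested_reference(outer: int, inner: int) -> List[Tuple]:
    """Trace of the original, uncoalesced nested loop."""
    trace: List[Tuple] = []
    for i in range(outer):
        for j in range(inner):
            trace.append(("inner", i, j))
        trace.append(("outer", i))  # instruction in the outer loop body
    return trace


def coalesced(outer: int, inner: int) -> List[Tuple]:
    """Single flat loop; the outer instruction is predicated via a FIFO."""
    fifo: Deque[bool] = deque()
    trace: List[Tuple] = []
    for n in range(outer * inner):
        i, j = divmod(n, inner)
        fifo.append(j == inner - 1)  # controller: true on the last inner iteration
        trace.append(("inner", i, j))
        if fifo.popleft():  # "get predicate", then the predicated outer instruction
            trace.append(("outer", i))
    return trace
```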
  • Patent number: 11435941
    Abstract: In one example, an apparatus comprises: a memory array having an array of memory elements arranged in rows and columns, each memory element being configured to store a data element; and a memory access circuit configured to: perform a row write operation to store a first group of data elements at a first row of the array of memory elements; perform a column read operation at a first column of the array of memory elements to obtain a second group of data elements; and perform a column write operation to store a third group of data elements at the first column of the array of memory elements to replace the second group of data elements.
    Type: Grant
    Filed: June 24, 2020
    Date of Patent: September 6, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Kun Xu, Paul Gilbert Meyer, Ron Diamant
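    A small model of the memory access pattern above: an array written row-wise can be read column-wise (which yields a transpose) and overwritten column-wise in place. The class and method names are hypothetical.

```python
from typing import List, Sequence


class RowColumnArray:
    """2-D array of data elements supporting row writes plus column reads/writes."""

    def __init__(self, rows: int, cols: int) -> None:
        self.cells: List[List[int]] = [[0] * cols for _ in range(rows)]

    def write_row(self, r: int, data: Sequence[int]) -> None:
        if len(data) != len(self.cells[r]):
            raise ValueError("row length mismatch")
        self.cells[r] = list(data)

    def read_column(self, c: int) -> List[int]:
        return [row[c] for row in self.cells]

    def write_column(self, c: int, data: Sequence[int]) -> None:
        if len(data) != len(self.cells):
            raise ValueError("column length mismatch")
        for row, value in zip(self.cells, data):
            row[c] = value
```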
  • Patent number: 11410070
    Abstract: A quantum computing device comprises at least one quantum register including a plurality of logical qubits. A compression engine is coupled to each logical qubit of the plurality of logical qubits. Each compression engine is configured to compress syndrome data. A decompression engine is coupled to each compression engine. Each decompression engine is configured to receive compressed syndrome data, decompress the received compressed syndrome data, and route the decompressed syndrome data to a decoder block.
    Type: Grant
    Filed: November 18, 2019
    Date of Patent: August 9, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Poulami Das, Nicolas Guillaume Delfosse, Christopher Anand Pattison, Srilatha Manne, Douglas Carmean, Krysta Marie Svore, Helmut Gottfried Katzgraber
  • Patent number: 11409840
    Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that are dynamically mapped to mutually exclusive subsets of the rows and columns of the processor element arrays based on dimensions of matrices that provide the parameter values to the processor element arrays. In some cases, the processor element arrays are vector arithmetic logic unit (ALU) processors and the memory interfaces are direct memory access (DMA) engines. The rows of the processor element arrays in the subsets are mutually exclusive to the rows in the other subsets and the columns of the processor element arrays in the subsets are mutually exclusive to the columns in the other subsets. The matrices can be symmetric or asymmetric, e.g., one of the matrices can be a vector having a single column.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: August 9, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari
  • Patent number: 11392849
    Abstract: Systems and methods that facilitate motion formalism utilizing quantum computing, to compute matrix operators in terms of commutators between qubit operators and measurements on the quantum hardware, wherein the commutators are computed utilizing symbolic calculus. Embodiments reduce the computational cost of generalized eigenvalue synthesis by relying on symbolic calculus and parallelization. Embodiments disclosed herein can also develop estimators of excited-state properties, considering constants of motion (e.g., spin) and non-constants of motion (e.g., dipoles, density matrices).
    Type: Grant
    Filed: September 18, 2020
    Date of Patent: July 19, 2022
    Assignees: INTERNATIONAL BUSINESS MACHINES CORPORATION, JSR CORPORATION
    Inventors: Mario Motta, Pauline Ollitrault, Stephen Wood, Panagiotis Barkoutsos, Joseph Latone, Ivano Tavernelli, Gavin Jones, Edward Pyzer-Knapp, Yuya Onishi
  • Patent number: 11392379
    Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate a signed fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.
    Type: Grant
    Filed: September 27, 2017
    Date of Patent: July 19, 2022
    Assignee: Intel Corporation
    Inventors: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
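    One plausible reading of the instruction above, for 16-bit signed elements: form the exact 32-bit product, round by adding half of the least significant bit of the upper half, and keep the most significant 16 bits (round half up, with an arithmetic shift). The rounding mode and element width are assumptions; the abstract does not fix them.

```python
from typing import List, Sequence


def mul_round_high(a: int, b: int, bits: int = 16) -> int:
    """Signed fixed-size result: rounded most significant half of the double-size product."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operands must fit in the fixed element size")
    product = a * b  # exact double-size product
    # Add half an LSB of the upper half, then arithmetic-shift right by `bits`.
    return (product + (1 << (bits - 1))) >> bits


def vector_mul_round_high(xs: Sequence[int], ys: Sequence[int], bits: int = 16) -> List[int]:
    """Apply the operation to each corresponding pair of elements."""
    return [mul_round_high(x, y, bits) for x, y in zip(xs, ys)]
```

    For same-width signed inputs the rounded upper half always fits back in the destination width (the extreme case -32768 x -32768 yields 16384), so this sketch needs no saturation step.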
  • Patent number: 11379185
    Abstract: A matrix multiplication device and an operation method thereof are provided. The matrix multiplication device includes a plurality of unit circuits. Each of the unit circuits includes a multiplying-adding circuit, a first register, and a second register. A first input terminal and a second input terminal of the multiplying-adding circuit are respectively coupled to a corresponding first input line and a corresponding second input line. An input terminal and an output terminal of the first register are respectively coupled to an output terminal and a third input terminal of the multiplying-adding circuit. The second register is coupled to the first register to receive and temporarily store a multiplication accumulation result. Wherein, the second registers of the unit circuits output the multiplication accumulation results in a column direction in a first output mode, and output the multiplication accumulation results in a row direction in a second output mode.
    Type: Grant
    Filed: September 21, 2020
    Date of Patent: July 5, 2022
    Assignee: NEUCHIPS CORPORATION
    Inventors: Jian-Wen Chen, Chiung-Liang Lin
  • Patent number: 11334355
    Abstract: Technology for providing data to a processing unit is disclosed. A computer processor may be divided into a master processing unit and consumer processing units. The master processing unit at least partially decodes a machine instruction and determines whether data is needed to execute the machine instruction. The master processing unit sends a request to memory for the data. The request may indicate that the data is to be sent from the memory to a consumer processing unit. The data sent by the memory in response to the request may be stored in local read storage that is close to the consumer processing unit for fast access. The master processing unit may also provide the machine instruction to the consumer processing unit. The consumer processing unit may access the data from the local read storage and execute the machine instruction based on the accessed data.
    Type: Grant
    Filed: May 4, 2017
    Date of Patent: May 17, 2022
    Assignee: Futurewei Technologies, Inc.
    Inventors: Alan Gatherer, Sushma Wokhlu, Peter Yan, Ywhpyng Harn, Ashish Rai Shrivastava, Tong Sun, Lee Dobson McFearin
  • Patent number: 11294986
    Abstract: Techniques regarding an iterative energy-scaled variational quantum eigensolver process are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, that can execute the computer executable components stored in the memory. The computer executable components can comprise a read-out component that determines a ground state energy value of a quantum Hamiltonian by employing a variational quantum eigensolver (VQE) algorithm, wherein the VQE algorithm utilizes a symmetry that emerges at an energy scale of the quantum Hamiltonian.
    Type: Grant
    Filed: November 22, 2019
    Date of Patent: April 5, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Antonio Mezzacapo, Richard Chen, Marco Pistoia
  • Patent number: 11269973
    Abstract: Repeating patterns are identified in a matrix. Based on the identification of the repeating patterns, instructions are generated, which are executable by processing cores of a dot product engine to allocate analog multiplication crossbars of the dot product engine to perform multiplication of the matrix with a vector.
    Type: Grant
    Filed: April 28, 2020
    Date of Patent: March 8, 2022
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Mashood Abdulla Kodavanji, Soumitra Chatterjee, Chinmay Ghosh, Mohan Parthasarathy
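    A simplified sketch of the idea above: tile the matrix into square blocks, identify repeated blocks so each distinct block needs only one crossbar, and record which crossbar serves each tile position; the matrix-vector product then reuses those crossbars across positions. Square tiles, dimensions divisible by the block size, and all names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Block = Tuple[Tuple[int, ...], ...]


def find_repeated_blocks(matrix: List[List[int]], bs: int):
    """Map each distinct bs x bs block to a crossbar id, plus per-tile placement."""
    unique: Dict[Block, int] = {}
    placement: Dict[Tuple[int, int], int] = {}
    for r in range(0, len(matrix), bs):
        for c in range(0, len(matrix[0]), bs):
            block = tuple(tuple(row[c:c + bs]) for row in matrix[r:r + bs])
            placement[(r // bs, c // bs)] = unique.setdefault(block, len(unique))
    return unique, placement


def blocked_matvec(unique, placement, x: List[int], bs: int, n_rows: int) -> List[int]:
    """Multiply using one stored copy ("crossbar") per distinct block."""
    crossbars = {idx: blk for blk, idx in unique.items()}
    y = [0] * n_rows
    for (br, bc), idx in placement.items():
        segment = x[bc * bs:(bc + 1) * bs]
        for i, row in enumerate(crossbars[idx]):
            y[br * bs + i] += sum(a * b for a, b in zip(row, segment))
    return y
```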
  • Patent number: 11263512
    Abstract: A novel and useful neural network (NN) processing core adapted to implement artificial neural networks (ANNs) and incorporating strictly separate control and data planes. The NN processor is constructed from self-contained computational units organized in a hierarchical architecture. The homogeneity enables simpler management and control of similar computational units, aggregated in multiple levels of hierarchy. Computational units are designed with as little overhead as possible, where additional features and capabilities are aggregated at higher levels in the hierarchy. On-chip memory provides storage for content inherently required for basic operation at a particular hierarchy and is coupled with the computational resources in an optimal ratio. Lean control provides just enough signaling to manage only the operations required at a particular hierarchical level. Dynamic resource assignment agility is provided which can be adjusted as required depending on resource availability and capacity of the device.
    Type: Grant
    Filed: April 3, 2018
    Date of Patent: March 1, 2022
    Inventors: Avi Baum, Or Danon, Hadar Zeitlin, Daniel Ciubotariu, Rami Feig
  • Patent number: 11263018
    Abstract: A vector processor is disclosed. The vector processor includes a plurality of register files provided to each of a plurality of single instruction multiple data (SIMD) lanes, storing each of a plurality of pieces of data, and respectively outputting input data to be used in a current cycle among the plurality of pieces of data, a shuffle unit for receiving a plurality of pieces of input data outputted from the plurality of register files, and performing shuffling such that the received plurality of pieces of input data respectively correspond to the plurality of SIMD lanes and outputting the same; and a command execution unit for performing a parallel operation by receiving input data outputted from the shuffle unit.
    Type: Grant
    Filed: October 23, 2017
    Date of Patent: March 1, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ki-seok Kwon, Jae-un Park, Dong-kwan Suh, Kang-jin Yoon
  • Patent number: 11256508
    Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array, a null vector count (N), and a selected dimension. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. N null stream vectors are inserted into the stream of vectors for the selected dimension without fetching respective null data from the memory.
    Type: Grant
    Filed: May 23, 2019
    Date of Patent: February 22, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: Asheesh Bhardwaj, William Franklin Leven, Son Hung Tran, Timothy David Anderson
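The null-vector insertion described above can be sketched in software. This is a minimal model, assuming a two-dimensional row-major array whose selected dimension is the row, so that `null_count` (the abstract's N) all-zero vectors follow each row; the function name and parameters are hypothetical, and the fetch counter only illustrates that null vectors cost no memory reads.

```python
from typing import List, Tuple


def stream_with_null_vectors(memory: List[int], rows: int, cols: int,
                             vec_len: int, null_count: int) -> Tuple[List[List[int]], int]:
    """Stream a row-major rows x cols array as vectors of vec_len elements,
    inserting null_count all-zero vectors after each row (the selected dimension).

    Returns the stream and the number of elements read from memory."""
    assert cols % vec_len == 0, "sketch assumes each row splits evenly into vectors"
    stream: List[List[int]] = []
    fetches = 0
    for r in range(rows):
        for c0 in range(0, cols, vec_len):
            start = r * cols + c0
            stream.append(memory[start:start + vec_len])  # data fetched from memory
            fetches += vec_len
        # Null vectors are synthesized locally; nothing is read from memory.
        stream.extend([0] * vec_len for _ in range(null_count))
    return stream, fetches


print(stream_with_null_vectors(list(range(1, 9)), rows=2, cols=4, vec_len=2, null_count=1))
# ([[1, 2], [3, 4], [0, 0], [5, 6], [7, 8], [0, 0]], 8)
```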
  • Patent number: 11249761
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Grant
    Filed: July 20, 2020
    Date of Patent: February 15, 2022
    Assignee: Intel Corporation
    Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
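The first compression option in the abstract above (packing the non-zero elements and recording each one's matrix position in a header) can be illustrated as follows. This is a sketch of that idea only, assuming a header that is simply a list of `(row, col)` pairs; the narrower-bit-width option is not modeled, and `compress`/`decompress` are hypothetical names rather than the instruction mnemonics.

```python
from typing import List, Tuple

Matrix = List[List[int]]
Header = List[Tuple[int, int]]


def compress(m: Matrix) -> Tuple[Header, List[int]]:
    """Pack the non-zero elements; the header records each one's (row, col)."""
    header = [(i, j) for i, row in enumerate(m) for j, v in enumerate(row) if v != 0]
    packed = [m[i][j] for i, j in header]
    return header, packed


def decompress(header: Header, packed: List[int], rows: int, cols: int) -> Matrix:
    """Rebuild the dense matrix by scattering packed values back to their positions."""
    out = [[0] * cols for _ in range(rows)]
    for (i, j), v in zip(header, packed):
        out[i][j] = v
    return out


m = [[0, 5, 0], [7, 0, 0]]
h, p = compress(m)  # h == [(0, 1), (1, 0)], p == [5, 7]
assert decompress(h, p, 2, 3) == m
```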
  • Patent number: 11249759
    Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters that define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for two selected dimensions of the array. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. When either selected dimension in the stream of vectors exceeds a respective specified width, the streaming engine inserts null elements into each portion of a respective vector for the selected dimension that exceeds the specified width in the stream of vectors. Stream vectors that are completely null are formed by the streaming engine without accessing the system memory for respective data.
    Type: Grant
    Filed: May 23, 2019
    Date of Patent: February 15, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: William Franklin Leven, Asheesh Bhardwaj, Son Hung Tran, Timothy David Anderson
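The width-based null insertion described above can be sketched as a masked stream. This minimal model assumes a two-dimensional array (rows and columns are the two selected dimensions), a row-major memory with a caller-supplied `row_stride`, and null elements represented as zeros; all names are hypothetical. Setting one width to at least the size of its dimension reduces it to the single-dimension case.

```python
from typing import List, Tuple


def stream_with_widths(memory: List[int], row_stride: int, rows: int, cols: int,
                       row_width: int, col_width: int,
                       vec_len: int) -> Tuple[List[List[int]], int]:
    """Stream a rows x cols region as vectors of vec_len elements. Elements whose
    row index >= row_width or column index >= col_width become nulls (zeros)
    and are never read from memory. Returns the stream and the read count."""
    assert cols % vec_len == 0, "sketch assumes each row splits evenly into vectors"
    stream: List[List[int]] = []
    fetches = 0
    for r in range(rows):
        for c0 in range(0, cols, vec_len):
            vec = []
            for c in range(c0, c0 + vec_len):
                if r < row_width and c < col_width:
                    vec.append(memory[r * row_stride + c])
                    fetches += 1
                else:
                    vec.append(0)  # null element: no memory access
            stream.append(vec)
    return stream, fetches


# A 2 x 2 array streamed as a 3 x 4 region: the extra columns and the extra row are null.
print(stream_with_widths([1, 2, 3, 4], row_stride=2, rows=3, cols=4,
                         row_width=2, col_width=2, vec_len=2))
# ([[1, 2], [0, 0], [3, 4], [0, 0], [0, 0], [0, 0]], 4)
```

Fully null vectors, such as every vector in the third row, are produced without a single read, which mirrors the abstract's point that they require no system-memory access.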
  • Patent number: 11231929
    Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters that define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for a selected dimension of the array. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. When the selected dimension in the stream of vectors exceeds the specified width, the streaming engine inserts null elements into each portion of a respective vector for the selected dimension that exceeds the specified width in the stream of vectors. Stream vectors that are completely null are formed by the streaming engine without accessing the system memory for respective data.
    Type: Grant
    Filed: May 23, 2019
    Date of Patent: January 25, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: Son Hung Tran, Shyam Jagannathan, Timothy David Anderson
  • Patent number: 11232175
    Abstract: Implementations of the present disclosure relate to a method, system and program product for determining a causality between a plurality of variables.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: January 25, 2022
    Assignee: NEC CORPORATION
    Inventors: Lu Feng, Chunchen Liu, Wenjuan Wei
  • Patent number: 11216533
    Abstract: A grouping means 11 extracts basis vectors from a set of basis vectors for a lattice that has a predetermined relationship with a matrix used to generate a public key, and groups the basis vectors so that a predetermined condition is satisfied. For at least one group, a sampling means 12 samples, in parallel for the individual basis vectors, as many arbitrary values as there are basis vectors in that group, onto the lattice formed by those basis vectors; the arbitrary values are random numbers following a discrete Gaussian distribution. The predetermined condition is that each basis vector in a group is orthogonal to the other basis vectors in the same group and is also orthogonal to the Gram-Schmidt vectors obtained by orthogonalizing those other basis vectors with Gram-Schmidt orthogonalization.
    Type: Grant
    Filed: May 12, 2017
    Date of Patent: January 4, 2022
    Assignee: NEC CORPORATION
    Inventors: Yuki Tanaka, Kazuhiko Minematsu
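The grouping condition described above can be illustrated with a greedy pass over integer basis vectors. This is a simplified sketch, not the patented procedure. It checks only pairwise orthogonality within a group; for a pairwise-orthogonal group, Gram-Schmidt leaves the other members unchanged, so the second condition holds within that group. The discrete Gaussian sampling step is not modeled, and `group_orthogonal` is a hypothetical name.

```python
from typing import List, Sequence


def dot(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x * y for x, y in zip(a, b))


def group_orthogonal(basis: List[List[int]]) -> List[List[int]]:
    """Greedily assign basis-vector indices to groups whose members are
    pairwise orthogonal; vectors in one group could then be sampled in parallel."""
    groups: List[List[int]] = []
    for i, v in enumerate(basis):
        for g in groups:
            if all(dot(v, basis[j]) == 0 for j in g):
                g.append(i)
                break
        else:  # no existing group accepts v, so start a new one
            groups.append([i])
    return groups


print(group_orthogonal([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]))
# [[0, 1, 3], [2]]
```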
  • Patent number: 11188328
    Abstract: Aspects include a compute array of a processor with mixed-precision numerical linear algebra support. A first precision and a first shape of a first input matrix, and a second precision and a second shape of a second input matrix, to the compute array are determined. The number of rank updates of a result matrix to store in an accumulator register of predetermined size is then determined based on the precision and shape of each input matrix and the size of the accumulator register. A plurality of linear algebra operations is repeated in parallel within the compute array to update the result matrix in the accumulator register based on the first input matrix, the second input matrix, and the number of rank updates.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: November 30, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jose E. Moreira, Brett Olsson, Brian W. Thompto, Silvia Melitta Mueller, Andreas Wagner
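The relationship between element precision and the number of rank updates described above can be sketched as follows. The sizing rule here is a hypothetical assumption, not the patent's formula: it assumes an input register of `register_bits` holds `rows` rows of `k` elements each, so narrower elements allow more rank-1 updates per operation. The accumulation itself is the standard rank-k update `acc += A @ B`, written as k rank-1 updates.

```python
from typing import List

Mat = List[List[int]]


def rank_updates_per_op(register_bits: int, rows: int, elem_bits: int) -> int:
    """Hypothetical rule: how many rank-1 updates fit when one input register
    supplies `rows` rows of elem_bits-wide elements."""
    return register_bits // (rows * elem_bits)


def rank_k_update(acc: Mat, a: Mat, b: Mat) -> Mat:
    """acc[i][j] += sum_p a[i][p] * b[p][j], performed as k rank-1 updates."""
    k = len(b)
    for p in range(k):  # one rank-1 (outer-product) update per p
        for i in range(len(acc)):
            for j in range(len(acc[0])):
                acc[i][j] += a[i][p] * b[p][j]
    return acc


# Under the assumed rule, halving the element width doubles the rank per operation.
print([rank_updates_per_op(128, 4, bits) for bits in (32, 16, 8, 4)])  # [1, 2, 4, 8]
print(rank_k_update([[0, 0], [0, 0]], [[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```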
  • Patent number: 11182126
    Abstract: Computationally efficient mixed-precision floating-point waveform generation takes advantage of the speed of generating waveforms with single-precision floating-point numbers, while reducing the generally unacceptable loss of precision of pure single-precision floating point, to generate any waveform that repeats every 2π. This approach computes a reference phase in double precision as the phase modulo 2π and then computes offsets to that value in single precision. The double-precision reference phase is recomputed as needed, depending on how quickly the phase grows and how large a machine epsilon is desired.
    Type: Grant
    Filed: June 25, 2019
    Date of Patent: November 23, 2021
    Assignee: Raytheon Company
    Inventors: Ender Barillas, Brian Filarsky
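The reference-phase-plus-offset scheme described above can be sketched for a sine wave. This is a minimal illustration, assuming single precision is emulated by rounding Python floats through `struct` with the `'f'` format, and assuming the double-precision reference is refreshed every `refresh_every` samples (a hypothetical parameter standing in for "as needed"). It is not Raytheon's implementation.

```python
import math
import struct
from typing import List


def f32(x: float) -> float:
    """Round a Python float (IEEE-754 double) to single precision."""
    return struct.unpack('f', struct.pack('f', x))[0]


def mixed_precision_sine(freq_hz: float, sample_rate_hz: float,
                         n_samples: int, refresh_every: int) -> List[float]:
    two_pi = 2.0 * math.pi
    step = f32(two_pi * freq_hz / sample_rate_hz)  # single-precision phase increment
    ref = 0.0
    out = []
    for n in range(n_samples):
        k = n % refresh_every
        if k == 0:
            # Double-precision reference phase, reduced modulo 2*pi.
            ref = math.fmod(two_pi * freq_hz * n / sample_rate_hz, two_pi)
        # Small single-precision offset added to the (rounded) reference.
        phase = f32(f32(ref) + f32(k * step))
        out.append(f32(math.sin(phase)))
    return out


wave = mixed_precision_sine(1000.0, 48000.0, 5000, refresh_every=64)
```

Because the single-precision offset never grows beyond `refresh_every` increments, its rounding error stays bounded, while the slowly growing absolute phase is handled in double precision.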
  • Patent number: 11182458
    Abstract: Embodiments of the present invention are directed to a new instruction set extension and a method for providing 3D lane predication for matrix operations. In a non-limiting embodiment of the invention, a first input matrix having m rows and k columns and a second input matrix having k rows and n columns are received by a compute array of a processor. A three-dimensional predicate mask having an M-bit row mask, an N-bit column mask, and a K-bit rank mask is generated. A result matrix of up to m rows, up to n columns, and up to k rank updates is determined based on the first input matrix, the second input matrix, and the predicate mask.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: November 23, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brett Olsson, Brian W. Thompto, Jose E. Moreira, Silvia Melitta Mueller, Andreas Wagner
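The three-dimensional predication described above can be sketched as a masked matrix-multiply-accumulate. This sketch assumes that masked-off rows and columns leave the accumulator unchanged and that masked-off rank positions are simply skipped in the sum; that is one plausible semantics, not necessarily the patent's. `predicated_ger` is a hypothetical name.

```python
from typing import List, Sequence

Mat = List[List[int]]


def predicated_ger(acc: Mat, a: Mat, b: Mat, row_mask: Sequence[bool],
                   col_mask: Sequence[bool], rank_mask: Sequence[bool]) -> Mat:
    """acc (m x n) += a (m x k) @ b (k x n), restricted by row, column, and rank masks."""
    for i in range(len(acc)):
        if not row_mask[i]:
            continue  # row disabled: accumulator row left unchanged
        for j in range(len(acc[0])):
            if not col_mask[j]:
                continue  # column disabled: accumulator element left unchanged
            acc[i][j] += sum(a[i][p] * b[p][j] for p in range(len(b)) if rank_mask[p])
    return acc


print(predicated_ger([[0, 0], [0, 0]], [[1, 2], [3, 4]], [[5, 6], [7, 8]],
                     row_mask=[True, False], col_mask=[True, True], rank_mask=[True, False]))
# [[5, 6], [0, 0]]
```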
  • Patent number: 11169800
    Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.
    Type: Grant
    Filed: October 18, 2019
    Date of Patent: November 9, 2021
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov
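The two-operation split described above follows from (a + bi)(c + di) = (ac - bd) + (ad + bc)i. The sketch below assumes one natural split: the first operation forms the terms that use the real part of the first number (ac and ad), and the second adds the terms that use its imaginary part (-bd and bc). The patent may group the terms differently, and the function name is hypothetical.

```python
def complex_mul_two_step(x: complex, y: complex) -> complex:
    """Multiply x = a + bi by y = c + di in two operations."""
    a, b = x.real, x.imag
    c, d = y.real, y.imag
    # First operation: first terms of the real and imaginary components.
    re, im = a * c, a * d
    # Second operation: second terms, accumulated onto the first.
    re, im = re - b * d, im + b * c
    return complex(re, im)


print(complex_mul_two_step(1 + 2j, 3 + 4j))  # (-5+10j)
```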
  • Patent number: 11144615
    Abstract: Embodiments relate to a denominator circuit that determines the number of valid elements of a data surface covered by a kernel depending on various locations of the kernel relative to the data surface. The denominator circuit includes a first circuit and a second circuit that have the same structure. The first circuit receives numbers representing different horizontal locations of a reference point in the kernel and generates a first matrix with first output elements corresponding to the different horizontal locations. The second circuit receives numbers representing different vertical locations of a reference point in the kernel and generates a second matrix with second output elements corresponding to the different vertical locations. A matrix multiplication of the first matrix and the second matrix is performed to obtain an array of valid elements covered by the kernel.
    Type: Grant
    Filed: April 14, 2020
    Date of Patent: October 12, 2021
    Assignee: APPLE INC.
    Inventors: Yiu Chun Tse, Ji Liang Song, Ponan Kuo
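The separable computation described above can be sketched for a padded kernel (for example, the divisor for average pooling). This sketch assumes zero padding `pad` on each side, a stride, and a kernel reference point at its top-left tap; each axis gets a vector of valid-tap counts, and the 2D array of counts is their outer product, i.e. an (H_out x 1) by (1 x W_out) matrix multiplication. The function names are hypothetical.

```python
from typing import List


def valid_counts_1d(size: int, kernel: int, pad: int, stride: int = 1) -> List[int]:
    """For each kernel position along one axis, count taps that land in [0, size)."""
    out_len = (size + 2 * pad - kernel) // stride + 1
    counts = []
    for o in range(out_len):
        start = o * stride - pad
        lo, hi = max(start, 0), min(start + kernel, size)
        counts.append(max(hi - lo, 0))
    return counts


def valid_counts_2d(h: int, w: int, kh: int, kw: int, pad: int, stride: int = 1) -> List[List[int]]:
    rows = valid_counts_1d(h, kh, pad, stride)  # vertical locations
    cols = valid_counts_1d(w, kw, pad, stride)  # horizontal locations
    return [[r * c for c in cols] for r in rows]  # outer product


for row in valid_counts_2d(4, 4, 3, 3, pad=1):
    print(row)
# [4, 6, 6, 4]
# [6, 9, 9, 6]
# [6, 9, 9, 6]
# [4, 6, 6, 4]
```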
  • Patent number: 11113028
    Abstract: An apparatus and method are provided for performing an index operation. The apparatus has vector processing circuitry to perform an index operation in each of a plurality of lanes of parallel processing. The index operation requires an index value opm to be multiplied by a multiplier value e to produce a multiplication result. The number of lanes of parallel processing is dependent on a specified element size, and the multiplier value is different, but known, for each lane of parallel processing. The vector processing circuitry comprises mapping circuitry to perform, within each lane, mapping operations on the index value opm in order to generate a plurality of intermediate input values. The plurality of intermediate input values are such that the addition of the plurality of intermediate input values produces the multiplication result. Within each lane the mapping operations are determined by the multiplier value used for that lane.
    Type: Grant
    Filed: July 25, 2019
    Date of Patent: September 7, 2021
    Assignee: Arm Limited
    Inventors: Xiaoyang Shen, David Raymond Lutz, Cédric Denis Robert Airaud
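The per-lane mapping described above can be illustrated with a shift-and-add decomposition. This sketch makes two assumptions that are not stated in the abstract. First, as in an INDEX-style sequence, lane i's known multiplier e is simply i. Second, the mapping produces one shifted copy of opm for each set bit of e, so that the intermediate values sum to opm × e. Both function names are hypothetical.

```python
from typing import List


def intermediate_values(opm: int, e: int) -> List[int]:
    """Shifted copies of opm, one per set bit of the known multiplier e."""
    assert e >= 0
    return [opm << bit for bit in range(e.bit_length()) if (e >> bit) & 1]


def index_lanes(opm: int, lanes: int) -> List[int]:
    """Hypothetical index operation: lane i computes opm * i by summing intermediates."""
    return [sum(intermediate_values(opm, lane)) for lane in range(lanes)]


print(intermediate_values(7, 5))  # [7, 28], and 7 + 28 == 7 * 5
print(index_lanes(5, 4))          # [0, 5, 10, 15]
```

Because e is fixed per lane, the choice of shifts can be hard-wired, so no general multiplier is needed.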
  • Patent number: 11099844
    Abstract: Performing n-dimensional stencil processing may include providing a memory unit organized in memory banks for storing elements of an nD matrix, and processing the matrix tile-wise using a stencil vector unit in a first processing direction of the matrix. Data elements of the matrix can be equally distributed over the memory banks, and the number of memory banks can be equal to the number of data elements the stencil vector unit can process in parallel, which is equal to the number of data elements in the width direction of one of the tiles. Additionally, the boundary elements can be grouped in the width direction of the tiles into an nD sub-matrix, and the nD sub-matrix can be processed in the same way as the nD matrix, in a direction orthogonal to the first processing direction.
    Type: Grant
    Filed: May 16, 2019
    Date of Patent: August 24, 2021
    Assignee: International Business Machines Corporation
    Inventor: Jan Van Lunteren
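The bank layout and tile-wise processing described above can be sketched for a 2D, 5-point stencil. This sketch assumes a column-interleaved mapping (element column c lives in bank c mod num_banks) and tiles one row high and num_banks elements wide. The assertion checks that each tile row touches every bank at most once. Boundary elements are only copied, so the patent's orthogonal processing of the boundary sub-matrix is not modeled. Both functions are hypothetical.

```python
from typing import List


def bank_of(col: int, num_banks: int) -> int:
    """Hypothetical bank mapping: consecutive columns go to consecutive banks."""
    return col % num_banks


def stencil_tilewise(m: List[List[float]], num_banks: int) -> List[List[float]]:
    """Apply a 5-point averaging stencil to interior elements, processing each
    row in tiles num_banks elements wide (the assumed vector-unit width)."""
    rows, cols = len(m), len(m[0])
    out = [row[:] for row in m]  # boundary elements are copied unchanged here
    for r in range(1, rows - 1):
        for c0 in range(0, cols, num_banks):
            tile = range(c0, min(c0 + num_banks, cols))
            # Every element of a tile row sits in a different bank, so the
            # whole tile row could be read in one parallel access.
            assert len({bank_of(c, num_banks) for c in tile}) == len(tile)
            for c in tile:
                if 0 < c < cols - 1:
                    out[r][c] = (m[r][c] + m[r - 1][c] + m[r + 1][c]
                                 + m[r][c - 1] + m[r][c + 1]) / 5
    return out


print(stencil_tilewise([[1, 1, 1], [1, 6, 1], [1, 1, 1]], num_banks=2))
# [[1, 1, 1], [1, 2.0, 1], [1, 1, 1]]
```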
  • Patent number: 11100192
    Abstract: Aspects for vector operations in neural networks are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to add each of the first elements to a corresponding one of the second elements, respectively, to generate one or more addition results. The combiner may be configured to combine the one or more addition results into an output vector.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: August 24, 2021
    Assignee: Cambricon Technologies Corporation Limited
    Inventors: Jinhua Tao, Tian Zhi, Shaoli Liu, Tianshi Chen, Yunji Chen
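The adder and combiner arrangement described above can be sketched for vectors longer than the number of available adders. This sketch assumes that a fixed number of adders (`num_adders`, a hypothetical parameter) handles the vectors in chunks, and that the combiner concatenates the chunk results into one output vector.

```python
from typing import List


def vector_add(first: List[int], second: List[int], num_adders: int) -> List[int]:
    """Elementwise add using num_adders adders per pass, then combine the partial results."""
    assert len(first) == len(second)
    partial_results = []
    for start in range(0, len(first), num_adders):
        # Each adder handles one element pair in this pass.
        chunk = [a + b for a, b in zip(first[start:start + num_adders],
                                       second[start:start + num_adders])]
        partial_results.append(chunk)
    # Combiner: gather the addition results into a single output vector.
    output: List[int] = []
    for chunk in partial_results:
        output.extend(chunk)
    return output


print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], num_adders=2))  # [11, 22, 33, 44, 55]
```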
  • Patent number: 11093582
    Abstract: A method for calculating axis deviation of rotor assembly based on end face runout measurement comprises three parts: calculation of three contact points, a triangle judgment criterion and a homogeneous coordinate transformation algorithm of a deviation matrix. Based on the measured end face runout data in production practice, the method realizes the prediction of axis deviation before assembly, improves the concentricity of rotors after assembly, also greatly increases the one-time acceptance rate of assembly and has important practical guiding significance for axis prediction as well as assembly phase adjustment and optimization in the assembly process of aero-engine rotor pieces.
    Type: Grant
    Filed: September 12, 2018
    Date of Patent: August 17, 2021
    Assignee: Dalian University of Technology
    Inventors: Qingchao Sun, Xin Liu, Yichao Gao, Yunlong Wang
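Only the homogeneous-coordinate step of the method above lends itself to a short sketch. The contact-point calculation and the triangle criterion are not modeled. This sketch assumes each rotor part is described by a hypothetical 4x4 deviation matrix: translate by the part's radial offset and axial height to its top face, then tilt that face by small angles about the x and y axes. Chaining the matrices bottom to top gives the radial offset of the stack's top center. All names and the matrix form are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Mat = List[List[float]]


def matmul(a: Mat, b: Mat) -> Mat:
    """Multiply two 4x4 homogeneous matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def part_transform(tilt_x: float, tilt_y: float, dx: float, dy: float,
                   height: float) -> Mat:
    """Hypothetical deviation matrix: move to the part's top face, then tilt it (radians)."""
    cx, sx = math.cos(tilt_x), math.sin(tilt_x)
    cy, sy = math.cos(tilt_y), math.sin(tilt_y)
    rot_x = [[1, 0, 0, 0], [0, cx, -sx, 0], [0, sx, cx, 0], [0, 0, 0, 1]]
    rot_y = [[cy, 0, sy, 0], [0, 1, 0, 0], [-sy, 0, cy, 0], [0, 0, 0, 1]]
    trans = [[1, 0, 0, dx], [0, 1, 0, dy], [0, 0, 1, height], [0, 0, 0, 1]]
    return matmul(trans, matmul(rot_y, rot_x))


def axis_deviation(parts: Sequence[Tuple[float, float, float, float, float]]) -> Tuple[float, float]:
    """Chain the parts bottom to top; return the radial (x, y) offset of the top center."""
    total: Mat = [[float(i == j) for j in range(4)] for i in range(4)]
    for p in parts:
        total = matmul(total, part_transform(*p))
    return total[0][3], total[1][3]


# A 1 mrad tilt on the first part's top face displaces the top of a 100 mm second part.
print(axis_deviation([(0.0, 0.001, 0.0, 0.0, 100.0), (0.0, 0.0, 0.0, 0.0, 100.0)]))
```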
  • Patent number: 11093243
    Abstract: Vector interleaving techniques in a data processing apparatus are disclosed, comprising apparatuses, instructions, methods of operating the apparatuses, and simulator implementations. A vector interleaving instruction specifies a first source register, second source register, and destination register. A first set of input data items is retrieved from the first source register and a second set of input data items from the second source register. A data processing operation is performed on selected input data item pairs taken from the first and second set of input data items to generate a set of result data items, which are stored as a result data vector in the destination register. First source register dependent result data items are stored in a first set of alternating positions in the destination data vector and second source register dependent result data items are stored in a second set of alternating positions in the destination data vector.
    Type: Grant
    Filed: July 2, 2018
    Date of Patent: August 17, 2021
    Assignee: ARM Limited
    Inventors: Mbou Eyole, Nigel John Stephens
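The interleaved result placement described above can be sketched with a pairwise operation. This sketch assumes the "selected input data item pairs" are adjacent elements within each source, combined by a caller-supplied operation (addition by default). Results derived from the first source land in even positions of the destination and results from the second source land in odd positions. `pairwise_interleave` is a hypothetical name, not an Arm instruction mnemonic.

```python
from typing import Callable, List


def pairwise_interleave(src1: List[int], src2: List[int],
                        op: Callable[[int, int], int] = lambda x, y: x + y) -> List[int]:
    """Combine adjacent pairs in each source; interleave the results by source."""
    assert len(src1) == len(src2) and len(src1) % 2 == 0
    dest = [0] * len(src1)
    for i in range(0, len(src1), 2):
        dest[i] = op(src1[i], src1[i + 1])      # even positions: first source
        dest[i + 1] = op(src2[i], src2[i + 1])  # odd positions: second source
    return dest


print(pairwise_interleave([1, 2, 3, 4], [10, 20, 30, 40]))  # [3, 30, 7, 70]
```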