Matrix Array Patents (Class 708/520)
  • Patent number: 11687616
    Abstract: An arithmetic processing apparatus includes a memory and a processor. The processor is coupled to the memory and configured to determine an individual not to be evolved to an individual of a second generation from among a plurality of individuals in a first generation based on a predetermined reference for calculation completion of fitness calculation for each of the plurality of individuals, the second generation being a generation next to the first generation, and determine to cause the determined individual to evolve to an individual of a generation next or subsequent to the second generation.
    Type: Grant
    Filed: November 6, 2020
    Date of Patent: June 27, 2023
    Assignee: FUJITSU LIMITED
    Inventors: Yukito Tsunoda, Teruo Ishihara
  • Patent number: 11609762
    Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
    Type: Grant
    Filed: August 10, 2021
    Date of Patent: March 21, 2023
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Patent number: 11556852
    Abstract: A computer-implemented method for determining a set of target items to be annotated for training a machine learning application. The method comprises providing a training data set with a set of data samples and an auto-encoder with a classifier. The auto-encoder comprises an embedding model that maps the set of data samples to a set of compressed feature vectors. The set of compressed feature vectors define a compressed feature matrix. Further provided are: a definition of a graph associated to the compressed feature matrix, applying a clustering-algorithm to identify node clusters of the graph and applying a centrality algorithm to identify central nodes of the node clusters, retrieving from an annotator node labels for the central nodes, propagating the annotated node labels to other nodes of the graph and performing a training of the embedding model and the classifier with the annotated and the propagated node labels.
    Type: Grant
    Filed: March 6, 2020
    Date of Patent: January 17, 2023
    Assignee: International Business Machines Corporation
    Inventors: Peter Willem Jan Staar, Michele Dolfi, Christoph Auer, Leonidas Georgopoulos, Ralf Kaestner, Alexander Velizhev, Dal Noguer Hidalgo, Rita Kuznetsova, Konstantinos Bekas
  • Patent number: 11550872
    Abstract: Quantum computing systems and methods are provided. In one example, a quantum computing system includes a quantum system having one or more quantum system qubits and one or more ancilla qubits. The quantum computing system includes one or more quantum gates implemented by the quantum computing system. The quantum gate(s) are operable to configure the one or more ancilla qubits into a known state. The quantum computing system includes a quantum measurement circuit operable to perform a plurality of measurements on the one or more quantum system qubits using the one or more ancilla qubits. The quantum computing system includes one or more processors operable to determine a reduced density matrix for a subset of the quantum system based on a set of the plurality of measurements that include a number of repeated measurements performed using the quantum measurement circuit.
    Type: Grant
    Filed: October 15, 2020
    Date of Patent: January 10, 2023
    Assignee: GOOGLE LLC
    Inventor: Zhang Jiang
  • Patent number: 11520854
    Abstract: A first group of elements is element-wise multiplied with a second group of elements using a plurality of multipliers belonging to a matrix multiplication hardware unit. Results of the plurality of multipliers are added together using a hierarchical tree of adders belonging to the matrix multiplication hardware unit and a final result of the hierarchical tree of adders or any of a plurality of intermediate results of the hierarchical tree of adders is selectively provided for use in determining an output result matrix.
    Type: Grant
    Filed: October 29, 2019
    Date of Patent: December 6, 2022
    Assignee: Meta Platforms, Inc.
    Inventors: Yuchen Hao, Krishnakumar Narayanan Nair, Ehsan Khish Ardestani Zadeh, Rakesh Komuravelli, Abdulkadir Utku Diril, Thomas Mark Ulrich
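The multiply-then-adder-tree flow described in the abstract of 11520854 can be sketched in software. This is only an illustration, not the patented hardware: the function name `multiply_and_reduce`, the `level` parameter, and the power-of-two length restriction are assumptions made for the example.

```python
def multiply_and_reduce(a, b, level=None):
    """Element-wise multiply two equal-length groups, then sum with a pairwise adder tree.

    level=None returns the final sum; level=k returns the partial sums after
    k rounds of pairwise addition (level 0 is the raw products).
    Hypothetical helper: assumes the group length is a power of two.
    """
    if len(a) != len(b) or len(a) == 0 or len(a) & (len(a) - 1):
        raise ValueError("groups must have equal, power-of-two length")
    # Level 0: one product per multiplier.
    levels = [[x * y for x, y in zip(a, b)]]
    # Each further level adds neighbouring pairs from the previous level.
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    # Select either the final result or one of the intermediate levels.
    return levels[-1][0] if level is None else levels[level]
```

Selecting an intermediate level corresponds loosely to the abstract's "any of a plurality of intermediate results"; how the hardware chooses between them is not stated in the abstract.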
  • Patent number: 11520855
    Abstract: A computer-implemented method is presented for performing matrix sketching by employing an analog crossbar architecture. The method includes low rank updating a first matrix for a first period of time, copying the first matrix into a dynamic correction computing device, switching to a second matrix to low rank update the second matrix for a second period of time, as the second matrix is low rank updated, feeding the first matrix with first stochastic pulses to reset the first matrix back to a first matrix symmetry point, copying the second matrix into the dynamic correction computing device, switching back to the first matrix to low rank update the first matrix for a third period of time, and as the first matrix is low rank updated, feeding the second matrix with second stochastic pulses to reset the second matrix back to a second matrix symmetry point.
    Type: Grant
    Filed: May 15, 2020
    Date of Patent: December 6, 2022
    Assignees: INTERNATIONAL BUSINESS MACHINES CORPORATION, RAMOT AT TEL-AVIV UNIVERSITY, LTD.
    Inventors: Lior Horesh, Oguzhan Murat Onen, Haim Avron, Tayfun Gokmen, Vasileios Kalantzis, Shashanka Ubaru
  • Patent number: 11442709
    Abstract: A method for compiling and executing a nested loop includes initializing a nested loop controller with an outer loop count value and an inner loop count value. The nested loop controller includes a predicate FIFO. The method also includes coalescing the nested loop and, during execution of the coalesced nested loop, causing the nested loop controller to populate the predicate FIFO and executing a get predicate instruction having an offset value, where the get predicate returns a value from the predicate FIFO specified by the offset value. The method further includes predicating an outer loop instruction on the returned value from the predicate FIFO.
    Type: Grant
    Filed: August 3, 2020
    Date of Patent: September 13, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: Kai Chirca, Timothy D. Anderson, Todd T. Hahn, Alan L. Davis
  • Patent number: 11435941
    Abstract: In one example, an apparatus comprises: a memory array having an array of memory elements arranged in rows and columns, each memory element being configured to store a data element; and a memory access circuit configured to: perform a row write operation to store a first group of data elements at a first row of the array of memory elements; perform a column read operation at a first column of the array of memory elements to obtain a second group of data elements; and perform a column write operation to store a third group of data elements at the first column of the array of memory elements to replace the second group of data elements.
    Type: Grant
    Filed: June 24, 2020
    Date of Patent: September 6, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Kun Xu, Paul Gilbert Meyer, Ron Diamant
  • Patent number: 11409840
    Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that are dynamically mapped to mutually exclusive subsets of the rows and columns of the processor element arrays based on dimensions of matrices that provide the parameter values to the processor element arrays. In some cases, the processor element arrays are vector arithmetic logic unit (ALU) processors and the memory interfaces are direct memory access (DMA) engines. The rows of the processor element arrays in the subsets are mutually exclusive to the rows in the other subsets and the columns of the processor element arrays in the subsets are mutually exclusive to the columns in the other subsets. The matrices can be symmetric or asymmetric, e.g., one of the matrices can be a vector having a single column.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: August 9, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari
  • Patent number: 11410070
    Abstract: A quantum computing device comprises at least one quantum register including a plurality of logical qubits. A compression engine is coupled to each logical qubit of the plurality of logical qubits. Each compression engine is configured to compress syndrome data. A decompression engine is coupled to each compression engine. Each decompression engine is configured to receive compressed syndrome data, decompress the received compressed syndrome data, and route the decompressed syndrome data to a decoder block.
    Type: Grant
    Filed: November 18, 2019
    Date of Patent: August 9, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Poulami Das, Nicolas Guillaume Delfosse, Christopher Anand Pattison, Srilatha Manne, Douglas Carmean, Krysta Marie Svore, Helmut Gottfried Katzgraber
  • Patent number: 11392379
    Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate a signed fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.
    Type: Grant
    Filed: September 27, 2017
    Date of Patent: July 19, 2022
    Assignee: Intel Corporation
    Inventors: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
  • Patent number: 11392849
    Abstract: Systems and methods that facilitate motion formalism utilizing quantum computing, to compute matrix operators in terms of commutators between qubit operators and measurements on the quantum hardware, wherein the commutators are computed utilizing symbolic calculus. Embodiments reduce computational cost of generalized eigenvalue synthesis relying on symbolic calculus and parallelization. Embodiments disclosed herein can also develop estimators of excited-states properties, considering constants of motion (e.g. spin) and non-constants of motions (e.g. dipoles, density matrices).
    Type: Grant
    Filed: September 18, 2020
    Date of Patent: July 19, 2022
    Assignees: INTERNATIONAL BUSINESS MACHINES CORPORATION, JSR CORPORATION
    Inventors: Mario Motta, Pauline Ollitrault, Stephen Wood, Panagiotis Barkoutsos, Joseph Latone, Ivano Tavernelli, Gavin Jones, Edward Pyzer-Knapp, Yuya Onishi
  • Patent number: 11379185
    Abstract: A matrix multiplication device and an operation method thereof are provided. The matrix multiplication device includes a plurality of unit circuits. Each of the unit circuits includes a multiplying-adding circuit, a first register, and a second register. A first input terminal and a second input terminal of the multiplying-adding circuit are respectively coupled to a corresponding first input line and a corresponding second input line. An input terminal and an output terminal of the first register are respectively coupled to an output terminal and a third input terminal of the multiplying-adding circuit. The second register is coupled to the first register to receive and temporarily store a multiplication accumulation result. Wherein, the second registers of the unit circuits output the multiplication accumulation results in a column direction in a first output mode, and output the multiplication accumulation results in a row direction in a second output mode.
    Type: Grant
    Filed: September 21, 2020
    Date of Patent: July 5, 2022
    Assignee: NEUCHIPS CORPORATION
    Inventors: Jian-Wen Chen, Chiung-Liang Lin
  • Patent number: 11334355
    Abstract: Technology for providing data to a processing unit is disclosed. A computer processor may be divided into a master processing unit and consumer processing units. The master processing unit at least partially decodes a machine instruction and determines whether data is needed to execute the machine instruction. The master processing unit sends a request to memory for the data. The request may indicate that the data is to be sent from the memory to a consumer processing unit. The data sent by the memory in response to the request may be stored in local read storage that is close to the consumer processing unit for fast access. The master processing unit may also provide the machine instruction to the consumer processing unit. The consumer processing unit may access the data from the local read storage and execute the machine instruction based on the accessed data.
    Type: Grant
    Filed: May 4, 2017
    Date of Patent: May 17, 2022
    Assignee: Futurewei Technologies, Inc.
    Inventors: Alan Gatherer, Sushma Wokhlu, Peter Yan, Ywhpyng Harn, Ashish Rai Shrivastava, Tong Sun, Lee Dobson McFearin
  • Patent number: 11294986
    Abstract: Techniques regarding an iterative energy-scaled variational quantum eigensolver process are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a read-out component that determines a ground state energy value of a quantum Hamiltonian by employing a variational quantum eigensolver (VQE) algorithm, wherein the VQE algorithm utilizes a symmetry that emerges at an energy scale of the quantum Hamiltonian.
    Type: Grant
    Filed: November 22, 2019
    Date of Patent: April 5, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Antonio Mezzacapo, Richard Chen, Marco Pistoia
  • Patent number: 11269973
    Abstract: Repeating patterns are identified in a matrix. Based on the identification of the repeating patterns, instructions are generated, which are executable by processing cores of a dot product engine to allocate analog multiplication crossbars of the dot product engine to perform multiplication of the matrix with a vector.
    Type: Grant
    Filed: April 28, 2020
    Date of Patent: March 8, 2022
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Mashood Abdulla Kodavanji, Soumitra Chatterjee, Chinmay Ghosh, Mohan Parthasarathy
  • Patent number: 11263018
    Abstract: A vector processor is disclosed. The vector processor includes a plurality of register files provided to each of a plurality of single instruction multiple data (SIMD) lanes, storing each of a plurality of pieces of data, and respectively outputting input data to be used in a current cycle among the plurality of pieces of data, a shuffle unit for receiving a plurality of pieces of input data outputted from the plurality of register files, and performing shuffling such that the received plurality of pieces of input data respectively correspond to the plurality of SIMD lanes and outputting the same; and a command execution unit for performing a parallel operation by receiving input data outputted from the shuffle unit.
    Type: Grant
    Filed: October 23, 2017
    Date of Patent: March 1, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ki-seok Kwon, Jae-un Park, Dong-kwan Suh, Kang-jin Yoon
  • Patent number: 11263512
    Abstract: A novel and useful neural network (NN) processing core adapted to implement artificial neural networks (ANNs) and incorporating strictly separate control and data planes. The NN processor is constructed from self-contained computational units organized in a hierarchical architecture. The homogeneity enables simpler management and control of similar computational units, aggregated in multiple levels of hierarchy. Computational units are designed with as little overhead as possible, where additional features and capabilities are aggregated at higher levels in the hierarchy. On-chip memory provides storage for content inherently required for basic operation at a particular hierarchy and is coupled with the computational resources in an optimal ratio. Lean control provides just enough signaling to manage only the operations required at a particular hierarchical level. Dynamic resource assignment agility is provided which can be adjusted as required depending on resource availability and capacity of the device.
    Type: Grant
    Filed: April 3, 2018
    Date of Patent: March 1, 2022
    Inventors: Avi Baum, Or Danon, Hadar Zeitlin, Daniel Ciubotariu, Rami Feig
  • Patent number: 11256508
    Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array, a null vector count (N), and a selected dimension. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. N null stream vectors are inserted into the stream of vectors for the selected dimension without fetching respective null data from the memory.
    Type: Grant
    Filed: May 23, 2019
    Date of Patent: February 22, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: Asheesh Bhardwaj, William Franklin Leven, Son Hung Tran, Timothy David Anderson
  • Patent number: 11249761
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Grant
    Filed: July 20, 2020
    Date of Patent: February 15, 2022
    Assignee: Intel Corporation
    Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
  • Patent number: 11249759
    Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for two selected dimensions of the array. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. When either selected dimension in the stream of vectors exceeds a respective specified width, the streaming engine inserts null elements into each portion of a respective vector for the selected dimension that exceeds the specified width in the stream of vectors. Stream vectors that are completely null are formed by the streaming engine without accessing the system memory for respective data.
    Type: Grant
    Filed: May 23, 2019
    Date of Patent: February 15, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: William Franklin Leven, Asheesh Bhardwaj, Son Hung Tran, Timothy David Anderson
  • Patent number: 11232175
    Abstract: Implementations of the present disclosure relate to a method, system and program product for determining a causality between a plurality of variables.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: January 25, 2022
    Assignee: NEC CORPORATION
    Inventors: Lu Feng, Chunchen Liu, Wenjuan Wei
  • Patent number: 11231929
    Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for a selected dimension of the array. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. When the selected dimension in the stream of vectors exceeds the specified width, the streaming engine inserts null elements into each portion of a respective vector for the selected dimension that exceeds the specified width in the stream of vectors. Stream vectors that are completely null are formed by the streaming engine without accessing the system memory for respective data.
    Type: Grant
    Filed: May 23, 2019
    Date of Patent: January 25, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: Son Hung Tran, Shyam Jagannathan, Timothy David Anderson
  • Patent number: 11216533
    Abstract: A grouping means 11 that extracts basis vectors from a set of basis vectors for a lattice having a predetermined relationship with a matrix used to generate a public key, and that groups the basis vectors such that a predetermined condition is satisfied. A sampling means 12 that samples, for at least one group, the same number of arbitrary values as the number of a plurality of basis vectors included in that group, in parallel for the individual basis vectors, onto a lattice constituted by the plurality of basis vectors, the arbitrary values serving as random numbers following a discrete Gaussian distribution. The predetermined condition is that each of the basis vectors included in a group is orthogonal to the other basis vectors included in the same group and is also orthogonal to Gram-Schmidt basis vectors, which are vectors obtained by orthogonalizing the other basis vectors by Gram-Schmidt orthogonalization.
    Type: Grant
    Filed: May 12, 2017
    Date of Patent: January 4, 2022
    Assignee: NEC CORPORATION
    Inventors: Yuki Tanaka, Kazuhiko Minematsu
  • Patent number: 11188328
    Abstract: Aspects include a compute array of a processor with mixed-precision numerical linear algebra support. A first precision and a first shape of a first input matrix and a second precision and a second shape of a second input matrix to the compute array are determined. A number of rank updates of a result matrix to store in an accumulator register having a predetermined size are determined, where the number of rank updates is based on the first precision and the first shape of the first input matrix, the second precision and the second shape of the second input matrix, and the predetermined size of the accumulator register. A plurality of linear algebra operations is repeated in parallel within the compute array to update the result matrix in the accumulator register based on the first input matrix, the second input matrix, and the number of rank updates.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: November 30, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jose E. Moreira, Brett Olsson, Brian W. Thompto, Silvia Melitta Mueller, Andreas Wagner
  • Patent number: 11182458
    Abstract: Embodiments of the present invention are directed to a new instruction set extension and a method for providing 3D lane predication for matrix operations. In a non-limiting embodiment of the invention, a first input matrix having m rows and k columns and a second input matrix having k rows and n columns are received by a compute array of a processor. A three-dimensional predicate mask having an M-bit row mask, an N-bit column mask, and a K-bit rank mask is generated. A result matrix of up to m rows, up to n columns, and up to k rank updates is determined based on the first input matrix, the second input matrix, and the predicate mask.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: November 23, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brett Olsson, Brian W. Thompto, Jose E. Moreira, Silvia Melitta Mueller, Andreas Wagner
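The three-part predicate mask in the abstract of 11182458 can be illustrated with a small NumPy sketch. The function name `masked_matmul` is hypothetical, and the choice to write zeros for masked-out rows and columns is an assumption; the abstract does not say what happens to predicated-off elements.

```python
import numpy as np

def masked_matmul(a, b, row_mask, col_mask, rank_mask):
    """Multiply a (m x k) by b (k x n) as a sum of rank-1 updates under three masks.

    rank_mask[r] enables the r-th rank-1 update; row_mask and col_mask select
    which rows and columns of the result are kept (others are zero here,
    which is an assumption of this sketch).
    """
    a = np.asarray(a)
    b = np.asarray(b)
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b:
        raise ValueError("inner dimensions must match")
    result = np.zeros((m, n), dtype=np.result_type(a, b))
    for r in range(k):
        if rank_mask[r]:
            # One rank-1 update: column r of a times row r of b.
            result += np.outer(a[:, r], b[r, :])
    # Apply the row and column predicates to the accumulated result.
    keep = np.outer(np.asarray(row_mask, dtype=result.dtype),
                    np.asarray(col_mask, dtype=result.dtype))
    return result * keep
```

With all three masks set to ones the function reduces to an ordinary matrix product.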
  • Patent number: 11182126
    Abstract: Computationally efficient mixed precision floating point waveform generation takes advantage of the high-speed generation of waveforms with single-precision floating point numbers while reducing the generally unacceptable loss of precision of pure single-precision floating point to generate any waveform that repeats in 2π. This approach computes a reference phase in double precision as the modulus of the phase with 2π and then computes offsets to that value in single precision. The double precision reference phase is recomputed as needed depending on how quickly the phase grows and how large a machine epsilon is desired.
    Type: Grant
    Filed: June 25, 2019
    Date of Patent: November 23, 2021
    Assignee: Raytheon Company
    Inventors: Ender Barillas, Brian Filarsky
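The reference-phase idea in the abstract of 11182126 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the function name `mixed_precision_sine`, the sine waveform, and the fixed `block` size used to decide when the reference phase is recomputed are all choices made for the example.

```python
import numpy as np

def mixed_precision_sine(freq_hz, sample_rate_hz, n_samples, block=1024):
    """Generate a sine wave with float32 math anchored to a float64 reference phase."""
    out = np.empty(n_samples, dtype=np.float32)
    step64 = 2.0 * np.pi * freq_hz / sample_rate_hz  # phase step per sample, double precision
    step32 = np.float32(step64)
    for start in range(0, n_samples, block):
        # Reference phase: computed in double precision, reduced modulo 2*pi.
        ref = np.float32(np.fmod(start * step64, 2.0 * np.pi))
        # Offsets from the reference stay small, so single precision is adequate.
        k = np.arange(min(block, n_samples - start), dtype=np.float32)
        out[start:start + k.size] = np.sin(ref + k * step32)
    return out
```

A smaller `block` recomputes the reference more often and keeps the single-precision offsets smaller, at the cost of more double-precision work.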
  • Patent number: 11169800
    Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.
    Type: Grant
    Filed: October 18, 2019
    Date of Patent: November 9, 2021
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov
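The two-operation complex multiply in the abstract of 11169800 follows from (a + bi)(c + di) = (ac - bd) + (ad + bc)i. The sketch below splits the work into two steps; which terms belong to the "first" and "second" operation is an assumption here, and `complex_multiply_two_step` is a hypothetical name.

```python
def complex_multiply_two_step(x, y):
    """Multiply complex numbers given as (real, imag) pairs in two steps."""
    a, b = x
    c, d = y
    # First operation: first term of the real part and of the imaginary part.
    real, imag = a * c, a * d
    # Second operation: fold in the second term of each part.
    real, imag = real - b * d, imag + b * c
    return real, imag
```

For example, `complex_multiply_two_step((3, 4), (1, -2))` gives `(11, -2)`, matching (3 + 4i)(1 - 2i) = 11 - 2i.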
  • Patent number: 11144615
    Abstract: Embodiments relate to a denominator circuit that determines the number of valid elements of a data surface covered by a kernel depending on various locations of the kernel relative to the data surface. The denominator circuit includes a first circuit and a second circuit that have the same structure. The first circuit receives numbers representing different horizontal locations of a reference point in the kernel and generates a first matrix with first output elements corresponding to the different horizontal locations. The second circuit receives numbers representing different vertical locations of a reference point in the kernel and generates a second matrix with second output elements corresponding to the different vertical locations. A matrix multiplication of the first matrix and the second matrix is performed to obtain an array of valid elements covered by the kernel.
    Type: Grant
    Filed: April 14, 2020
    Date of Patent: October 12, 2021
    Assignee: APPLE INC.
    Inventors: Yiu Chun Tse, Ji Liang Song, Ponan Kuo
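The valid-element count in the abstract of 11144615 factors into a horizontal count times a vertical count, which is why a matrix product of a column and a row yields the full array. The sketch below assumes a square kernel, stride 1, and symmetric zero padding; `valid_counts_1d` and `valid_counts_2d` are hypothetical names.

```python
import numpy as np

def valid_counts_1d(size, kernel, pad):
    """For each kernel position along one axis, count taps that land on the surface."""
    counts = []
    for start in range(-pad, size + pad - kernel + 1):
        lo = max(start, 0)              # first covered index inside the surface
        hi = min(start + kernel, size)  # one past the last covered index
        counts.append(max(hi - lo, 0))
    return np.array(counts)

def valid_counts_2d(height, width, kernel, pad):
    """Valid elements under the kernel at every position, as a column-by-row product."""
    vertical = valid_counts_1d(height, kernel, pad).reshape(-1, 1)
    horizontal = valid_counts_1d(width, kernel, pad).reshape(1, -1)
    return vertical @ horizontal
```

For a 4x4 surface with a 3x3 kernel and padding of 1, the corners cover 4 valid elements, the edges 6, and the interior 9.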
  • Patent number: 11113028
    Abstract: An apparatus and method are provided for performing an index operation. The apparatus has vector processing circuitry to perform an index operation in each of a plurality of lanes of parallel processing. The index operation requires an index value opm to be multiplied by a multiplier value e to produce a multiplication result. The number of lanes of parallel processing is dependent on a specified element size, and the multiplier value is different, but known, for each lane of parallel processing. The vector processing circuitry comprises mapping circuitry to perform, within each lane, mapping operations on the index value opm in order to generate a plurality of intermediate input values. The plurality of intermediate input values are such that the addition of the plurality of intermediate input values produces the multiplication result. Within each lane the mapping operations are determined by the multiplier value used for that lane.
    Type: Grant
    Filed: July 25, 2019
    Date of Patent: September 7, 2021
    Assignee: Arm Limited
    Inventors: Xiaoyang Shen, David Raymond Lutz, Cédric Denis Robert Airaud
  • Patent number: 11100192
    Abstract: Aspects for vector operations in a neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine the one or more addition results into an output vector.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: August 24, 2021
    Assignee: Cambricon Technologies Corporation Limited
    Inventors: Jinhua Tao, Tian Zhi, Shaoli Liu, Tianshi Chen, Yunji Chen
  • Patent number: 11099844
    Abstract: Performing n-dimensional stencil processing may include providing a memory unit organized in memory banks for storing elements of an nD matrix, and processing the matrix using a stencil vector unit in a first processing direction of the matrix tile-wise(/d). Data elements of the matrix can be equally distributed over the memory banks, and the number of memory banks can be equal to the number of data elements processable by the stencil vector unit in parallel, which is equal to the number of data elements in a width direction of one of the tiles. Additionally, the boundary elements can be grouped in the width direction of the tiles into an nD sub-matrix, and the nD sub-matrix can be processed in the same way as the nD matrix, orthogonal to the first processing direction.
    Type: Grant
    Filed: May 16, 2019
    Date of Patent: August 24, 2021
    Assignee: International Business Machines Corporation
    Inventor: Jan Van Lunteren
  • Patent number: 11093582
    Abstract: A method for calculating the axis deviation of a rotor assembly based on end face runout measurement comprises three parts: calculation of three contact points, a triangle judgment criterion and a homogeneous coordinate transformation algorithm of a deviation matrix. Based on the end face runout data measured in production practice, the method realizes the prediction of axis deviation before assembly, improves the concentricity of rotors after assembly, greatly increases the one-time acceptance rate of assembly and has important practical guiding significance for axis prediction as well as assembly phase adjustment and optimization in the assembly process of aero-engine rotor pieces.
    Type: Grant
    Filed: September 12, 2018
    Date of Patent: August 17, 2021
    Assignee: Dalian University of Technology
    Inventors: Qingchao Sun, Xin Liu, Yichao Gao, Yunlong Wang
  • Patent number: 11093243
    Abstract: Vector interleaving techniques in a data processing apparatus are disclosed, comprising apparatuses, instructions, methods of operating the apparatuses, and simulator implementations. A vector interleaving instruction specifies a first source register, second source register, and destination register. A first set of input data items is retrieved from the first source register and a second set of input data items from the second source register. A data processing operation is performed on selected input data item pairs taken from the first and second set of input data items to generate a set of result data items, which are stored as a result data vector in the destination register. First source register dependent result data items are stored in a first set of alternating positions in the destination data vector and second source register dependent result data items are stored in a second set of alternating positions in the destination data vector.
    Type: Grant
    Filed: July 2, 2018
    Date of Patent: August 17, 2021
    Assignee: ARM Limited
    Inventors: Mbou Eyole, Nigel John Stephens
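    One possible reading of the result layout in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the instruction's defined semantics: it assumes the "selected input data item pairs" are adjacent pairs within each source, that results derived from the first source go to even positions, and that results derived from the second source go to odd positions. The function name and the choice of operation are hypothetical.

```python
def interleaved_pairwise(src1, src2, op):
    """Apply op to adjacent pairs of each source and interleave the results.

    Assumption: both sources have the same even length. Results that depend
    on src1 land in even positions, results that depend on src2 in odd ones.
    """
    n = len(src1)
    if len(src2) != n or n % 2 != 0:
        raise ValueError("sources must have the same even length")
    dest = [None] * n
    for i in range(0, n, 2):
        dest[i] = op(src1[i], src1[i + 1])      # first-source result
        dest[i + 1] = op(src2[i], src2[i + 1])  # second-source result
    return dest


print(interleaved_pairwise([1, 2, 3, 4], [10, 20, 30, 40], lambda a, b: a + b))
# [3, 30, 7, 70]
```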
  • Patent number: 11086625
    Abstract: In an embodiment, a processor supports one or more compression assist instructions which may be employed in compression software to improve the performance of the processor when performing compression/decompression. That is, the compression/decompression task may be performed more rapidly and consume less power when the compression assist instructions are employed than when they are not. In some cases, the cost of a more effective, more complex compression algorithm may be reduced to the cost of a less effective, less complex compression algorithm.
    Type: Grant
    Filed: September 10, 2019
    Date of Patent: August 10, 2021
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Ali Sazegari
  • Patent number: 11030095
    Abstract: A processing system includes a central processing unit (CPU) and a graphics processing unit (GPU) that has a plurality of compute units. The GPU receives an image from the CPU and determines a total result area in a virtual-matrix-multiplication space of a virtual matrix-multiplication output matrix based on convolutional parameters associated with the image in an image space. The GPU partitions the total result area of the virtual matrix-multiplication output matrix into a plurality of virtual segments. The GPU allocates convolution operations to the plurality of compute units based on each virtual segment of the plurality of virtual segments.
    Type: Grant
    Filed: December 10, 2018
    Date of Patent: June 8, 2021
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Swapnil Sakharshete, Samuel Lawrence Wasmundt
  • Patent number: 11023242
    Abstract: A method and apparatus of asynchronous scheduling in a graphics device includes sending one or more instructions from an instruction scheduler to one or more instruction first-in/first-out (FIFO) devices. An instruction in the one or more FIFO devices is selected for execution by a single-instruction/multiple-data (SIMD) pipeline unit. It is determined whether all operands for the selected instruction are available for execution of the instruction, and if all the operands are available, the selected instruction is executed on the SIMD pipeline unit. The self-timed arithmetic pipeline unit (SIMD pipeline unit) is effectively encapsulated in a synchronous (e.g., clocked by a global clock) scheduler and register file environment.
    Type: Grant
    Filed: January 27, 2017
    Date of Patent: June 1, 2021
    Assignees: ATI TECHNOLOGIES ULC, ADVANCED MICRO DEVICES, INC.
    Inventors: John Kalamatianos, Greg Sadowski, Syed Zohaib M. Gilani
  • Patent number: 11017290
    Abstract: A signal processing module comprises at least one operational unit incorporating computation units, input and output interfaces able to be linked to a bus and a memory storing data destined for the computation units, the memory being organized so that each data word is stored column-wise over several addresses according to an order dependent on the application, a column having a width of one bit, the words being transferred in series to the computation units.
    Type: Grant
    Filed: November 27, 2014
    Date of Patent: May 25, 2021
    Assignee: COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES
    Inventors: Marc Duranton, Jean-Marc Philippe
  • Patent number: 10996944
    Abstract: A processing device can establish a machine learning model to produce software dependency recommendations. The model can be periodically retrained to update its knowledge of available dependencies. The software dependencies can be incorporated into software by developers who receive the selection or automatically by an intelligent software development platform. A processing device can train the model by assembling sparse user data based on feedback corresponding to software dependencies to produce a vector of preferences for each user. The processing device can also generate a latent vector of attributes for each software dependency. The processing device can then apply matrix factorization to the vectors to produce a behavior matrix that is used to train the machine learning model.
    Type: Grant
    Filed: August 6, 2019
    Date of Patent: May 4, 2021
    Assignee: Red Hat, Inc.
    Inventors: Avishkar Gupta, Aagam Shah, Sarah Masud
  • Patent number: 10986014
    Abstract: A monitoring system detects a deviation in a monitoring metric of a system component of a remote management system that remotely manages image forming apparatuses. When the monitoring system detects a deviation in online device count greater than or equal to a deviation threshold and makes a determination that there is a correlation between the deviations in monitoring metrics of multiple system components as detected, the monitoring system sends a failure report indicating that a failure is in the remote management system.
    Type: Grant
    Filed: June 5, 2020
    Date of Patent: April 20, 2021
    Assignee: KYOCERA DOCUMENT SOLUTIONS INC.
    Inventors: Dukil Park, Kazuki Nishikai, Koki Nakajima, Yasuo Nakashima, Satoshi Goshima, Yuichi Obayashi, Takeshi Nakamura
  • Patent number: 10915318
    Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.
    Type: Grant
    Filed: March 4, 2019
    Date of Patent: February 9, 2021
    Assignee: Google LLC
    Inventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps
  • Patent number: 10896039
    Abstract: In one embodiment, a matrix operation may be performed on one or more matrix operands. For example, matrix data may be received from a multi-dimensional memory, wherein the matrix data is associated with the one or more matrix operands. The one or more matrix operands may be extracted from the matrix data. A matrix routine associated with the matrix operation may be identified. The matrix routine may be executed on a matrix processor using the one or more matrix operands. A result of the matrix operation may be obtained based on the matrix routine executed by the matrix processor.
    Type: Grant
    Filed: January 31, 2019
    Date of Patent: January 19, 2021
    Assignee: Intel Corporation
    Inventors: Tony L. Werner, Aravind Kalaiah, Vijay Korthikanti, Horace Lau
  • Patent number: 10897605
    Abstract: Apparatuses, systems, and methods related to an image processor formed in an array of memory cells are described. An image processor as described herein is configured to reduce complexity and power consumption and/or increase data access bandwidth by performing image processing in the array of memory cells relative to image processing by a host processor external to the memory array. For instance, one apparatus described herein includes sensor circuitry configured to provide an input vector, as a plurality of bits that corresponds to a plurality of color components for an image pixel, and an image processor formed in an array of memory cells. The image processor is coupled to the sensor circuitry to receive the plurality of bits of the input vector. The image processor is configured to perform a color correction operation in the array by performing matrix multiplication on the input vector and a parameter matrix to determine an output vector that is color corrected.
    Type: Grant
    Filed: August 26, 2019
    Date of Patent: January 19, 2021
    Assignee: Micron Technology, Inc.
    Inventors: Fa-Long Luo, Jaime C. Cummins, Tamara Schmitz
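    The color correction step described in this abstract is, arithmetically, a matrix-vector product: the output vector is the parameter matrix multiplied by the input color vector. A minimal software sketch follows; the matrix values are made up for illustration, and the function name is hypothetical. It says nothing about how the in-memory hardware performs the operation.

```python
def color_correct(pixel, parameter_matrix):
    """Multiply a color vector by a parameter matrix (one output per row)."""
    return [sum(m * c for m, c in zip(row, pixel)) for row in parameter_matrix]


# Hypothetical 3x3 parameter matrix that doubles the first color component.
matrix = [
    [2, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
print(color_correct([10, 20, 30], matrix))  # [20, 20, 30]
```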
  • Patent number: 10872130
    Abstract: Based on a Modified Gram-Schmidt (MGS) algorithm, QR decomposition techniques are optimized for parallel structures that provide arithmetic-logic unit (ALU) to ALU connectivity. The techniques utilize a different loop organization, but the dependent functional sequences of the algorithm are unchanged, thereby reducing likelihood of affecting error analysis and/or numerical stability. Some integrated circuit devices (e.g., FPGA) may implement hard floating-point (HFP) circuitry, such as a digital signal processing (DSP) block, distributed memories, and/or flexible internal connectivity, which can support the discussed high performance matrix arithmetic.
    Type: Grant
    Filed: August 31, 2017
    Date of Patent: December 22, 2020
    Assignee: Intel Corporation
    Inventor: Martin Langhammer
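    For reference, the textbook Modified Gram-Schmidt QR decomposition that this abstract builds on can be sketched as below. This is the standard sequential loop order, not the reorganized loop structure the patent describes, and it assumes a matrix with full column rank given as a list of rows. The function name is hypothetical.

```python
import math


def mgs_qr(a):
    """Textbook Modified Gram-Schmidt: return (q, r) with a = q * r.

    a is an m x n list of rows with linearly independent columns (assumed).
    q is m x n with orthonormal columns; r is n x n upper triangular.
    """
    m, n = len(a), len(a[0])
    # Work on a copy stored column by column.
    v = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(sum(x * x for x in v[k]))
        qk = [x / r[k][k] for x in v[k]]
        q_cols.append(qk)
        # Remove the q_k component from every remaining column right away;
        # this immediate update is what distinguishes MGS from classical GS.
        for j in range(k + 1, n):
            r[k][j] = sum(qk[i] * v[j][i] for i in range(m))
            v[j] = [v[j][i] - r[k][j] * qk[i] for i in range(m)]
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


q, r = mgs_qr([[1, 1], [1, 0], [0, 1]])
```

    The inner loop over j has no dependency between its iterations, which is the kind of parallelism a restructured implementation can exploit while keeping the dependent sequence (norm, normalize, project) unchanged.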
  • Patent number: 10832799
    Abstract: Methods, systems and apparatus for detecting patterns in constituents of at least one biological organism are disclosed. In accordance with one method, clusters of the constituents are determined (208) by selecting (210) different subsets of at least one of genes or proteins and identifying (212) the clusters from biological data corresponding to the selected subsets. Here, membership values for the constituents, indicating membership within the clusters, are calculated for use as a basis of an additional cluster determination process (208) to obtain final clusters of constituents. By underpinning the preliminary clustering on different subsets of biological data and formulating the higher-level clustering on the basis of the membership values, the embodiments can enable an evaluation of a large variety of biological data in a practical, accurate and highly efficient manner.
    Type: Grant
    Filed: August 12, 2016
    Date of Patent: November 10, 2020
    Assignee: Koninklijke Philips N.V.
    Inventors: Konstantin Volyanskyy, Nevenka Dimitrova
  • Patent number: 10762163
    Abstract: In embodiments of probabilistic matrix factorization for automated machine learning, a computing system memory maintains different workflows that each include preprocessing steps for a machine learning model, the machine learning model, and one or more parameters for the machine learning model. The computing system memory additionally maintains different data sets, upon which the different workflows can be trained and tested. A matrix is generated from the different workflows and different data sets, where cells of the matrix are populated with performance metrics that each indicate a measure of performance for a workflow applied to a data set. A low-rank decomposition of the matrix with populated performance metrics is then determined. Based on the low-rank decomposition, an optimum workflow for a new data set can be determined. The optimum workflow can be one of the different workflows or a hybrid of at least two of the different workflows.
    Type: Grant
    Filed: December 5, 2016
    Date of Patent: September 1, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Nicolo Fusi
  • Patent number: 10762164
    Abstract: A computing device and related products are provided. The computing device is configured to perform machine learning calculations. The computing device includes an operation unit, a controller unit, and a storage unit. The storage unit includes a data input/output (I/O) unit, a register, and a cache. The technical solution provided by the present disclosure has the advantages of fast calculation speed and energy savings.
    Type: Grant
    Filed: July 19, 2018
    Date of Patent: September 1, 2020
    Assignee: Cambricon Technologies Corporation Limited
    Inventors: Tianshi Chen, Xiao Zhang, Shaoli Liu, Yunji Chen
  • Patent number: 10755426
    Abstract: An electronic device comprises circuitry implementing a depth map enhancer. The depth map enhancer obtains an initial depth map corresponding to a scene and an image of the scene. The depth map enhancer generates a refined depth map corresponding to the scene using an optimizer, the initial depth map and the image. The refined depth map includes estimated depth indicators corresponding to at least a first depth-information region, identified based at least in part on a first criterion, of the initial depth map. Input based on the refined depth map is provided to an image processing application.
    Type: Grant
    Filed: May 23, 2018
    Date of Patent: August 25, 2020
    Assignee: Apple Inc.
    Inventors: Mark Norman Lester Jouppi, Michael Wish Tao, Eric Bujold, Stephane Simon Rene Ben Soussan, Volker Roelke, Geoffrey T. Anneheim, Julio Cesar Hernandez Zaragoza, Florian Ciurea
  • Patent number: 10747846
    Abstract: Matrix processing includes: initializing a current matrix based at least in part on an original matrix; iteratively determining a matrix property using a plurality of iteration cycles, including, in an iteration cycle: partitioning the current matrix to obtain a plurality of partitions, wherein the plurality of partitions includes a submatrix; modifying the submatrix based at least in part on other partitions of the plurality of partitions to provide a current matrix for a next iteration; and continuing to iterate until a condition is met. Matrix processing further includes obtaining the matrix property from an iteration result; and outputting the matrix property.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: August 18, 2020
    Assignee: Cyber Atomics, Inc.
    Inventor: Roy Batruni
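    The iterate-partition-modify pattern in this abstract can be illustrated with a familiar special case. The abstract does not name the matrix property or the update rule, so the sketch below is an assumption chosen for illustration: it computes a determinant by repeatedly partitioning the current matrix into a leading pivot, a top row, a left column and a trailing submatrix, then replacing the current matrix with the Schur complement of the pivot. The function name is hypothetical, and the sketch does no pivoting.

```python
def determinant_by_partitioning(matrix):
    """Determinant via repeated Schur-complement updates of a submatrix.

    Assumption: every leading pivot encountered is nonzero.
    """
    current = [[float(x) for x in row] for row in matrix]  # initialize from original
    det = 1.0
    while current:  # stop condition: nothing left to partition
        pivot = current[0][0]
        if pivot == 0.0:
            raise ValueError("zero pivot; this sketch does no pivoting")
        det *= pivot
        # Partition into top row, left column and trailing submatrix.
        top = current[0][1:]
        left = [row[0] for row in current[1:]]
        sub = [row[1:] for row in current[1:]]
        # Modify the submatrix using the other partitions; the result
        # becomes the current matrix for the next iteration.
        current = [
            [sub[i][j] - left[i] * top[j] / pivot for j in range(len(top))]
            for i in range(len(sub))
        ]
    return det


print(determinant_by_partitioning([[2, 1], [1, 3]]))  # 5.0
```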
  • Patent number: 10743026
    Abstract: A video encoding method, a video encoding apparatus, a video decoding method, and a video decoding apparatus are provided. The video encoding method includes producing a fast transform matrix based on a transform matrix which is used for frequency transformation on a block which has a predetermined size; producing a transformed block by transforming the block having the predetermined size by using the fast transform matrix; and performing scaling with respect to the transformed block in order to correct a difference between the transform matrix used for the frequency transformation and the fast transform matrix.
    Type: Grant
    Filed: September 5, 2019
    Date of Patent: August 11, 2020
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Yoon-mi Hong, Woo-jin Han, Min-su Cheon, Jianle Chen