Matrix Array Patents (Class 708/520)
  • Patent number: 11023242
    Abstract: A method and apparatus of asynchronous scheduling in a graphics device includes sending one or more instructions from an instruction scheduler to one or more instruction first-in/first-out (FIFO) devices. An instruction in the one or more FIFO devices is selected for execution by a single-instruction/multiple-data (SIMD) pipeline unit. It is determined whether all operands for the selected instruction are available for execution of the instruction, and if all the operands are available, the selected instruction is executed on the SIMD pipeline unit. The self-timed arithmetic pipeline unit (the SIMD pipeline unit) is effectively encapsulated in a synchronous (e.g., clocked by a global clock) scheduler and register file environment.
    Type: Grant
    Filed: January 27, 2017
    Date of Patent: June 1, 2021
    Assignees: ATI TECHNOLOGIES ULC, ADVANCED MICRO DEVICES, INC.
    Inventors: John Kalamatianos, Greg Sadowski, Syed Zohaib M. Gilani
  • Patent number: 11017290
    Abstract: A signal processing module comprises at least one operational unit incorporating computation units, input and output interfaces able to be linked to a bus, and a memory storing data destined for the computation units, the memory being organized so that each data word is stored column-wise over several addresses according to an order dependent on the application, a column having a width of one bit, the words being transferred serially to the computation units.
    Type: Grant
    Filed: November 27, 2014
    Date of Patent: May 25, 2021
    Assignee: COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES
    Inventors: Marc Duranton, Jean-Marc Philippe
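Illustrative sketch: the column-wise, one-bit-wide storage described above is a bit-transposed (bit-serial) layout in which address a of a column holds bit a of the word assigned to that column, so words stream to the computation units one bit at a time. A minimal numpy model of that layout, with an assumed 8-bit word width and LSB-first address ordering (neither is specified by the abstract):
```python
import numpy as np

WIDTH = 8                                     # word width in bits (assumed)
words = np.array([0b10110001, 0b01011100, 0b11110000, 0b00001111], dtype=np.uint8)

# Bit-transposed layout: memory[a, c] is bit `a` of the word stored in column `c`,
# so each column is one bit wide and a word spans WIDTH consecutive addresses.
memory = np.array([[(w >> a) & 1 for w in words] for a in range(WIDTH)], dtype=np.uint8)

def read_word_serially(column):
    """Re-assemble one word by streaming its column bit by bit (LSB first)."""
    value = 0
    for address in range(WIDTH):
        value |= int(memory[address, column]) << address
    return value

print([hex(read_word_serially(c)) for c in range(4)])
# ['0xb1', '0x5c', '0xf0', '0xf']  -- matches the original words
```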
  • Patent number: 10996944
    Abstract: A processing device can establish a machine learning model to produce software dependency recommendations. The model can be periodically retrained to update its knowledge of available dependencies. The software dependencies can be incorporated into software by developers who receive the selection or automatically by an intelligent software development platform. A processing device can train the model by assembling sparse user data based on feedback corresponding to software dependencies to produce a vector of preferences for each user. The processing device can also generate a latent vector of attributes for each software dependency. The processing device can then apply matrix factorization to the vectors to produce a behavior matrix that is used to train the machine learning model.
    Type: Grant
    Filed: August 6, 2019
    Date of Patent: May 4, 2021
    Assignee: Red Hat, Inc.
    Inventors: Avishkar Gupta, Aagam Shah, Sarah Masud
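Illustrative sketch: the described pipeline is standard collaborative filtering: factor the sparse user-preference matrix into per-user and per-dependency latent vectors, then rank unseen dependencies by the reconstructed scores. The toy data, rank, regularization, and the use of plain alternating least squares below are illustrative assumptions, not details taken from the patent:
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy preference matrix: rows = users, cols = software dependencies.
# 0 entries are "no feedback yet" (the sparse part), not true zeros.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)
observed = R > 0                      # mask of known feedback
k, lam = 2, 0.1                       # latent rank and regularization (assumed)

U = rng.normal(scale=0.1, size=(R.shape[0], k))   # user preference vectors
V = rng.normal(scale=0.1, size=(R.shape[1], k))   # dependency attribute vectors

def solve_rows(fixed, target, mask):
    """Least-squares update of one factor while the other is held fixed."""
    out = np.zeros((target.shape[0], fixed.shape[1]))
    for i in range(target.shape[0]):
        idx = mask[i]
        A = fixed[idx].T @ fixed[idx] + lam * np.eye(k)
        b = fixed[idx].T @ target[i, idx]
        out[i] = np.linalg.solve(A, b)
    return out

for _ in range(20):                   # alternate until the fit stabilizes
    U = solve_rows(V, R, observed)
    V = solve_rows(U, R.T, observed.T)

scores = U @ V.T                      # predicted preference for every (user, dependency) pair
print(np.round(scores, 2))            # rank a user's unobserved dependencies by these scores
```
The top-scoring unobserved column in a user's row is the dependency that would be recommended (or auto-added by the development platform).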
  • Patent number: 10986014
    Abstract: A monitoring system detects a deviation in a monitoring metric of a system component of a remote management system that remotely manages image forming apparatuses. When the monitoring system detects a deviation in online device count greater than or equal to a deviation threshold and determines that there is a correlation between the detected deviations in monitoring metrics of multiple system components, the monitoring system sends a failure report indicating that a failure has occurred in the remote management system.
    Type: Grant
    Filed: June 5, 2020
    Date of Patent: April 20, 2021
    Assignee: KYOCERA DOCUMENT SOLUTIONS INC.
    Inventors: Dukil Park, Kazuki Nishikai, Koki Nakajima, Yasuo Nakashima, Satoshi Goshima, Yuichi Obayashi, Takeshi Nakamura
  • Patent number: 10915318
    Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.
    Type: Grant
    Filed: March 4, 2019
    Date of Patent: February 9, 2021
    Assignee: Google LLC
    Inventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps
  • Patent number: 10897605
    Abstract: Apparatuses, systems, and methods related to an image processor formed in an array of memory cells are described. An image processor as described herein is configured to reduce complexity and power consumption and/or increase data access bandwidth by performing image processing in the array of memory cells relative to image processing by a host processor external to the memory array. For instance, one apparatus described herein includes sensor circuitry configured to provide an input vector, as a plurality of bits that corresponds to a plurality of color components for an image pixel, and an image processor formed in an array of memory cells. The image processor is coupled to the sensor circuitry to receive the plurality of bits of the input vector. The image processor is configured to perform a color correction operation in the array by performing matrix multiplication on the input vector and a parameter matrix to determine an output vector that is color corrected.
    Type: Grant
    Filed: August 26, 2019
    Date of Patent: January 19, 2021
    Assignee: Micron Technology, Inc.
    Inventors: Fa-Long Luo, Jaime C. Cummins, Tamara Schmitz
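Illustrative sketch: the color-correction step itself is a small matrix-vector product in which a 3x3 parameter matrix maps a sensor RGB input vector to a corrected RGB output vector; the patent performs this multiply inside the memory array rather than on a host processor. The parameter matrix below is made up for illustration:
```python
import numpy as np

# Illustrative 3x3 color-correction parameter matrix (not from the patent).
ccm = np.array([
    [ 1.6, -0.4, -0.2],
    [-0.3,  1.5, -0.2],
    [-0.1, -0.5,  1.6],
])

def color_correct(pixel_rgb):
    """Output vector = parameter matrix @ input vector, clipped to the 8-bit range."""
    corrected = ccm @ np.asarray(pixel_rgb, dtype=float)
    return np.clip(np.rint(corrected), 0, 255).astype(np.uint8)

print(color_correct([120, 200, 90]))   # [ 94 246  32]
```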
  • Patent number: 10896039
    Abstract: In one embodiment, a matrix operation may be performed on one or more matrix operands. For example, matrix data may be received from a multi-dimensional memory, wherein the matrix data is associated with the one or more matrix operands. The one or more matrix operands may be extracted from the matrix data. A matrix routine associated with the matrix operation may be identified. The matrix routine may be executed on a matrix processor using the one or more matrix operands. A result of the matrix operation may be obtained based on the matrix routine executed by the matrix processor.
    Type: Grant
    Filed: January 31, 2019
    Date of Patent: January 19, 2021
    Assignee: Intel Corporation
    Inventors: Tony L. Werner, Aravind Kalaiah, Vijay Korthikanti, Horace Lau
  • Patent number: 10872130
    Abstract: Based on a Modified Gram-Schmidt (MGS) algorithm, QR decomposition techniques are optimized for parallel structures that provide arithmetic-logic unit (ALU) to ALU connectivity. The techniques utilize a different loop organization, but the dependent functional sequences of the algorithm are unchanged, thereby reducing likelihood of affecting error analysis and/or numerical stability. Some integrated circuit devices (e.g., FPGA) may implement hard floating-point (HFP) circuitry, such as a digital signal processing (DSP) block, distributed memories, and/or flexible internal connectivity, which can support the discussed high performance matrix arithmetic.
    Type: Grant
    Filed: August 31, 2017
    Date of Patent: December 22, 2020
    Assignee: Intel Corporation
    Inventor: Martin Langhammer
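Illustrative sketch: the Modified Gram-Schmidt algorithm the patent builds on orthogonalizes one column at a time, subtracting each projection from the working column immediately, which is what gives it better numerical behavior than classical Gram-Schmidt. Below is the textbook MGS loop in numpy, without the loop re-organization or DSP-block mapping described above:
```python
import numpy as np

def mgs_qr(A):
    """QR decomposition by Modified Gram-Schmidt: A = Q @ R with orthonormal Q."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].copy()
        # Subtract projections onto already-finished columns one at a time,
        # using the updated v each time (the "modified" ordering).
        for j in range(k):
            R[j, k] = Q[:, j] @ v
            v -= R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(v)
        Q[:, k] = v / R[k, k]
    return Q, R

A = np.random.default_rng(1).normal(size=(6, 4))
Q, R = mgs_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(4)))  # True True
```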
  • Patent number: 10832799
    Abstract: Methods, systems and apparatus for detecting patterns in constituents of at least one biological organism are disclosed. In accordance with one method, clusters of the constituents are determined (208) by selecting (210) different subsets of at least one of genes or proteins and identifying (212) the clusters from biological data corresponding to the selected subsets. Here, membership values for the constituents, indicating membership within the clusters, are calculated for use as a basis of an additional cluster determination process (208) to obtain final clusters of constituents. By underpinning the preliminary clustering on different subsets of biological data and formulating the higher-level clustering on the basis of the membership values, the embodiments can enable an evaluation of a large variety of biological data in a practical, accurate and highly efficient manner.
    Type: Grant
    Filed: August 12, 2016
    Date of Patent: November 10, 2020
    Assignee: Koninklijke Philips N.V.
    Inventors: Konstantin Volyanskyy, Nevenka Dimitrova
  • Patent number: 10762164
    Abstract: A computing device and related products are provided. The computing device is configured to perform machine learning calculations. The computing device includes an operation unit, a controller unit, and a storage unit. The storage unit includes a data input/output (I/O) unit, a register, and a cache. The technical solution provided by the present disclosure has the advantages of fast calculation speed and energy savings.
    Type: Grant
    Filed: July 19, 2018
    Date of Patent: September 1, 2020
    Assignee: Cambricon Technologies Corporation Limited
    Inventors: Tianshi Chen, Xiao Zhang, Shaoli Liu, Yunji Chen
  • Patent number: 10762163
    Abstract: In embodiments of probabilistic matrix factorization for automated machine learning, a computing system memory maintains different workflows that each include preprocessing steps for a machine learning model, the machine learning model, and one or more parameters for the machine learning model. The computing system memory additionally maintains different data sets, upon which the different workflows can be trained and tested. A matrix is generated from the different workflows and different data sets, where cells of the matrix are populated with performance metrics that each indicate a measure of performance for a workflow applied to a data set. A low-rank decomposition of the matrix with populated performance metrics is then determined. Based on the low-rank decomposition, an optimum workflow for a new data set can be determined. The optimum workflow can be one of the different workflows or a hybrid of at least two of the different workflows.
    Type: Grant
    Filed: December 5, 2016
    Date of Patent: September 1, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Nicolo Fusi
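Illustrative sketch: the underlying mechanism is matrix completion: only some (workflow, data set) cells hold measured performance, a low-rank reconstruction is fitted to those cells, and the filled-in cells rank candidate workflows for a data set. The sketch below uses simple iterative truncated-SVD imputation rather than the probabilistic factorization of the patent; the matrix values and rank are assumptions:
```python
import numpy as np

# Rows = workflows, cols = data sets; NaN = performance not yet measured.
P = np.array([
    [0.81, 0.70, np.nan, 0.55],
    [0.78, np.nan, 0.60, 0.52],
    [np.nan, 0.66, 0.58, np.nan],
    [0.40, 0.35, 0.30, 0.28],
])
known = ~np.isnan(P)
rank = 2                                   # assumed low rank of the performance matrix

X = np.where(known, P, np.nanmean(P))      # initialize missing cells with the global mean
for _ in range(50):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # low-rank reconstruction
    X[known] = P[known]                        # keep the measured cells fixed

dataset = 2                                # column index of the data set of interest
print("predicted best workflow:", int(np.argmax(X[:, dataset])))
```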
  • Patent number: 10755426
    Abstract: An electronic device comprises circuitry implementing a depth map enhancer. The depth map enhancer obtains an initial depth map corresponding to a scene and an image of the scene. The depth map enhancer generates a refined depth map corresponding to the scene using an optimizer, the initial depth map and the image. The refined depth map includes estimated depth indicators corresponding to at least a first depth-information region, identified based at least in part on a first criterion, of the initial depth map. Input based on the refined depth map is provided to an image processing application.
    Type: Grant
    Filed: May 23, 2018
    Date of Patent: August 25, 2020
    Assignee: Apple Inc.
    Inventors: Mark Norman Lester Jouppi, Michael Wish Tao, Eric Bujold, Stephane Simon Rene Ben Soussan, Volker Roelke, Geoffrey T. Anneheim, Julio Cesar Hernandez Zaragoza, Florian Ciurea
  • Patent number: 10747846
    Abstract: Matrix processing includes: initializing a current matrix based at least in part on an original matrix; iteratively determining a matrix property using a plurality of iteration cycles, including, in an iteration cycle: partitioning the current matrix to obtain a plurality of partitions, wherein the plurality of partitions includes a submatrix; modifying the submatrix based at least in part on other partitions of the plurality of partitions to provide a current matrix for a next iteration; and continuing to iterate until a condition is met. Matrix processing further includes obtaining the matrix property from an iteration result; and outputting the matrix property.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: August 18, 2020
    Assignee: Cyber Atomics, Inc.
    Inventor: Roy Batruni
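Illustrative sketch: the abstract does not name the matrix property or the update rule, but a classic instance of "partition, modify the submatrix from the other partitions, iterate" is computing a determinant through repeated Schur-complement updates, shown below purely to illustrate the shape of such an iteration:
```python
import numpy as np

def det_by_schur(A):
    """Determinant via repeated 1x1 pivot / Schur-complement updates.

    Each cycle partitions the current matrix into [[a, b], [c, D]],
    folds the pivot `a` into the running result, and replaces the current
    matrix with the Schur complement D - c @ b / a. Assumes non-zero
    pivots (e.g., a positive-definite input).
    """
    M = np.array(A, dtype=float)
    result = 1.0
    while M.shape[0] > 1:
        a = M[0, 0]
        b = M[0:1, 1:]          # row partition
        c = M[1:, 0:1]          # column partition
        D = M[1:, 1:]           # trailing submatrix
        result *= a
        M = D - (c @ b) / a     # modify the submatrix using the other partitions
    return result * M[0, 0]

A = np.array([[4., 2., 1.], [2., 5., 3.], [1., 3., 6.]])
print(det_by_schur(A), np.linalg.det(A))   # both ~ 67.0
```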
  • Patent number: 10743026
    Abstract: A video encoding method, a video encoding apparatus, a video decoding method, and a video decoding apparatus are provided. The video encoding method includes producing a fast transform matrix based on a transform matrix which is used for frequency transformation on a block which has a predetermined size; producing a transformed block by transforming the block having the predetermined size by using the fast transform matrix; and performing scaling with respect to the transformed block in order to correct a difference between the transform matrix used for the frequency transformation and the fast transform matrix.
    Type: Grant
    Filed: September 5, 2019
    Date of Patent: August 11, 2020
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Yoon-mi Hong, Woo-jin Han, Min-su Cheon, Jianle Chen
  • Patent number: 10733140
    Abstract: A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising at least one register to hold a varying number of elements. The computer processor may further comprise processing logic configured to operate on the varying number of elements in the vector register file using one or more instructions that produce results with elements of widths different from those of the input elements. The computer processor may be implemented as a monolithic integrated circuit.
    Type: Grant
    Filed: June 1, 2015
    Date of Patent: August 4, 2020
    Assignee: OPTIMUM SEMICONDUCTOR TECHNOLOGIES INC.
    Inventors: Mayan Moudgill, Arthur Joseph Hoane, Paul Hurtley
  • Patent number: 10719323
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: July 21, 2020
    Assignee: Intel Corporation
    Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
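Illustrative sketch: the first compression mode described (pack non-zero elements together and record each element's matrix position in a header) is essentially a positions-plus-values sparse encoding. A numpy model of that mode; the real instruction's tile format, element width, and header layout are not specified here:
```python
import numpy as np

def compress_matrix(M):
    """Pack non-zero elements together; the header records each one's position."""
    M = np.asarray(M)
    mask = M != 0
    header = np.flatnonzero(mask)      # positions of non-zero elements (row-major)
    values = M[mask]                   # packed non-zero values
    return header, values, M.shape

def decompress_matrix(header, values, shape):
    out = np.zeros(shape, dtype=values.dtype)
    out.flat[header] = values
    return out

M = np.array([[0, 7, 0, 0],
              [3, 0, 0, 5],
              [0, 0, 0, 0]])
h, v, shp = compress_matrix(M)
print(h, v)                                                # [1 4 7] [7 3 5]
print(np.array_equal(decompress_matrix(h, v, shp), M))     # True
```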
  • Patent number: 10664552
    Abstract: An apparatus and method for LU decomposition of an input matrix, the input matrix comprising a multitude of elements forming a plurality of rows and columns. In an embodiment, the apparatus comprises a memory including a plurality of memory caches, and a processing unit operatively connected to the memory to transmit data to and receive data from the memory caches. The processing unit comprises a hardware circuit for processing the input matrix column-by-column to decompose the input matrix into a lower triangular matrix L and an upper triangular matrix U, including performing Gaussian eliminations on the columns of the matrix, with partial pivoting of the matrix, and choosing one of the elements of each column as the pivot element for that column while the column is being processed.
    Type: Grant
    Filed: April 1, 2019
    Date of Patent: May 26, 2020
    Assignee: International Business Machines Corporation
    Inventors: Maysam Mir Ahmadi, Sean Wagner
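Illustrative sketch: column-by-column Gaussian elimination with partial pivoting is the textbook procedure the hardware circuit implements; each step picks the largest-magnitude entry of the current column as the pivot, swaps rows, and eliminates below the pivot. A minimal numpy version:
```python
import numpy as np

def lu_partial_pivot(A):
    """Column-by-column Gaussian elimination with partial pivoting: P @ A = L @ U."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):                       # process the matrix column by column
        p = k + np.argmax(np.abs(U[k:, k]))      # choose the pivot element for this column
        if p != k:                               # partial pivoting: swap rows
            U[[k, p], k:] = U[[p, k], k:]
            L[[k, p], :k] = L[[p, k], :k]
            P[[k, p]] = P[[p, k]]
        L[k+1:, k] = U[k+1:, k] / U[k, k]                 # multipliers for this column
        U[k+1:, k:] -= np.outer(L[k+1:, k], U[k, k:])     # eliminate below the pivot
    return P, L, U

A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
P, L, U = lu_partial_pivot(A)
print(np.allclose(P @ A, L @ U))   # True
```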
  • Patent number: 10651951
    Abstract: Methods and apparatus for sub-block based architecture of Cholesky decomposition and channel whitening. In an exemplary embodiment, an apparatus is provided that parallel processes sub-block matrices (R00, R10, and R11) of a covariance matrix (R) to determine a whitening coefficient matrix (W). The apparatus includes a first LDL coefficient calculator that calculates a first whitening matrix W00, lower triangle matrix L00, and diagonal matrix D00 from the sub-block matrix R00, a first matrix calculator that calculates a lower triangle matrix L10 from the sub-block matrix R10 and the matrices L00 and D00, and a second matrix calculator that calculates a matrix X from the matrices D00 and L10.
    Type: Grant
    Filed: December 29, 2018
    Date of Patent: May 12, 2020
    Assignee: Cavium, LLC.
    Inventors: Yuanbin Guo, Hong Jik Kim
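Illustrative sketch: the building block is the LDL^T factorization of the covariance matrix, R = L D L^T, from which the whitening coefficient matrix W = D^(-1/2) L^(-1) satisfies W R W^T = I. A small, non-blocked, real-valued version is shown below; the R00/R10/R11 sub-block partitioning and parallel schedule of the patent are not reproduced:
```python
import numpy as np

def ldl(R):
    """LDL^T factorization of a symmetric positive-definite matrix R."""
    n = R.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = R[j, j] - np.sum(L[j, :j] ** 2 * d[:j])
        for i in range(j + 1, n):
            L[i, j] = (R[i, j] - np.sum(L[i, :j] * L[j, :j] * d[:j])) / d[j]
    return L, d

# Toy covariance matrix (assumed, real-valued for simplicity).
R = np.array([[4., 2., 1.],
              [2., 3., 0.5],
              [1., 0.5, 2.]])
L, d = ldl(R)
W = np.diag(1.0 / np.sqrt(d)) @ np.linalg.inv(L)   # whitening coefficient matrix
print(np.allclose(W @ R @ W.T, np.eye(3)))          # whitened covariance is the identity
```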
  • Patent number: 10635447
    Abstract: Single Instruction, Multiple Data (SIMD) technologies are described. A processing device can include a processor core and a memory. The processor core can receive, from a software application, a request to perform an operation on a first set of variables that includes a first input value and a register value and perform the operation on a second set of variables that includes a second input value and the first register value. The processor core can vectorize the operation on the first set of variables and the second set of variables. The processor core can perform the operation on the first set of variables and the second set of variables in parallel to obtain a first operation value and a second operation value. The processor core can perform a horizontal add operation on the first operation value and the second operation value and write the result to memory.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: April 28, 2020
    Assignee: Intel Corporation
    Inventors: Jun Jin, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10607718
    Abstract: Embodiments of the present invention include methods, systems and computer program products for algebraic phasing of polyploids. Aspects of the invention include receiving a matrix including a set of two or more single-nucleotide polymorphisms (SNPs) for two or more sample organisms. Each row of the matrix is set to a ploidy based on a number of ploidies present in the two or more sample organisms. Each allele in the set of two or more SNPs is represented as a binary number. A set of algebraic rules is received, wherein the set of algebraic rules includes an algebraic phasing algorithm. The set of algebraic rules is then applied to the matrix to determine a haplotype of a parent of the two or more sample organisms.
    Type: Grant
    Filed: May 14, 2019
    Date of Patent: March 31, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Laxmi P. Parida, Filippo Utro
  • Patent number: 10599428
    Abstract: Processing circuitry supports overlapped execution of vector instructions when at least one beat of a first vector instruction is performed in parallel with at least one beat of a second vector instruction. The processing circuitry also supports mixed-scalar-vector instructions for which one of a destination register and one or more source registers is a vector register and another is a scalar register. In a sequence including first and subsequent mixed-scalar-vector instructions, instances of relaxed execution which can potentially lead to uncertain and incorrect results are permitted by the processing circuitry when the instructions are separated by fewer than a predetermined number of intervening instructions. In practice the situations which lead to the uncertain results are very rare and so it is not justified providing relatively expensive dependency checking circuitry for eliminating such cases.
    Type: Grant
    Filed: March 23, 2016
    Date of Patent: March 24, 2020
    Assignee: ARM Limited
    Inventor: Thomas Christopher Grocutt
  • Patent number: 10592241
    Abstract: Aspects for matrix multiplication in a neural network are described herein. The aspects may include a master computation module configured to receive a first matrix and transmit a row vector of the first matrix. In addition, the aspects may include one or more slave computation modules respectively configured to store a column vector of a second matrix, receive the row vector of the first matrix, and multiply the row vector of the first matrix with the stored column vector of the second matrix to generate a result element. Further, the aspects may include an interconnection unit configured to combine the one or more result elements generated respectively by the one or more slave computation modules to generate a row vector of a result matrix and transmit the row vector of the result matrix to the master computation module.
    Type: Grant
    Filed: October 25, 2018
    Date of Patent: March 17, 2020
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Xiao Zhang, Shaoli Liu, Tianshi Chen, Yunji Chen
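Illustrative sketch: the dataflow is an outer decomposition of matrix multiply: the master broadcasts one row of the first matrix, each slave holds one column of the second matrix and emits one dot-product result element, and the interconnection unit gathers those elements into a row of the result. Module boundaries are only simulated below:
```python
import numpy as np

A = np.arange(6, dtype=float).reshape(2, 3)             # first matrix (held by the master)
B = np.arange(12, dtype=float).reshape(3, 4)             # second matrix
slave_columns = [B[:, j] for j in range(B.shape[1])]     # each slave stores one column of B

result_rows = []
for row in A:                                     # master transmits one row at a time
    # Each "slave" multiplies the broadcast row with its stored column -> one result element.
    elements = [row @ col for col in slave_columns]
    result_rows.append(elements)                  # interconnection unit combines the elements

C = np.array(result_rows)
print(np.allclose(C, A @ B))                      # True
```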
  • Patent number: 10579554
    Abstract: Program procedures executed to route a bus, via a processing unit, include a bus information extractor configured to extract bus information, including physical requirements for the bus, from input data; a buffer array generator configured to generate a buffer array in which buffers included in the bus are regularly arranged based on the bus information; a buffer array placer configured to place at least one buffer array in the layout of the integrated circuit based on the bus information; and a wiring procedure configured to generate interconnections connected to buffers included in the at least one buffer array based on the bus information.
    Type: Grant
    Filed: June 8, 2017
    Date of Patent: March 3, 2020
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Byung-yong Kim
  • Patent number: 10558730
    Abstract: A computing method includes: generating first partitioned matrices by partitioning the first matrix by a least common multiple of the M and the N in the row direction and by the N in the column direction; generating second partitioned matrices by partitioning the second matrix by the M in the row direction and by the least common multiple in the column direction; adding a first product of the first partitioned matrices and the second partitioned matrices to a first result matrix; transmitting the first partitioned matrices to computing elements directly connected to that computing element out of other computing elements connected to each other in a torus-like manner in the row direction; transmitting the second partitioned matrices to computing elements directly connected to that computing element out of other computing elements connected to each other in a torus-like manner in the column direction.
    Type: Grant
    Filed: February 13, 2018
    Date of Patent: February 11, 2020
    Assignee: FUJITSU LIMITED
    Inventor: Akihiko Kasagi
  • Patent number: 10534839
    Abstract: A method for matrix by vector multiplication, applied in an artificial neural network system, is disclosed. The method comprises: compressing a plurality of weight values in a weight matrix and indices of an input vector into a compressed main stream; storing M sets of synapse values in M memory devices; and, performing reading and MAC operations according to the M sets of synapse values and the compressed main stream to obtain a number M of output vectors. The step of compressing comprises: dividing the weight matrix into a plurality of N×L blocks; converting entries of a target block and corresponding indices of the input vector into a working block and an index matrix; removing zero entries in the working block; shifting non-zero entries row-by-row to one of their left and right sides in the working block; and, respectively shifting corresponding entries in the index matrix.
    Type: Grant
    Filed: June 25, 2018
    Date of Patent: January 14, 2020
    Assignee: BRITISH CAYMAN ISLANDS INTELLIGO TECHNOLOGY INC.
    Inventors: Pei-Wen Hsieh, Chen-Chu Hsu, Tsung-Liang Chen
  • Patent number: 10534838
    Abstract: Described are embodiments related to bit matrix multiplication in a processor. For example, in some embodiments a processor comprises: decode circuitry to decode an instruction having fields for an opcode, an identifier of a first source bit matrix, an identifier of a second source bit matrix, an identifier of a destination bit matrix, and an immediate; and execution circuitry to execute the decoded instruction to perform a multiplication of a matrix of S-bit elements of the identified first source bit matrix with S-bit elements of the identified second source bit matrix, wherein the multiplication and accumulation operations are selected by the operation selector, and to store a result of the matrix multiplication into the identified destination bit matrix, wherein S indicates a plural bit size.
    Type: Grant
    Filed: September 29, 2017
    Date of Patent: January 14, 2020
    Assignee: Intel Corporation
    Inventors: Dmitry Y. Babokin, Kshitij A. Doshi, Vadim Sukhomlinov
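Illustrative sketch: a bit matrix multiply typically replaces multiplication with AND and accumulation with XOR (GF(2) arithmetic) or OR, selectable by the immediate. The XOR variant on 1-bit elements is shown below; the instruction encoding and S-bit element grouping are not modeled:
```python
import numpy as np

def bit_matmul_xor(A, B):
    """Bit matrix multiply over GF(2): multiply = AND, accumulate = XOR."""
    A = np.asarray(A, dtype=np.uint8)
    B = np.asarray(B, dtype=np.uint8)
    # (A & B) summed mod 2 along the inner dimension is AND with XOR-accumulation.
    return (A[:, :, None] & B[None, :, :]).sum(axis=1) & 1

A = np.array([[1, 0, 1],
              [1, 1, 0]])
B = np.array([[1, 1],
              [0, 1],
              [1, 0]])
print(bit_matmul_xor(A, B))   # [[0 1]
                              #  [1 0]]
```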
  • Patent number: 10521228
    Abstract: The present disclosure provides a data read-write scheduler and a reservation station for vector operations. The data read-write scheduler suspends instruction execution by providing a read instruction cache module and a write instruction cache module and detecting conflicting instructions based on the two modules. After the timing condition is satisfied, the instructions are re-executed, thereby resolving the read-after-write and write-after-read conflicts between instructions and guaranteeing that correct data are provided to the vector operations component. Therefore, the subject disclosure has greater value for promotion and application.
    Type: Grant
    Filed: November 7, 2018
    Date of Patent: December 31, 2019
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Dong Han, Shaoli Liu, Yunji Chen, Tianshi Chen
  • Patent number: 10489063
    Abstract: First elements of a dense vector to be multiplied with first elements of a first row of a sparse array may be determined. The determined first elements of the dense vector may be written into a memory. A dot product for the first elements of the sparse array and the first elements of the dense vector may be calculated in a plurality of increments by multiplying a subset of the first elements of the sparse array and a corresponding subset of the first elements of the dense vector. A sequence number may be updated after each increment is completed to identify a column number and/or a row number of the sparse array for which the dot product calculations have been completed.
    Type: Grant
    Filed: December 19, 2016
    Date of Patent: November 26, 2019
    Assignee: Intel Corporation
    Inventors: Asit K. Mishra, Deborah T. Marr, Edward T. Grochowski
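Illustrative sketch: the computation is a CSR-style sparse-row-times-dense-vector dot product accumulated in fixed-size increments, with a sequence number recording how far along the row the calculation has progressed. The block size and storage layout below are assumptions:
```python
import numpy as np

# Sparse row in CSR-like form: column indices and values of its non-zero elements.
cols = np.array([1, 4, 7, 9, 12])
vals = np.array([2.0, -1.0, 3.0, 0.5, 4.0])
x = np.arange(16, dtype=float)         # dense vector

block = 2                              # elements processed per increment (assumed)
dot, sequence_number = 0.0, 0
for start in range(0, len(cols), block):
    idx = cols[start:start + block]
    dot += vals[start:start + block] @ x[idx]   # one increment of the dot product
    sequence_number = idx[-1]                   # last column completed so far
    print(f"after column {sequence_number}: partial dot = {dot}")

print(np.isclose(dot, vals @ x[cols]))  # matches the one-shot computation
```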
  • Patent number: 10481957
    Abstract: A processor and a task processing method therefor, and a storage medium. The method comprises: a scalar calculation module executing parameter calculation of a current task, and storing a parameter obtained through calculation in a PBUF; when the parameter calculation of the current task is completed, executing a first instruction or second instruction for inter-core synchronization, and storing the first instruction or the second instruction in the PBUF (301); a vector calculation module reading the parameter from the PBUF, storing the read parameter in a shadow register; when the first instruction or the second instruction is read from the PBUF, storing all the modified parameters in the shadow register in a work register within a period (302); and the vector calculation module executing vector calculation of the current task according to the parameter in the work register (303).
    Type: Grant
    Filed: July 1, 2016
    Date of Patent: November 19, 2019
    Assignee: Sanechips Technology Co., Ltd.
    Inventors: Bo Wen, Qingxin Cao
  • Patent number: 10482157
    Abstract: A data compression apparatus includes a memory and a processor configured to: generate compressed matrix data; compare a threshold with an index value calculated for a specific-value data string, that is, a data string obtained by coupling specific values identified from the non-zero element values in each row of the compressed matrix data; specify a given constant as the respective coefficients when the index value is larger than the threshold; calculate the reciprocals of the respective specific values as the respective coefficients when the index value is equal to or smaller than the threshold; and output post-operation matrix data obtained by rounding the products of the respective elements of the compressed matrix data and the respective calculated coefficients, based on the number of significant figures of the decimal part of each corresponding element.
    Type: Grant
    Filed: February 27, 2019
    Date of Patent: November 19, 2019
    Assignee: FUJITSU LIMITED
    Inventor: Makiko Konoshima
  • Patent number: 10455252
    Abstract: A video encoding method, a video encoding apparatus, a video decoding method, and a video decoding apparatus are provided. The video encoding method includes producing a fast transform matrix based on a transform matrix which is used for frequency transformation on a block which has a predetermined size; producing a transformed block by transforming the block having the predetermined size by using the fast transform matrix; and performing scaling with respect to the transformed block in order to correct a difference between the transform matrix used for the frequency transformation and the fast transform matrix.
    Type: Grant
    Filed: July 2, 2018
    Date of Patent: October 22, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Yoon-Mi Hong, Woo-Jin Han, Min-Su Cheon, Jianle Chen
  • Patent number: 10395381
    Abstract: Disclosed techniques relate to forming a block sum of picture elements, employing a vector dot product instruction operating on packed picture elements and a mask, producing a vector of masked horizontal picture-element sums. The block sum is formed from plural horizontal sums via vector single-instruction/multiple-data (SIMD) addition.
    Type: Grant
    Filed: March 4, 2019
    Date of Patent: August 27, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Jayasree Sankaranarayanan, Dipan Kumar Mandal
  • Patent number: 10346507
    Abstract: Embodiments of the present invention are directed to methods and systems for performing block sparse matrix-vector multiplications with improved efficiency through the use of a specific re-ordering of the matrix data such that matrix symmetry can be exploited while simultaneously avoiding atomic memory operations or the need for inefficient memory operations in general. One disclosed method includes reordering the matrix data such that, for any column of non-transpose data, and for any row of transpose data simultaneously processed within a single thread-block on a GPU, all matrix elements update independent elements of the output vector. Using the method, the amount of data required to represent the sparse matrix can be reduced by as much as 50%, thereby doubling the effective performance on the GPU, and doubling the size of the matrix that can be accelerated by the GPU.
    Type: Grant
    Filed: October 26, 2017
    Date of Patent: July 9, 2019
    Assignee: Nvidia Corporation
    Inventor: Steve Rennich
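Illustrative sketch: exploiting symmetry means storing only one triangle and letting every stored off-diagonal element update two output entries, once as-is and once transposed; the patented reordering is about keeping those two updates conflict-free across GPU threads. A small serial numpy version of the two-update idea, ignoring the thread-block scheduling:
```python
import numpy as np

# Lower-triangle-only storage of a symmetric sparse matrix (COO-style).
rows = np.array([0, 1, 2, 2, 3])
cols = np.array([0, 0, 1, 2, 3])
vals = np.array([4.0, 2.0, -1.0, 5.0, 3.0])
x = np.array([1.0, 2.0, 3.0, 4.0])

y = np.zeros_like(x)
for r, c, v in zip(rows, cols, vals):
    y[r] += v * x[c]                  # non-transpose contribution
    if r != c:
        y[c] += v * x[r]              # transpose contribution from the same stored element

# Reference: build the full symmetric matrix and multiply.
A = np.zeros((4, 4))
A[rows, cols] = vals
A = A + A.T - np.diag(np.diag(A))
print(np.allclose(y, A @ x))          # True
```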
  • Patent number: 10310812
    Abstract: Mechanisms are provided for performing a matrix operation. A processor of a data processing system is configured to perform cluster-based matrix reordering of an input matrix. An input matrix, which comprises nodes associated with elements of the matrix, is received. The nodes are clustered into clusters based on numbers of connections with other nodes within and between the clusters, and the clusters are ordered by minimizing a total length of cross cluster connections between nodes of the clusters, to thereby generate a reordered matrix. A lookup table is generated identifying new locations of nodes of the input matrix, in the reordered matrix. A matrix operation is then performed based on the reordered matrix and the lookup table.
    Type: Grant
    Filed: February 6, 2017
    Date of Patent: June 4, 2019
    Assignee: International Business Machines Corporation
    Inventors: Emrah Acar, Rajesh R. Bordawekar, Michele M. Franceschini, Luis A. Lastras-Montano, Ruchir Puri, Haifeng Qian, Livio B. Soares
  • Patent number: 10304008
    Abstract: Systems and methods are disclosed for operating a machine, by receiving training data from one or more sensors; training a machine learning module with the training data by: partitioning a data matrix into smaller submatrices to process in parallel and optimized for each processing node; for each submatrix, performing a greedy search for rank-one solutions; using alternating direction method of multipliers (ADMM) to ensure consistency over different data blocks; and controlling one or more actuators using live data and the learned module during operation.
    Type: Grant
    Filed: March 7, 2016
    Date of Patent: May 28, 2019
    Assignee: NEC Corporation
    Inventors: Renqiang Min, Dongjin Song
  • Patent number: 10275392
    Abstract: A data processing device includes a two-dimensional structure including a plurality of stages in a vertical direction, the stages each including basic units in a horizontal direction such that the number of the basic units is equal to the number of ways. The basic units each include a memory block having a plurality of ports, an address generator for the ports of the memory block, and a calculation unit.
    Type: Grant
    Filed: April 6, 2016
    Date of Patent: April 30, 2019
    Assignee: NATIONAL UNIVERSITY CORPORATION NARA INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Yasuhiko Nakashima, Shinya Takamaeda
  • Patent number: 10248426
    Abstract: Techniques are disclosed for restoring register data in a processor. In one embodiment, a method includes receiving an instruction to flush one or more general purpose registers (GPRs) in a processor. The method also includes determining history buffer entries of a history buffer to be restored to the one or more GPRs. The method includes creating a mask vector that indicates which history buffer entries will be restored to the one or more GPRs. The method further includes restoring the indicated history buffer entries to the one or more GPRs. As each indicated history buffer entry is restored, the method includes updating the mask vector to indicate which history buffer entries have been restored.
    Type: Grant
    Filed: May 24, 2016
    Date of Patent: April 2, 2019
    Assignee: International Business Machines Corporation
    Inventors: Brian D. Barrick, Steven J. Battle, Joshua W. Bowman, Christopher M. Mueller, Dung Q. Nguyen, David R. Terry, Eula Faye Tolentino, Jing Zhang
  • Patent number: 10191749
    Abstract: Single Instruction, Multiple Data (SIMD) technologies are described. A processing device can include a processor core and a memory. The processor core can receive, from a software application, a request to perform an operation on a first set of variables that includes a first input value and a register value and perform the operation on a second set of variables that includes a second input value and the first register value. The processor core can vectorize the operation on the first set of variables and the second set of variables. The processor core can perform the operation on the first set of variables and the second set of variables in parallel to obtain a first operation value and a second operation value. The processor core can perform a horizontal add operation on the first operation value and the second operation value and write the result to memory.
    Type: Grant
    Filed: December 24, 2015
    Date of Patent: January 29, 2019
    Assignee: Intel Corporation
    Inventors: Jun Jin, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10191744
    Abstract: Systems, methods, and apparatuses relating to element sorting of vectors are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction; and an execution unit to execute the decoded instruction to: provide storage for a comparison matrix to store a comparison value for each element of an input vector compared against the other elements of the input vector, perform a comparison operation on elements of the input vector corresponding to storage of comparison values above a main diagonal of the comparison matrix, perform a different operation on elements of the input vector corresponding to storage of comparison values below the main diagonal of the comparison matrix, and store results of the comparison operation and the different operation in the comparison matrix.
    Type: Grant
    Filed: July 1, 2016
    Date of Patent: January 29, 2019
    Assignee: Intel Corporation
    Inventors: Mikhail Plotnikov, Igor Ermolaev
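Illustrative sketch: a comparison matrix turns sorting into counting: an element's sorted position is the number of elements that compare as preceding it, and treating ties differently above and below the main diagonal keeps equal elements in their original order. A numpy sketch of that idea:
```python
import numpy as np

def sort_by_comparison_matrix(v):
    v = np.asarray(v)
    n = len(v)
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    # Above the diagonal use "<", below it use "<=": equal elements keep
    # their original order, so every rank is distinct.
    cmp = np.where(j > i, v[j] < v[i], (j < i) & (v[j] <= v[i]))
    rank = cmp.sum(axis=1)            # how many elements precede v[i]
    out = np.empty_like(v)
    out[rank] = v                     # scatter each element to its sorted position
    return out

v = np.array([30, 10, 20, 10])
print(sort_by_comparison_matrix(v))   # [10 10 20 30]
```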
  • Patent number: 10169239
    Abstract: A prefetch request having a priority assigned thereto is obtained, based on executing a prefetch instruction included within a program. Based on obtaining the prefetch request, a determination is made as to whether the prefetch request may be placed on a prefetch queue. This determination includes determining whether the prefetch queue is full; checking, based on determining the prefetch queue is full, whether the priority of the prefetch request is considered a high priority; determining, based on the checking indicating the priority of the prefetch request is considered a high priority, whether another prefetch request on the prefetch queue may be removed; removing the other prefetch request from the prefetch queue, based on determining the other prefetch request may be removed; and adding the prefetch request to the prefetch queue, based on removing the other prefetch request.
    Type: Grant
    Filed: July 20, 2016
    Date of Patent: January 1, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Michael K. Gschwind, Christian Jacobi, Anthony Saporito, Chung-Lung K. Shum, Timothy J. Slegel
  • Patent number: 10162752
    Abstract: A method for storing data at contiguous memory addresses includes, at a single-instruction-multiple-data (SIMD) processor, executing a parallel-prefix valid count instruction to determine a first offset of a first data vector and to determine a second offset of a second data vector that includes valid data and invalid data. The second offset is based on the first offset and a number of positions in the first data vector that are associated with valid data. The method also includes storing first valid data from the first data vector at a first memory address of a memory and storing second valid data from the second data vector at a particular memory address of the memory. The first memory address is based on the first offset and the particular memory address is based on the second offset.
    Type: Grant
    Filed: September 22, 2016
    Date of Patent: December 25, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Eric Mahurin, David Hoyle
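Illustrative sketch: the offsets are an exclusive prefix sum of per-vector valid counts, so vector k's valid elements land immediately after the valid elements of all earlier vectors. A numpy model of the compaction the instruction accelerates; the vector width and invalid-lane marker are assumptions:
```python
import numpy as np

VEC = 4                                    # SIMD vector width (assumed)
NA = -1                                    # marker for invalid lanes (assumed)

data = np.array([[ 3, NA,  7, NA],         # first data vector
                 [NA,  5, NA,  9],         # second data vector
                 [ 1,  2, NA, NA]])
valid = data != NA

counts = valid.sum(axis=1)                 # valid-element count per vector
offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))   # "parallel-prefix valid count"

out = np.zeros(counts.sum(), dtype=data.dtype)
for row, m, off in zip(data, valid, offsets):
    out[off:off + m.sum()] = row[m]        # store each vector's valid lanes contiguously

print(offsets, out)                        # [0 2 4] [3 7 5 9 1 2]
```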
  • Patent number: 10146740
    Abstract: A computer implemented method is provided for processing sparse data. A sparse data set is received. A modified sparse data set is calculated by replacing all nonzero values in the sparse data set with a common positive integer. The modified sparse data set is transposed to create a transposed data set. A covariance matrix is calculated by multiplying the transposed data set by the modified sparse data set. A tree of a predefined depth is generated by assigning columns of the sparse data set to right and left nodes based on co-occurrence with a first anchor column and a second anchor column. The first anchor column and the second anchor column are determined based on the covariance matrix.
    Type: Grant
    Filed: March 8, 2017
    Date of Patent: December 4, 2018
    Assignee: Symantec Corporation
    Inventors: Nikolaos Vasiloglou, Andrew B. Gardner
  • Patent number: 10097834
    Abstract: A method of encoding image data, including: frequency-transforming input image data to generate an array of frequency-transformed input image coefficients by a matrix-multiplication process, according to a maximum dynamic range of the transformed data and using transform matrices having a data precision; and selecting the maximum dynamic range and/or the data precision of the transform matrices according to the bit depth of the input image data.
    Type: Grant
    Filed: April 4, 2014
    Date of Patent: October 9, 2018
    Assignee: Sony Corporation
    Inventors: David Berry, James Alexander Gamei, Nicholas Ian Saunders, Karl James Sharman
  • Patent number: 10042814
    Abstract: A device, system and method for assigning values to elements in a first register, where each data field in a first register corresponds to a data element to be written into a second register, and where for each data field in the first register, a first value may indicate that the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element has been written into the second register, reading the values of each of the data fields in the first register, and for each data field in the first register having the first value, gathering the corresponding data element and writing the corresponding data element into the second register, and changing the value of the data field in the first register from the first value to the second value. Other embodiments are described and claimed.
    Type: Grant
    Filed: November 14, 2014
    Date of Patent: August 7, 2018
    Assignee: Intel Corporation
    Inventors: Eric Sprangle, Anwar Rohillah, Robert Cavin, Andrew T. Forsyth, Michael Abrash
  • Patent number: 10038918
    Abstract: A video encoding method, a video encoding apparatus, a video decoding method, and a video decoding apparatus are provided. The video encoding method includes producing a fast transform matrix based on a transform matrix which is used for frequency transformation on a block which has a predetermined size; producing a transformed block by transforming the block having the predetermined size by using the fast transform matrix; and performing scaling with respect to the transformed block in order to correct a difference between the transform matrix used for the frequency transformation and the fast transform matrix.
    Type: Grant
    Filed: September 11, 2017
    Date of Patent: July 31, 2018
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Yoon-Mi Hong, Woo-Jin Han, Min-Su Cheon, Jianle Chen
  • Patent number: 9984041
    Abstract: A batched Cholesky decomposition method, system, and non-transitory computer readable medium for a Graphics Processing Unit (GPU) including at least a first problem and a second problem, include mirroring a second problem matrix of the second problem to a first problem matrix of the first problem, combining the first problem matrix and the mirrored second problem matrix into a single problem matrix, and allocating data read to a thread and to the first problem and the second problem, respectively.
    Type: Grant
    Filed: June 30, 2016
    Date of Patent: May 29, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Minsik Cho, David Shing-ki Kung, Ruchir Puri
  • Patent number: 9934195
    Abstract: A multicore processor is achieved by a processor assembly comprising a first processor having a first core and at least a first and a second unit, each selected from the group of vector execution units, memory units and accelerators, said first core and first and second units being interconnected by a first network, and a second processor having a second core, wherein the first core is arranged to enable the second core to control at least one of the units in the first processor. Each processor generally comprises a combination of execution units, memory units and accelerators, which may be controlled and/or accessed by units in the other processor.
    Type: Grant
    Filed: November 28, 2012
    Date of Patent: April 3, 2018
    Assignee: Mediatek Sweden AB
    Inventors: Anders Nilsson, Eric Tell
  • Patent number: 9870338
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed compression and repeat in response to a single vector packed compression and repeat instruction that includes a first and second source vector register operand, a destination vector register operand, and an opcode are described.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: January 16, 2018
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm
  • Patent number: 9858079
    Abstract: A method and system are described for generating reference tables in object code which specify the addresses of branches, routines called, and data references used by routines in the code. In a suitably equipped processing system, the reference tables can be passed to a memory management processor which can open the appropriate memory pages to expedite the retrieval of data referenced in the execution pipeline. The disclosed method and system create such reference tables at the beginning of each routine so that the table can be passed to the memory management processor in a suitably equipped processor. Resulting object code also allows processors lacking a suitable memory management processor to skip the reference table, preserving upward compatibility.
    Type: Grant
    Filed: October 19, 2015
    Date of Patent: January 2, 2018
    Assignee: Micron Technology, Inc.
    Inventor: Dean A. Klein
  • Patent number: 9846581
    Abstract: A clock-less asynchronous processor comprising a plurality of parallel asynchronous processing logic circuits, each processing logic circuit configured to generate an instruction execution result. The processor comprises an asynchronous instruction dispatch unit coupled to each processing logic circuit, the instruction dispatch unit configured to receive multiple instructions from memory and dispatch individual instructions to each of the processing logic circuits. The processor comprises a crossbar coupled to an output of each processing logic circuit and to the dispatch unit, the crossbar configured to store the instruction execution results.
    Type: Grant
    Filed: September 8, 2014
    Date of Patent: December 19, 2017
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Tao Huang, Yiqun Ge, Qifan Zhang, Wuxian Shi, Wen Tong