Matrix Array Patents (Class 708/520)
-
Patent number: 11023242
Abstract: A method and apparatus of asynchronous scheduling in a graphics device includes sending one or more instructions from an instruction scheduler to one or more instruction first-in/first-out (FIFO) devices. An instruction in the one or more FIFO devices is selected for execution by a single-instruction/multiple-data (SIMD) pipeline unit. It is determined whether all operands for the selected instruction are available for execution of the instruction, and if all the operands are available, the selected instruction is executed on the SIMD pipeline unit. The self-timed arithmetic pipeline unit (SIMD pipeline unit) is effectively encapsulated in a synchronous (e.g., clocked by a global clock) scheduler and register file environment.
Type: Grant
Filed: January 27, 2017
Date of Patent: June 1, 2021
Assignees: ATI TECHNOLOGIES ULC, ADVANCED MICRO DEVICES, INC.
Inventors: John Kalamatianos, Greg Sadowski, Syed Zohaib M. Gilani
-
Patent number: 11017290
Abstract: A signal processing module comprises at least one operational unit incorporating computation units, input and output interfaces able to be linked to a bus and a memory storing data destined for the computation units, the memory being organized so that each data word is stored column-wise over several addresses according to an order dependent on the application, a column having a width of one bit, the words being transferred in series to the computation units.
Type: Grant
Filed: November 27, 2014
Date of Patent: May 25, 2021
Assignee: COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES
Inventors: Marc Duranton, Jean-Marc Philippe
-
Patent number: 10996944
Abstract: A processing device can establish a machine learning model to produce software dependency recommendations. The model can be periodically retrained to update its knowledge of available dependencies. The software dependencies can be incorporated into software by developers who receive the selection or automatically by an intelligent software development platform. A processing device can train the model by assembling sparse user data based on feedback corresponding to software dependencies to produce a vector of preferences for each user. The processing device can also generate a latent vector of attributes for each software dependency. The processing device can then apply matrix factorization to the vectors to produce a behavior matrix that is used to train the machine learning model.
Type: Grant
Filed: August 6, 2019
Date of Patent: May 4, 2021
Assignee: Red Hat, Inc.
Inventors: Avishkar Gupta, Aagam Shah, Sarah Masud
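The abstract above combines per-user preference vectors and per-dependency latent vectors via matrix factorization. As an illustration only — the patent's actual implementation is not given here, and the toy data, hyperparameters, and variable names below are all assumptions — a minimal NumPy sketch of generic latent-factor matrix factorization over sparse feedback:

```python
import numpy as np

# Toy stand-in for sparse user feedback: rows = users, cols = software
# dependencies, entries = feedback scores (0 means "no feedback observed").
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 2.0, 5.0]])
mask = R > 0                              # only observed feedback enters the loss

k, lr, reg = 2, 0.02, 0.01               # latent size, step, regularization (assumed)
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(3, k))   # per-user preference vectors
V = rng.normal(scale=0.1, size=(3, k))   # per-dependency latent attribute vectors

for _ in range(5000):
    E = (R - U @ V.T) * mask             # reconstruction error on observed entries
    U += lr * (E @ V - reg * U)          # gradient step on user factors
    V += lr * (E.T @ U - reg * V)        # gradient step on dependency factors

behavior = U @ V.T                       # dense "behavior matrix" of predicted scores
```

The filled-in entries of `behavior` are what a recommender would rank for each user.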
-
Patent number: 10986014
Abstract: A monitoring system detects a deviation in a monitoring metric of a system component of a remote management system that remotely manages image forming apparatuses. When the monitoring system detects a deviation in online device count greater than or equal to a deviation threshold and makes a determination that there is a correlation between the deviations in monitoring metrics of multiple system components as detected, the monitoring system sends a failure report indicating that a failure is in the remote management system.
Type: Grant
Filed: June 5, 2020
Date of Patent: April 20, 2021
Assignee: KYOCERA DOCUMENT SOLUTIONS INC.
Inventors: Dukil Park, Kazuki Nishikai, Koki Nakajima, Yasuo Nakashima, Satoshi Goshima, Yuichi Obayashi, Takeshi Nakamura
-
Patent number: 10915318
Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.
Type: Grant
Filed: March 4, 2019
Date of Patent: February 9, 2021
Assignee: Google LLC
Inventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps
-
Patent number: 10897605
Abstract: Apparatuses, systems, and methods related to an image processor formed in an array of memory cells are described. An image processor as described herein is configured to reduce complexity and power consumption and/or increase data access bandwidth by performing image processing in the array of memory cells relative to image processing by a host processor external to the memory array. For instance, one apparatus described herein includes sensor circuitry configured to provide an input vector, as a plurality of bits that corresponds to a plurality of color components for an image pixel, and an image processor formed in an array of memory cells. The image processor is coupled to the sensor circuitry to receive the plurality of bits of the input vector. The image processor is configured to perform a color correction operation in the array by performing matrix multiplication on the input vector and a parameter matrix to determine an output vector that is color corrected.
Type: Grant
Filed: August 26, 2019
Date of Patent: January 19, 2021
Assignee: Micron Technology, Inc.
Inventors: Fa-Long Luo, Jaime C. Cummins, Tamara Schmitz
-
Patent number: 10896039
Abstract: In one embodiment, a matrix operation may be performed on one or more matrix operands. For example, matrix data may be received from a multi-dimensional memory, wherein the matrix data is associated with the one or more matrix operands. The one or more matrix operands may be extracted from the matrix data. A matrix routine associated with the matrix operation may be identified. The matrix routine may be executed on a matrix processor using the one or more matrix operands. A result of the matrix operation may be obtained based on the matrix routine executed by the matrix processor.
Type: Grant
Filed: January 31, 2019
Date of Patent: January 19, 2021
Assignee: Intel Corporation
Inventors: Tony L. Werner, Aravind Kalaiah, Vijay Korthikanti, Horace Lau
-
Patent number: 10872130
Abstract: Based on a Modified Gram-Schmidt (MGS) algorithm, QR decomposition techniques are optimized for parallel structures that provide arithmetic-logic unit (ALU) to ALU connectivity. The techniques utilize a different loop organization, but the dependent functional sequences of the algorithm are unchanged, thereby reducing likelihood of affecting error analysis and/or numerical stability. Some integrated circuit devices (e.g., FPGA) may implement hard floating-point (HFP) circuitry, such as a digital signal processing (DSP) block, distributed memories, and/or flexible internal connectivity, which can support the discussed high performance matrix arithmetic.
Type: Grant
Filed: August 31, 2017
Date of Patent: December 22, 2020
Assignee: Intel Corporation
Inventor: Martin Langhammer
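The entry above builds on the Modified Gram-Schmidt (MGS) algorithm. For reference, a plain-NumPy version of classical MGS QR — the textbook algorithm, not the patented loop reorganization — looks like this:

```python
import numpy as np

def mgs_qr(A):
    """QR decomposition via Modified Gram-Schmidt.

    Unlike classical Gram-Schmidt, each remaining column is orthogonalized
    against q_j immediately after q_j is formed, which improves numerical
    stability in floating point.
    """
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        R[j, j] = np.linalg.norm(A[:, j])
        Q[:, j] = A[:, j] / R[j, j]
        for k in range(j + 1, n):
            R[j, k] = Q[:, j] @ A[:, k]   # project the *updated* column
            A[:, k] -= R[j, k] * Q[:, j]  # and subtract right away
    return Q, R

A = np.array([[12.0, -51.0, 4.0],
              [6.0, 167.0, -68.0],
              [-4.0, 24.0, -41.0]])
Q, R = mgs_qr(A)
```

The "dependent functional sequence" the abstract mentions is exactly the chain of `R[j, k]` projections and column updates above; the patent reorganizes the loop order without changing those dependencies.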
-
Patent number: 10832799
Abstract: Methods, systems and apparatus for detecting patterns in constituents of at least one biological organism are disclosed. In accordance with one method, clusters of the constituents are determined (208) by selecting (210) different subsets of at least one of genes or proteins and identifying (212) the clusters from biological data corresponding to the selected subsets. Here, membership values for the constituents, indicating membership within the clusters, are calculated for use as a basis of an additional cluster determination process (208) to obtain final clusters of constituents. By underpinning the preliminary clustering on different subsets of biological data and formulating the higher-level clustering on the basis of the membership values, the embodiments can enable an evaluation of a large variety of biological data in a practical, accurate and highly efficient manner.
Type: Grant
Filed: August 12, 2016
Date of Patent: November 10, 2020
Assignee: Koninklijke Philips N.V.
Inventors: Konstantin Volyanskyy, Nevenka Dimitrova
-
Patent number: 10762164
Abstract: A computing device and related products are provided. The computing device is configured to perform machine learning calculations. The computing device includes an operation unit, a controller unit, and a storage unit. The storage unit includes a data input/output (I/O) unit, a register, and a cache. The technical solution provided by the present disclosure has the advantages of fast calculation speed and energy saving.
Type: Grant
Filed: July 19, 2018
Date of Patent: September 1, 2020
Assignee: Cambricon Technologies Corporation Limited
Inventors: Tianshi Chen, Xiao Zhang, Shaoli Liu, Yunji Chen
-
Patent number: 10762163
Abstract: In embodiments of probabilistic matrix factorization for automated machine learning, a computing system memory maintains different workflows that each include preprocessing steps for a machine learning model, the machine learning model, and one or more parameters for the machine learning model. The computing system memory additionally maintains different data sets, upon which the different workflows can be trained and tested. A matrix is generated from the different workflows and different data sets, where cells of the matrix are populated with performance metrics that each indicate a measure of performance for a workflow applied to a data set. A low-rank decomposition of the matrix with populated performance metrics is then determined. Based on the low-rank decomposition, an optimum workflow for a new data set can be determined. The optimum workflow can be one of the different workflows or a hybrid of at least two of the different workflows.
Type: Grant
Filed: December 5, 2016
Date of Patent: September 1, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventor: Nicolo Fusi
-
Patent number: 10755426
Abstract: An electronic device comprises circuitry implementing a depth map enhancer. The depth map enhancer obtains an initial depth map corresponding to a scene and an image of the scene. The depth map enhancer generates a refined depth map corresponding to the scene using an optimizer, the initial depth map and the image. The refined depth map includes estimated depth indicators corresponding to at least a first depth-information region, identified based at least in part on a first criterion, of the initial depth map. Input based on the refined depth map is provided to an image processing application.
Type: Grant
Filed: May 23, 2018
Date of Patent: August 25, 2020
Assignee: Apple Inc.
Inventors: Mark Norman Lester Jouppi, Michael Wish Tao, Eric Bujold, Stephane Simon Rene Ben Soussan, Volker Roelke, Geoffrey T. Anneheim, Julio Cesar Hernandez Zaragoza, Florian Ciurea
-
Patent number: 10747846
Abstract: Matrix processing includes: initializing a current matrix based at least in part on an original matrix; iteratively determining a matrix property using a plurality of iteration cycles, including, in an iteration cycle: partitioning the current matrix to obtain a plurality of partitions, wherein the plurality of partitions includes a submatrix; modifying the submatrix based at least in part on other partitions of the plurality of partitions to provide a current matrix for a next iteration; and continuing to iterate until a condition is met. Matrix processing further includes obtaining the matrix property from an iteration result; and outputting the matrix property.
Type: Grant
Filed: September 25, 2019
Date of Patent: August 18, 2020
Assignee: Cyber Atomics, Inc.
Inventor: Roy Batruni
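The partition-modify-iterate loop described above matches the classic Schur-complement recursion. As one concrete instance — the abstract does not say which matrix property is computed, so the determinant below is purely an assumed example:

```python
import numpy as np

def det_by_schur(M):
    """Determinant via iterative partitioning.

    Each cycle partitions the current matrix as [[a, r], [c, S]] with a scalar
    pivot a, multiplies the accumulated result by a, and replaces the current
    matrix with the Schur complement S - c r / a. Assumes nonzero pivots.
    """
    cur = np.array(M, dtype=float)
    det = 1.0
    while cur.shape[0] > 1:                      # iterate until the condition is met
        a = cur[0, 0]                            # scalar partition
        r = cur[0, 1:]                           # row partition
        c = cur[1:, 0]                           # column partition
        det *= a
        cur = cur[1:, 1:] - np.outer(c, r) / a   # modified submatrix for next cycle
    return det * cur[0, 0]

M = np.array([[4.0, 2.0, 1.0],
              [2.0, 5.0, 3.0],
              [1.0, 3.0, 6.0]])
```

Here `det(M) = 67`, matching `np.linalg.det(M)`; the same iterate-and-shrink structure supports other properties (rank, inertia) with different accumulators.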
-
Patent number: 10743026
Abstract: A video encoding method, a video encoding apparatus, a video decoding method, and a video decoding apparatus are provided. The video encoding method includes producing a fast transform matrix based on a transform matrix which is used for frequency transformation on a block which has a predetermined size; producing a transformed block by transforming the block having the predetermined size by using the fast transform matrix; and performing scaling with respect to the transformed block in order to correct a difference between the transform matrix used for the frequency transformation and the fast transform matrix.
Type: Grant
Filed: September 5, 2019
Date of Patent: August 11, 2020
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Yoon-mi Hong, Woo-jin Han, Min-su Cheon, Jianle Chen
-
Patent number: 10733140
Abstract: A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising at least one register to hold a varying number of elements. The computer processor may further comprise processing logic configured to operate on the varying number of elements in the vector register file using one or more instructions that produce results with elements of widths different than that of the input elements. The computer processor may be implemented as a monolithic integrated circuit.
Type: Grant
Filed: June 1, 2015
Date of Patent: August 4, 2020
Assignee: OPTIMUM SEMICONDUCTOR TECHNOLOGIES INC.
Inventors: Mayan Moudgill, Arthur Joseph Hoane, Paul Hurtley
-
Patent number: 10719323
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
Type: Grant
Filed: September 27, 2018
Date of Patent: July 21, 2020
Assignee: Intel Corporation
Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
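The first compress variant above — pack non-zero elements together and record each one's position in a header — can be sketched generically. The flat-index header layout below is an assumption for illustration, not the patent's encoding:

```python
import numpy as np

def compress(mat):
    """Pack non-zero elements densely; the header records each one's position."""
    header = np.flatnonzero(mat)      # flat positions of the non-zero elements
    packed = mat.ravel()[header]      # the non-zero values, packed together
    return header, packed, mat.shape

def decompress(header, packed, shape):
    """Rebuild the full matrix by scattering packed values to header positions."""
    out = np.zeros(shape, dtype=packed.dtype)
    out.ravel()[header] = packed
    return out

A = np.array([[0, 3, 0, 0],
              [7, 0, 0, 1],
              [0, 0, 0, 0]])
header, packed, shape = compress(A)
B = decompress(header, packed, shape)
```

For this sparse `A`, only 3 values plus a 3-entry header replace 12 stored elements; the round trip reproduces `A` exactly.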
-
Patent number: 10664552
Abstract: An apparatus and method for LU decomposition of an input matrix, the input matrix comprising a multitude of elements forming a plurality of rows and columns. In an embodiment, the apparatus comprises a memory including a plurality of memory caches, and a processor unit operatively connected to the memory to transmit data to and receive data from the memory caches. The processing unit comprises a hardware circuit for processing the input matrix column-by-column to decompose the input matrix into a lower triangular matrix L and an upper triangular matrix U, including performing Gaussian eliminations on the columns of the matrix, with partial pivoting of the matrix, and choosing one of the elements of each of the columns as a pivot element for said each column while said each column is being processed.
Type: Grant
Filed: April 1, 2019
Date of Patent: May 26, 2020
Assignee: International Business Machines Corporation
Inventors: Maysam Mir Ahmadi, Sean Wagner
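Column-by-column Gaussian elimination with partial pivoting is the textbook computation behind the circuit described above. A software sketch of the same factorization (simplified — no cache structure, and the pivot rule here is the standard largest-magnitude choice, which the abstract leaves open):

```python
import numpy as np

def lu_partial_pivot(A):
    """Column-by-column Gaussian elimination with partial pivoting: P A = L U."""
    n = A.shape[0]
    P = np.eye(n)
    L = np.eye(n)
    U = A.astype(float).copy()
    for k in range(n - 1):
        # Choose the largest-magnitude element of the current column as pivot.
        p = k + np.argmax(np.abs(U[k:, k]))
        if p != k:                                  # swap rows in U, P, and L's
            U[[k, p], :] = U[[p, k], :]             # factored part
            P[[k, p], :] = P[[p, k], :]
            L[[k, p], :k] = L[[p, k], :k]
        for i in range(k + 1, n):                   # eliminate below the pivot
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return P, L, np.triu(U)

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
P, L, U = lu_partial_pivot(A)
```

`P @ A == L @ U` holds, with `L` unit lower triangular and `U` upper triangular.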
-
Patent number: 10651951
Abstract: Methods and apparatus for sub-block based architecture of Cholesky decomposition and channel whitening. In an exemplary embodiment, an apparatus is provided that parallel processes sub-block matrices (R00, R10, and R11) of a covariance matrix (R) to determine a whitening coefficient matrix (W). The apparatus includes a first LDL coefficient calculator that calculates a first whitening matrix W00, lower triangle matrix L00, and diagonal matrix D00 from the sub-block matrix R00, a first matrix calculator that calculates a lower triangle matrix L10 from the sub-block matrix R10 and the matrices L00 and D00, and a second matrix calculator that calculates a matrix X from the matrices D00 and L10.
Type: Grant
Filed: December 29, 2018
Date of Patent: May 12, 2020
Assignee: Cavium, LLC
Inventors: Yuanbin Guo, Hong Jik Kim
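The underlying math — factor the covariance as R = L D Lᵀ, then take W = D^(−1/2) L^(−1) so that W R Wᵀ = I — can be checked in a few lines of NumPy. The sub-block split (R00/R10/R11) is omitted here, and the toy covariance matrix is an assumption:

```python
import numpy as np

def ldl(R):
    """LDL^T factorization of a symmetric positive-definite matrix."""
    n = R.shape[0]
    L = np.eye(n)
    D = np.zeros(n)
    for j in range(n):
        D[j] = R[j, j] - (L[j, :j] ** 2) @ D[:j]
        for i in range(j + 1, n):
            L[i, j] = (R[i, j] - (L[i, :j] * L[j, :j]) @ D[:j]) / D[j]
    return L, D

# Toy covariance standing in for R; the whitener W satisfies W R W^T = I.
R = np.array([[4.0, 2.0, 0.6],
              [2.0, 3.0, 0.5],
              [0.6, 0.5, 1.0]])
L, D = ldl(R)
W = np.linalg.inv(L) / np.sqrt(D)[:, None]   # W = D^{-1/2} L^{-1}
```

Applying `W` to channel samples with covariance `R` decorrelates them ("whitening"); the patent's sub-block pipeline computes the same `W` with the blocks processed in parallel.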
-
Patent number: 10635447
Abstract: Single Instruction, Multiple Data (SIMD) technologies are described. A processing device can include a processor core and a memory. The processor core can receive, from a software application, a request to perform an operation on a first set of variables that includes a first input value and a register value and perform the operation on a second set of variables that includes a second input value and the first register value. The processor core can vectorize the operation on the first set of variables and the second set of variables. The processor core can perform the operation on the first set of variables and the second set of variables in parallel to obtain a first operation value and a second operation value. The processor core can perform a horizontal add operation on the first operation value and the second operation value and write the result to memory.
Type: Grant
Filed: December 20, 2018
Date of Patent: April 28, 2020
Assignee: Intel Corporation
Inventors: Jun Jin, Elmoustapha Ould-Ahmed-Vall
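The vectorize-then-horizontal-add pattern described above maps naturally onto array code. A toy NumPy analogy (the concrete operation and values are assumptions; the patent covers the hardware instruction sequence, not this sketch):

```python
import numpy as np

def scalar_version(in1, in2, reg):
    """Scalar view: two multiplies performed one at a time, then summed."""
    op1 = in1 * reg
    op2 = in2 * reg
    return op1 + op2                   # the "horizontal add" of the two results

# Vectorized view: pack both variable sets into one array, operate on all
# lanes in parallel, then reduce across the lanes.
inputs = np.array([3.0, 5.0])          # first and second input values
reg = 2.0                              # shared register value
ops = inputs * reg                     # both operations in one vector step
result = ops.sum()                     # horizontal add across the lanes
```

Both paths compute the same value; the vector path does the per-lane work in a single instruction and the reduction in one horizontal add.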
-
Patent number: 10607718
Abstract: Embodiments of the present invention include methods, systems and computer program products for algebraic phasing of polyploids. Aspects of the invention include receiving a matrix including a set of two or more single-nucleotide polymorphisms (SNPs) for two or more sample organisms. Each row of the matrix is set to a ploidy based on a number of ploidies present in the two or more sample organisms. Each allele in the set of two or more SNPs is represented as a binary number. A set of algebraic rules is received, wherein the set of algebraic rules includes an algebraic phasing algorithm. The set of algebraic rules is then applied to the matrix to determine a haplotype of a parent of the two or more sample organisms.
Type: Grant
Filed: May 14, 2019
Date of Patent: March 31, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Laxmi P. Parida, Filippo Utro
-
Patent number: 10599428
Abstract: Processing circuitry supports overlapped execution of vector instructions when at least one beat of a first vector instruction is performed in parallel with at least one beat of a second vector instruction. The processing circuitry also supports mixed-scalar-vector instructions for which one of a destination register and one or more source registers is a vector register and another is a scalar register. In a sequence including first and subsequent mixed-scalar-vector instructions, instances of relaxed execution which can potentially lead to uncertain and incorrect results are permitted by the processing circuitry when the instructions are separated by fewer than a predetermined number of intervening instructions. In practice, the situations which lead to uncertain results are very rare, so providing relatively expensive dependency-checking circuitry to eliminate such cases is not justified.
Type: Grant
Filed: March 23, 2016
Date of Patent: March 24, 2020
Assignee: ARM Limited
Inventor: Thomas Christopher Grocutt
-
Patent number: 10592241
Abstract: Aspects for matrix multiplication in neural network are described herein. The aspects may include a master computation module configured to receive a first matrix and transmit a row vector of the first matrix. In addition, the aspects may include one or more slave computation modules respectively configured to store a column vector of a second matrix, receive the row vector of the first matrix, and multiply the row vector of the first matrix with the stored column vector of the second matrix to generate a result element. Further, the aspects may include an interconnection unit configured to combine the one or more result elements generated respectively by the one or more slave computation modules to generate a row vector of a result matrix and transmit the row vector of the result matrix to the master computation module.
Type: Grant
Filed: October 25, 2018
Date of Patent: March 17, 2020
Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
Inventors: Xiao Zhang, Shaoli Liu, Tianshi Chen, Yunji Chen
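The master/slave decomposition above — each slave holds one column of the second matrix and dots it with a broadcast row of the first — can be mimicked sequentially. The interconnection unit is simulated with a plain list; the data is assumed:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)          # first matrix, held by the master
B = np.arange(12.0).reshape(3, 4)         # second matrix, split among slaves

n_slaves = B.shape[1]
slave_columns = [B[:, j] for j in range(n_slaves)]  # each slave stores one column

result = np.zeros((A.shape[0], n_slaves))
for i in range(A.shape[0]):
    row = A[i]                            # master broadcasts one row vector
    # Each slave multiplies the row with its stored column -> one result element;
    # the "interconnection unit" combines them into a row of the result matrix.
    result[i] = [row @ col for col in slave_columns]
```

Each outer iteration corresponds to one broadcast/combine round; stacking the combined rows yields exactly `A @ B`.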
-
Patent number: 10579554
Abstract: Program procedures executed by a processing unit to route a bus include a bus information extractor configured to extract bus information, including physical requirements for the bus, from input data; a buffer array generator configured to generate a buffer array in which buffers included in the bus are regularly arranged based on the bus information; a buffer array placer configured to place at least one buffer array in the layout of the integrated circuit based on the bus information; and a wiring procedure configured to generate interconnections connected to buffers included in the at least one buffer array based on the bus information.
Type: Grant
Filed: June 8, 2017
Date of Patent: March 3, 2020
Assignee: Samsung Electronics Co., Ltd.
Inventor: Byung-yong Kim
-
Patent number: 10558730
Abstract: A computing method includes: generating first partitioned matrices by partitioning the first matrix by a least common multiple of the M and the N in the row direction and by the N in the column direction; generating second partitioned matrices by partitioning the second matrix by the M in the row direction and by the least common multiple in the column direction; adding a first product of the first partitioned matrices and the second partitioned matrices to a first result matrix; transmitting the first partitioned matrices to computing elements directly connected to that computing element out of other computing elements connected to each other in a torus-like manner in the row direction; transmitting the second partitioned matrices to computing elements directly connected to that computing element out of other computing elements connected to each other in a torus-like manner in the column direction.
Type: Grant
Filed: February 13, 2018
Date of Patent: February 11, 2020
Assignee: FUJITSU LIMITED
Inventor: Akihiko Kasagi
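The property this scheme relies on is that partial products of partitioned operands simply *add* into the result, so partitions can be consumed in any order as they rotate around the torus. A simplified sketch of that accumulation identity — the lcm-based two-level partitioning and the torus communication are specific to the patent and omitted here, and the grid sizes and data are assumptions:

```python
import numpy as np
from math import lcm

M, N = 2, 3                    # row/column counts of the compute-element torus
tile = lcm(M, N)               # 6 -- the patent partitions along lcm(M, N)

A = np.arange(tile * tile, dtype=float).reshape(tile, tile)
B = (A % 7).copy()
C = np.zeros((tile, tile))

# Each compute element repeatedly adds a partial product of one A-partition
# and one B-partition into its result block; summed over the shared dimension,
# the partial products reconstruct the full matrix product.
for k0 in range(0, tile, M):   # walk the shared dimension in M-wide chunks
    C += A[:, k0:k0 + M] @ B[k0:k0 + M, :]
```

Because each chunk's contribution is independent, the chunks can be processed in any order or on any element of the torus, which is what makes the distributed scheme work.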
-
Patent number: 10534839
Abstract: A method for matrix by vector multiplication, applied in an artificial neural network system, is disclosed. The method comprises: compressing a plurality of weight values in a weight matrix and indices of an input vector into a compressed main stream; storing M sets of synapse values in M memory devices; and, performing reading and MAC operations according to the M sets of synapse values and the compressed main stream to obtain a number M of output vectors. The step of compressing comprises: dividing the weight matrix into a plurality of N×L blocks; converting entries of a target block and corresponding indices of the input vector into a working block and an index matrix; removing zero entries in the working block; shifting non-zero entries row-by-row to one of their left and right sides in the working block; and, respectively shifting corresponding entries in the index matrix.
Type: Grant
Filed: June 25, 2018
Date of Patent: January 14, 2020
Assignee: BRITISH CAYMAN ISLANDS INTELLIGO TECHNOLOGY INC.
Inventors: Pei-Wen Hsieh, Chen-Chu Hsu, Tsung-Liang Chen
-
Patent number: 10534838
Abstract: Detailed are embodiments related to bit matrix multiplication in a processor. For example, in some embodiments a processor comprises: decode circuitry to decode an instruction having fields for an opcode, an identifier of a first source bit matrix, an identifier of a second source bit matrix, an identifier of a destination bit matrix, and an immediate; and execution circuitry to execute the decoded instruction to perform a multiplication of a matrix of S-bit elements of the identified first source bit matrix with S-bit elements of the identified second source bit matrix, wherein the multiplication and accumulation operations are selected by the operation selector, and to store a result of the matrix multiplication into the identified destination bit matrix, wherein S indicates a plural bit size.
Type: Grant
Filed: September 29, 2017
Date of Patent: January 14, 2020
Assignee: Intel Corporation
Inventors: Dmitry Y. Babokin, Kshitij A. Doshi, Vadim Sukhomlinov
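A bit-matrix multiply typically uses AND as the element-wise product and a selectable accumulation — XOR for GF(2) arithmetic, OR for boolean reachability. The `op` flag below stands in for the patent's immediate-selected operation; that mapping, and the sample matrices, are assumptions:

```python
import numpy as np

def bit_matmul(A, B, op="xor"):
    """Bit-matrix product: AND for the per-element multiply, then either
    XOR (GF(2)) or OR (boolean) accumulation, chosen by `op`."""
    A, B = np.asarray(A, bool), np.asarray(B, bool)
    prod = A[:, :, None] & B[None, :, :]          # all pairwise AND terms
    if op == "xor":
        return np.bitwise_xor.reduce(prod, axis=1).astype(int)
    return np.any(prod, axis=1).astype(int)       # OR accumulation

A = [[1, 0, 1],
     [0, 1, 1]]
B = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
```

With XOR accumulation the result equals the ordinary integer product reduced mod 2; with OR it flags which output positions receive any contribution at all.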
-
Patent number: 10521228
Abstract: The present disclosure provides a data read-write scheduler and a reservation station for vector operations. The data read-write scheduler suspends instruction execution by providing a read instruction cache module and a write instruction cache module and detecting conflicting instructions based on the two modules. Once the timing condition is satisfied, instructions are re-executed, thereby resolving the read-after-write and write-after-read conflicts between instructions and guaranteeing that correct data are provided to a vector operations component. The subject disclosure therefore has significant value for promotion and application.
Type: Grant
Filed: November 7, 2018
Date of Patent: December 31, 2019
Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
Inventors: Dong Han, Shaoli Liu, Yunji Chen, Tianshi Chen
-
Patent number: 10489063
Abstract: First elements of a dense vector to be multiplied with first elements of a first row of a sparse array may be determined. The determined first elements of the dense vector may be written into a memory. A dot product for the first elements of the sparse array and the first elements of the dense vector may be calculated in a plurality of increments by multiplying a subset of the first elements of the sparse array and a corresponding subset of the first elements of the dense vector. A sequence number may be updated after each increment is completed to identify a column number and/or a row number of the sparse array for which the dot product calculations have been completed.
Type: Grant
Filed: December 19, 2016
Date of Patent: November 26, 2019
Assignee: Intel Corporation
Inventors: Asit K. Mishra, Deborah T. Marr, Edward T. Grochowski
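The gather-then-incremental-dot-product flow above can be sketched directly; the CSR-style row layout, chunk size, and sample values are assumptions:

```python
import numpy as np

# One CSR-style row of a sparse matrix: values and their column indices.
row_vals = np.array([2.0, -1.0, 4.0, 0.5])
row_cols = np.array([1, 3, 6, 7])

dense = np.arange(8.0)                 # the dense vector

# Step 1: gather only the dense elements this row actually needs into memory.
gathered = dense[row_cols]

# Step 2: compute the dot product in fixed-size increments, bumping a
# sequence number after each one so progress can be tracked and resumed.
chunk, acc, seq = 2, 0.0, 0
for s in range(0, len(row_vals), chunk):
    acc += row_vals[s:s + chunk] @ gathered[s:s + chunk]
    seq = s + chunk                    # row elements completed so far
```

On restart after an interruption, the sequence number tells the engine which increment to resume from instead of recomputing the whole row.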
-
Patent number: 10481957
Abstract: Provided are a processor, a task processing method therefor, and a storage medium. The method comprises: a scalar calculation module executing parameter calculation of a current task and storing a parameter obtained through calculation in a PBUF; when the parameter calculation of the current task is completed, executing a first or second instruction for inter-core synchronization and storing the first or second instruction in the PBUF (301); a vector calculation module reading the parameter from the PBUF and storing the read parameter in a shadow register; when the first or second instruction is read from the PBUF, storing all the modified parameters in the shadow register in a work register within a period (302); and the vector calculation module executing vector calculation of the current task according to the parameter in the work register (303).
Type: Grant
Filed: July 1, 2016
Date of Patent: November 19, 2019
Assignee: Sanechips Technology Co., Ltd.
Inventors: Bo Wen, Qingxin Cao
-
Patent number: 10482157
Abstract: A data compression apparatus includes a memory and a processor configured to: generate compressed matrix data; compare a threshold with an index value calculated for a specific-value data string, that is, a data string obtained by coupling specific values identified from the non-zero element values in each row of the compressed matrix data; set a given constant as the respective coefficients when the index value is larger than the threshold; calculate the reciprocals of the respective specific values as the coefficients when the index value is equal to or smaller than the threshold; and output post-operation matrix data obtained by rounding the products of the respective elements of the compressed matrix data and the calculated coefficients, based on the number of significant figures of the decimal part of each corresponding element.
Type: Grant
Filed: February 27, 2019
Date of Patent: November 19, 2019
Assignee: FUJITSU LIMITED
Inventor: Makiko Konoshima
-
Patent number: 10455252
Abstract: A video encoding method, a video encoding apparatus, a video decoding method, and a video decoding apparatus are provided. The video encoding method includes producing a fast transform matrix based on a transform matrix which is used for frequency transformation on a block which has a predetermined size; producing a transformed block by transforming the block having the predetermined size by using the fast transform matrix; and performing scaling with respect to the transformed block in order to correct a difference between the transform matrix used for the frequency transformation and the fast transform matrix.
Type: Grant
Filed: July 2, 2018
Date of Patent: October 22, 2019
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Yoon-Mi Hong, Woo-Jin Han, Min-Su Cheon, Jianle Chen
-
Patent number: 10395381
Abstract: Disclosed techniques relate to forming a block sum of picture elements, employing a vector dot product instruction to sum packed picture elements with a mask, producing a vector of masked horizontal picture element sums. The block sum is formed from plural horizontal sums via vector single instruction multiple data (SIMD) addition.
Type: Grant
Filed: March 4, 2019
Date of Patent: August 27, 2019
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Jayasree Sankaranarayanan, Dipan Kumar Mandal
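The masked-dot-product block sum can be reproduced with NumPy; the toy image and 4×4 block geometry are assumptions:

```python
import numpy as np

img = np.arange(64.0).reshape(8, 8)    # toy picture

# A dot product with a mask of ones sums a packed run of picture elements
# in one step: 4 packed pixels times a 4-wide ones mask = one horizontal sum.
mask = np.ones(4)
x0, y0 = 2, 3                          # top-left corner of a 4x4 block
horizontal_sums = np.array([img[y0 + r, x0:x0 + 4] @ mask for r in range(4)])

# The block sum is then a SIMD-style addition of the four horizontal sums.
block_sum = horizontal_sums.sum()
```

The same mask trick lets irregular block shapes be summed by zeroing mask lanes instead of branching per pixel.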
-
Patent number: 10346507
Abstract: Embodiments of the present invention are directed to methods and systems for performing block sparse matrix-vector multiplications with improved efficiency through the use of a specific re-ordering of the matrix data such that matrix symmetry can be exploited while simultaneously avoiding atomic memory operations or the need for inefficient memory operations in general. One disclosed method includes reordering the matrix data such that, for any column of non-transpose data, and for any row of transpose data simultaneously processed within a single thread-block on a GPU, all matrix elements update independent elements of the output vector. Using the method, the amount of data required to represent the sparse matrix can be reduced by as much as 50%, thereby doubling the effective performance on the GPU, and doubling the size of the matrix that can be accelerated by the GPU.
Type: Grant
Filed: October 26, 2017
Date of Patent: July 9, 2019
Assignee: Nvidia Corporation
Inventor: Steve Rennich
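The symmetry exploitation the abstract builds on is the standard trick of storing only one triangle and letting each entry contribute both its non-transpose and transpose terms. A sequential sketch of that (the GPU thread-block reordering that avoids atomics is not reproduced; the sample matrix is an assumption):

```python
import numpy as np

# Symmetric sparse matrix stored as COO entries of its upper triangle only,
# halving the stored data: each entry serves as both A[i,j] and A[j,i].
rows = np.array([0, 0, 1, 2])
cols = np.array([0, 2, 1, 2])
vals = np.array([4.0, 1.5, 3.0, 2.0])

x = np.array([1.0, 2.0, 3.0])
y = np.zeros(3)

for i, j, v in zip(rows, cols, vals):
    y[i] += v * x[j]                  # non-transpose contribution
    if i != j:
        y[j] += v * x[i]              # mirrored (transpose) contribution

# Dense reference for comparison.
A = np.array([[4.0, 0.0, 1.5],
              [0.0, 3.0, 0.0],
              [1.5, 0.0, 2.0]])
```

On a GPU the mirrored `y[j] +=` writes from concurrent threads are exactly what would require atomics; the patented reordering arranges the data so simultaneously processed entries always touch independent output elements.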
-
Patent number: 10310812
Abstract: Mechanisms are provided for performing a matrix operation. A processor of a data processing system is configured to perform cluster-based matrix reordering of an input matrix. An input matrix, which comprises nodes associated with elements of the matrix, is received. The nodes are clustered into clusters based on numbers of connections with other nodes within and between the clusters, and the clusters are ordered by minimizing a total length of cross cluster connections between nodes of the clusters, to thereby generate a reordered matrix. A lookup table is generated identifying new locations of nodes of the input matrix, in the reordered matrix. A matrix operation is then performed based on the reordered matrix and the lookup table.
Type: Grant
Filed: February 6, 2017
Date of Patent: June 4, 2019
Assignee: International Business Machines Corporation
Inventors: Emrah Acar, Rajesh R. Bordawekar, Michele M. Franceschini, Luis A. Lastras-Montano, Ruchir Puri, Haifeng Qian, Livio B. Soares
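The reorder-plus-lookup-table mechanics can be illustrated in a few lines; the clustering itself is assumed (nodes {0, 2} and {1, 3} are taken as the two clusters), and only the permutation bookkeeping is shown:

```python
import numpy as np

A = np.array([[5.0, 0.0, 1.0, 0.0],
              [0.0, 3.0, 0.0, 2.0],
              [1.0, 0.0, 4.0, 0.0],
              [0.0, 2.0, 0.0, 6.0]])

# Suppose clustering grouped nodes {0, 2} and {1, 3}; this permutation
# places each cluster's nodes next to each other in the reordered matrix.
perm = np.array([0, 2, 1, 3])

# Lookup table: old node index -> new location in the reordered matrix.
lookup = np.empty_like(perm)
lookup[perm] = np.arange(len(perm))

A_reordered = A[np.ix_(perm, perm)]

# A matrix operation (here, matrix-vector multiply) on the reordered matrix,
# mapped back through the lookup table, matches the original operation.
x = np.array([1.0, 2.0, 3.0, 4.0])
y_reordered = A_reordered @ x[perm]
y = y_reordered[lookup]
```

After reordering, each cluster's nonzeros sit in a dense diagonal block, which is what makes the subsequent matrix operation cache-friendly.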
-
Patent number: 10304008
Abstract: Systems and methods are disclosed for operating a machine, by receiving training data from one or more sensors; training a machine learning module with the training data by: partitioning a data matrix into smaller submatrices to process in parallel and optimized for each processing node; for each submatrix, performing a greedy search for rank-one solutions; using alternating direction method of multipliers (ADMM) to ensure consistency over different data blocks; and controlling one or more actuators using live data and the learned module during operation.
Type: Grant
Filed: March 7, 2016
Date of Patent: May 28, 2019
Assignee: NEC Corporation
Inventors: Renqiang Min, Dongjin Song
-
Patent number: 10275392
Abstract: A data processing device includes a two-dimensional structure including a plurality of stages in a vertical direction, the stages each including basic units in a horizontal direction such that the number of the basic units is equal to the number of ways. The basic units each include a memory block having a plurality of ports, an address generator for the ports of the memory block, and a calculation unit.
Type: Grant
Filed: April 6, 2016
Date of Patent: April 30, 2019
Assignee: NATIONAL UNIVERSITY CORPORATION NARA INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventors: Yasuhiko Nakashima, Shinya Takamaeda
-
Patent number: 10248426
Abstract: Techniques are disclosed for restoring register data in a processor. In one embodiment, a method includes receiving an instruction to flush one or more general purpose registers (GPRs) in a processor. The method also includes determining history buffer entries of a history buffer to be restored to the one or more GPRs. The method includes creating a mask vector that indicates which history buffer entries will be restored to the one or more GPRs. The method further includes restoring the indicated history buffer entries to the one or more GPRs. As each indicated history buffer entry is restored, the method includes updating the mask vector to indicate which history buffer entries have been restored.
Type: Grant
Filed: May 24, 2016
Date of Patent: April 2, 2019
Assignee: International Business Machines Corporation
Inventors: Brian D. Barrick, Steven J. Battle, Joshua W. Bowman, Christopher M. Mueller, Dung Q. Nguyen, David R. Terry, Eula Faye Tolentino, Jing Zhang
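The mask-vector bookkeeping described above can be sketched as follows; `restore_gprs` and the dict-based register file are hypothetical stand-ins for the hardware structures:

```python
def restore_gprs(gprs, history, mask):
    """Restore each history-buffer entry whose mask bit is set, clearing
    the bit as the entry is restored (the mask-vector update step)."""
    for reg, value in history.items():
        if mask[reg]:
            gprs[reg] = value
            mask[reg] = False      # this entry has now been restored
    return gprs, mask

gprs = {0: 7, 1: 8, 2: 9}          # speculative register values to flush
history = {0: 70, 2: 90}           # saved pre-speculation values
mask = {0: True, 1: False, 2: True}   # which entries still need restoring
gprs, mask = restore_gprs(gprs, history, mask)
```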
-
Patent number: 10191749
Abstract: Single Instruction, Multiple Data (SIMD) technologies are described. A processing device can include a processor core and a memory. The processor core can receive, from a software application, a request to perform an operation on a first set of variables that includes a first input value and a first register value, and to perform the operation on a second set of variables that includes a second input value and the first register value. The processor core can vectorize the operation on the first set of variables and the second set of variables. The processor core can perform the operation on the first set of variables and the second set of variables in parallel to obtain a first operation value and a second operation value. The processor core can perform a horizontal add operation on the first operation value and the second operation value and write the result to memory.
Type: Grant
Filed: December 24, 2015
Date of Patent: January 29, 2019
Assignee: Intel Corporation
Inventors: Jun Jin, Elmoustapha Ould-Ahmed-Vall
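The vectorize-then-horizontal-add pattern above maps directly onto array operations; a minimal NumPy sketch (with made-up values):

```python
import numpy as np

# Both variable sets share one register value; the operation runs across
# lanes in parallel, then a horizontal add reduces the lane results.
inputs = np.array([3.0, 5.0])   # first and second input values
reg = 2.0                       # register value shared by both lanes
lane_results = inputs * reg     # vectorized operation on both sets at once
result = lane_results.sum()     # horizontal add across the lanes
```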
-
Patent number: 10191744
Abstract: Systems, methods, and apparatuses relating to element sorting of vectors are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction; and an execution unit to execute the decoded instruction to: provide storage for a comparison matrix to store a comparison value for each element of an input vector compared against the other elements of the input vector, perform a comparison operation on elements of the input vector corresponding to storage of comparison values above a main diagonal of the comparison matrix, perform a different operation on elements of the input vector corresponding to storage of comparison values below the main diagonal of the comparison matrix, and store results of the comparison operation and the different operation in the comparison matrix.
Type: Grant
Filed: July 1, 2016
Date of Patent: January 29, 2019
Assignee: Intel Corporation
Inventors: Mikhail Plotnikov, Igor Ermolaev
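One common reason for using different comparisons above and below the diagonal is tie-breaking: a strict `<` on one side and `<=` on the other give equal elements distinct ranks, so the row sums of the comparison matrix form a valid permutation. A sketch under that assumption (the patent does not spell out these exact operations):

```python
import numpy as np

def comparison_matrix_sort(x):
    """Rank elements via a comparison matrix: strict '<' above the main
    diagonal, '<=' below it, so equal elements get distinct ranks."""
    n = len(x)
    C = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            if i < j:                 # above the diagonal: one comparison op
                C[i, j] = x[j] < x[i]
            elif i > j:               # below the diagonal: a different op
                C[i, j] = x[j] <= x[i]
    ranks = C.sum(axis=1)             # row sums give each element's position
    out = np.empty(n, dtype=x.dtype)
    out[ranks] = x                    # scatter elements to sorted positions
    return out

x = np.array([3, 1, 3, 2])            # note the duplicate value 3
sorted_x = comparison_matrix_sort(x)
```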
-
Patent number: 10169239
Abstract: A prefetch request having a priority assigned thereto is obtained, based on executing a prefetch instruction included within a program. Based on obtaining the prefetch request, a determination is made as to whether the prefetch request may be placed on a prefetch queue. This determination includes determining whether the prefetch queue is full; checking, based on determining the prefetch queue is full, whether the priority of the prefetch request is considered a high priority; determining, based on the checking indicating the priority of the prefetch request is considered a high priority, whether another prefetch request on the prefetch queue may be removed; removing the other prefetch request from the prefetch queue, based on determining the other prefetch request may be removed; and adding the prefetch request to the prefetch queue, based on removing the other prefetch request.
Type: Grant
Filed: July 20, 2016
Date of Patent: January 1, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Dan F. Greiner, Michael K. Gschwind, Christian Jacobi, Anthony Saporito, Chung-Lung K. Shum, Timothy J. Slegel
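The decision chain above (full? high priority? removable victim?) can be sketched as a small bounded queue; the two-level `"high"`/`"low"` priority scheme and the eviction policy (remove the first low-priority entry) are simplifying assumptions:

```python
from collections import deque

CAPACITY = 2

def enqueue_prefetch(queue, request):
    """Try to place (addr, priority) on the prefetch queue. When the queue
    is full, a high-priority request may evict a lower-priority entry;
    otherwise the new request is dropped."""
    if len(queue) < CAPACITY:
        queue.append(request)          # queue not full: just add it
        return True
    if request[1] == "high":
        for i, (addr, prio) in enumerate(queue):
            if prio == "low":          # a removable entry was found
                del queue[i]
                queue.append(request)
                return True
    return False                       # full and no entry could be removed

q = deque()
enqueue_prefetch(q, (0x100, "low"))
enqueue_prefetch(q, (0x200, "low"))
added = enqueue_prefetch(q, (0x300, "high"))   # evicts a low-priority entry
```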
-
Patent number: 10162752
Abstract: A method for storing data at contiguous memory addresses includes, at a single-instruction-multiple-data (SIMD) processor, executing a parallel-prefix valid count instruction to determine a first offset of a first data vector and to determine a second offset of a second data vector that includes valid data and invalid data. The second offset is based on the first offset and a number of positions in the first data vector that are associated with valid data. The method also includes storing first valid data from the first data vector at a first memory address of a memory and storing second valid data from the second data vector at a particular memory address of the memory. The first memory address is based on the first offset and the particular memory address is based on the second offset.
Type: Grant
Filed: September 22, 2016
Date of Patent: December 25, 2018
Assignee: QUALCOMM Incorporated
Inventors: Eric Mahurin, David Hoyle
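The offset rule above is an exclusive prefix sum over per-vector valid counts: each vector's store offset is the total number of valid elements in all preceding vectors. A minimal sketch (`compact_offsets` is a hypothetical name):

```python
import numpy as np

def compact_offsets(valid_masks):
    """Each vector's store offset is the exclusive prefix sum of the
    valid counts of the vectors before it (parallel-prefix valid count)."""
    counts = np.array([m.sum() for m in valid_masks])
    return np.concatenate(([0], np.cumsum(counts)[:-1]))

v1 = np.array([1, 0, 1, 1])   # valid (1) / invalid (0) lanes, first vector
v2 = np.array([0, 1, 1, 0])   # second vector
offsets = compact_offsets([v1, v2])
# v1's 3 valid elements store at offset 0; v2's valid elements at offset 3,
# so all valid data lands at contiguous memory addresses.
```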
-
Patent number: 10146740
Abstract: A computer implemented method is provided for processing sparse data. A sparse data set is received. A modified sparse data set is calculated by replacing all nonzero values in the sparse data set with a common positive integer. The modified sparse data set is transposed to create a transposed data set. A covariance matrix is calculated by multiplying the transposed data set by the modified sparse data set. A tree of a predefined depth is generated by assigning columns of the sparse data set to right and left nodes based on co-occurrence with a first anchor column and a second anchor column. The first anchor column and the second anchor column are determined based on the covariance matrix.
Type: Grant
Filed: March 8, 2017
Date of Patent: December 4, 2018
Assignee: Symantec Corporation
Inventors: Nikolaos Vasiloglou, Andrew B. Gardner
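The first two steps above (binarize, then multiply the transpose by the modified set) reduce to a co-occurrence count; a minimal sketch using 1 as the common positive integer:

```python
import numpy as np

X = np.array([[0, 2, 0],
              [5, 0, 3],
              [7, 1, 0]])

B = (X != 0).astype(int)   # replace all nonzero values with a common 1
C = B.T @ B                # "covariance" matrix: C[i, j] counts the rows
                           # where columns i and j are both nonzero
```

The anchor columns and the tree construction would then be derived from the entries of `C` (not modeled here).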
-
Patent number: 10097834
Abstract: A method of encoding image data, including: frequency-transforming input image data to generate an array of frequency-transformed input image coefficients by a matrix-multiplication process, according to a maximum dynamic range of the transformed data and using transform matrices having a data precision; and selecting the maximum dynamic range and/or the data precision of the transform matrices according to the bit depth of the input image data.
Type: Grant
Filed: April 4, 2014
Date of Patent: October 9, 2018
Assignee: Sony Corporation
Inventors: David Berry, James Alexander Gamei, Nicholas Ian Saunders, Karl James Sharman
-
Patent number: 10042814
Abstract: A device, system and method for assigning values to elements in a first register, where each data field in a first register corresponds to a data element to be written into a second register, and where for each data field in the first register, a first value may indicate that the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element has been written into the second register; reading the values of each of the data fields in the first register; and for each data field in the first register having the first value, gathering the corresponding data element and writing the corresponding data element into the second register, and changing the value of the data field in the first register from the first value to the second value. Other embodiments are described and claimed.
Type: Grant
Filed: November 14, 2014
Date of Patent: August 7, 2018
Assignee: Intel Corporation
Inventors: Eric Sprangle, Anwar Rohillah, Robert Cavin, Andrew T. Forsyth, Michael Abrash
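The two-value field semantics described above (first value = not yet written, second value = written) can be sketched with a boolean mask standing in for the first register; `masked_gather` and the array operands are hypothetical:

```python
import numpy as np

def masked_gather(mask, source, dest):
    """For each field still holding the 'not written' value (True), gather
    the element from `source`, write it into `dest`, and flip the field
    to the 'written' value (False)."""
    for i in range(len(mask)):
        if mask[i]:               # first value: element not yet written
            dest[i] = source[i]
            mask[i] = False       # second value: element now written
    return dest, mask

mask = np.array([True, False, True])
source = np.array([10, 20, 30])
dest = np.array([0, 99, 0])       # element 1 was already written earlier
dest, mask = masked_gather(mask, source, dest)
```

This lets a gather that was interrupted (for example by a fault) resume without re-writing elements that already completed.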
-
Patent number: 10038918
Abstract: A video encoding method, a video encoding apparatus, a video decoding method, and a video decoding apparatus are provided. The video encoding method includes producing a fast transform matrix based on a transform matrix which is used for frequency transformation on a block which has a predetermined size; producing a transformed block by transforming the block having the predetermined size by using the fast transform matrix; and performing scaling with respect to the transformed block in order to correct a difference between the transform matrix used for the frequency transformation and the fast transform matrix.
Type: Grant
Filed: September 11, 2017
Date of Patent: July 31, 2018
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Yoon-Mi Hong, Woo-Jin Han, Min-Su Cheon, Jianle Chen
-
Patent number: 9984041
Abstract: A batched Cholesky decomposition method, system, and non-transitory computer readable medium for a Graphics Processing Unit (GPU) including at least a first problem and a second problem, include mirroring a second problem matrix of the second problem to a first problem matrix of the first problem, combining the first problem matrix and the mirrored second problem matrix into a single problem matrix, and allocating data read to a thread and to the first problem and the second problem, respectively.
Type: Grant
Filed: June 30, 2016
Date of Patent: May 29, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Minsik Cho, David Shing-ki Kung, Ruchir Puri
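The mirroring step exploits the fact that a symmetric matrix is fully described by one triangle: two problems can share a single square array, one in the lower triangle and the mirrored second in the strict upper triangle. A minimal packing/unpacking sketch, with `np.linalg.cholesky` standing in for the batched GPU kernel and the diagonal of the second problem kept separately (an assumption, since both problems need a diagonal):

```python
import numpy as np

A1 = np.array([[4., 2.], [2., 5.]])   # first SPD problem
A2 = np.array([[9., 3.], [3., 6.]])   # second SPD problem

# Pack both symmetric problems into one square array: problem 1 in the
# lower triangle, problem 2 mirrored into the strict upper triangle.
packed = np.tril(A1) + np.triu(A2, k=1)
diag2 = np.diag(A2)                   # second problem's diagonal, stored aside

# Unpack each problem from its triangle and factor it.
A1_out = np.tril(packed) + np.tril(packed, k=-1).T
A2_out = np.triu(packed, k=1) + np.triu(packed, k=1).T + np.diag(diag2)
L1 = np.linalg.cholesky(A1_out)
L2 = np.linalg.cholesky(A2_out)
```

A single data read of `packed` thus serves threads working on both problems, which is the batching win the abstract describes.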
-
Patent number: 9934195
Abstract: A multicore processor is achieved by a processor assembly comprising a first processor having a first core and at least a first and a second unit, each being selected from the group of vector execution units, memory units and accelerators, said first core and first and second units being interconnected by a first network, and a second processor having a second core, wherein the first core is arranged to enable the second core to control at least one of the units in the first processor. Each processor generally comprises a combination of execution units, memory units and accelerators, which may be controlled and/or accessed by units in the other processor.
Type: Grant
Filed: November 28, 2012
Date of Patent: April 3, 2018
Assignee: Mediatek Sweden AB
Inventors: Anders Nilsson, Eric Tell
-
Patent number: 9870338
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed compression and repeat in response to a single vector packed compression and repeat instruction that includes a first and second source vector register operand, a destination vector register operand, and an opcode are described.
Type: Grant
Filed: December 23, 2011
Date of Patent: January 16, 2018
Assignee: Intel Corporation
Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm
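A plausible reading of "packed compression and repeat" with two source operands is that one source supplies elements and the other per-element repeat counts, with a count of zero compressing an element away; this interpretation is an assumption, sketched here via `np.repeat`:

```python
import numpy as np

def compress_repeat(values, counts):
    """Sketch of packed compress-and-repeat: each element of the first
    source is repeated per the count in the second source; a zero count
    compresses the element out of the destination."""
    return np.repeat(values, counts)

values = np.array([7, 8, 9])          # first source vector register
counts = np.array([2, 0, 3])          # second source: 8 is compressed out
dest = compress_repeat(values, counts)
```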
-
Patent number: 9858079
Abstract: A method and system are described for generating reference tables in object code which specify the addresses of branches, routines called, and data references used by routines in the code. In a suitably equipped processing system, the reference tables can be passed to a memory management processor which can open the appropriate memory pages to expedite the retrieval of data referenced in the execution pipeline. The disclosed method and system create such reference tables at the beginning of each routine so that the table can be passed to the memory management processor in a suitably equipped processor. Resulting object code also allows processors lacking a suitable memory management processor to skip the reference table, preserving upward compatibility.
Type: Grant
Filed: October 19, 2015
Date of Patent: January 2, 2018
Assignee: Micron Technology, Inc.
Inventor: Dean A. Klein
-
Patent number: 9846581
Abstract: A clock-less asynchronous processor comprising a plurality of parallel asynchronous processing logic circuits, each processing logic circuit configured to generate an instruction execution result. The processor comprises an asynchronous instruction dispatch unit coupled to each processing logic circuit, the instruction dispatch unit configured to receive multiple instructions from memory and dispatch individual instructions to each of the processing logic circuits. The processor comprises a crossbar coupled to an output of each processing logic circuit and to the dispatch unit, the crossbar configured to store the instruction execution results.
Type: Grant
Filed: September 8, 2014
Date of Patent: December 19, 2017
Assignee: Huawei Technologies Co., Ltd.
Inventors: Tao Huang, Yiqun Ge, Qifan Zhang, Wuxian Shi, Wen Tong