Matrix Array Patents (Class 708/520)
  • Patent number: 11023242
    Abstract: A method and apparatus of asynchronous scheduling in a graphics device includes sending one or more instructions from an instruction scheduler to one or more instruction first-in/first-out (FIFO) devices. An instruction in the one or more FIFO devices is selected for execution by a single-instruction/multiple-data (SIMD) pipeline unit. It is determined whether all operands for the selected instruction are available for execution of the instruction, and if all the operands are available, the selected instruction is executed on the SIMD pipeline unit. The self-timed arithmetic pipeline unit (the SIMD pipeline unit) is effectively encapsulated in a synchronous (e.g., clocked by a global clock) scheduler and register file environment.
    Type: Grant
    Filed: January 27, 2017
    Date of Patent: June 1, 2021
    Assignees: ATI TECHNOLOGIES ULC, ADVANCED MICRO DEVICES, INC.
    Inventors: John Kalamatianos, Greg Sadowski, Syed Zohaib M. Gilani
  • Patent number: 11017290
    Abstract: A signal processing module comprises at least one operational unit incorporating computation units, input and output interfaces able to be linked to a bus, and a memory storing data destined for the computation units, the memory being organized so that each data word is stored column-wise over several addresses according to an order dependent on the application, a column having a width of one bit, the words being transferred serially to the computation units.
    Type: Grant
    Filed: November 27, 2014
    Date of Patent: May 25, 2021
    Assignee: COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES
    Inventors: Marc Duranton, Jean-Marc Philippe
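Illustrative sketch: the column-wise, one-bit-wide storage described above is a bit-transposed (bit-serial) layout in which address a of a column holds bit a of the word assigned to that column, so words stream to the computation units one bit at a time. A minimal numpy model of that layout, with an assumed 8-bit word width and LSB-first address ordering (neither is specified by the abstract):
```python
import numpy as np

WIDTH = 8                                     # word width in bits (assumed)
words = np.array([0b10110001, 0b01011100, 0b11110000, 0b00001111], dtype=np.uint8)

# Bit-transposed layout: memory[a, c] is bit `a` of the word stored in column `c`,
# so each column is one bit wide and a word spans WIDTH consecutive addresses.
memory = np.array([[(w >> a) & 1 for w in words] for a in range(WIDTH)], dtype=np.uint8)

def read_word_serially(column):
    """Re-assemble one word by streaming its column bit by bit (LSB first)."""
    value = 0
    for address in range(WIDTH):
        value |= int(memory[address, column]) << address
    return value

print([hex(read_word_serially(c)) for c in range(4)])
# ['0xb1', '0x5c', '0xf0', '0xf']  -- matches the original words
```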
  • Patent number: 10996944
    Abstract: A processing device can establish a machine learning model to produce software dependency recommendations. The model can be periodically retrained to update its knowledge of available dependencies. The software dependencies can be incorporated into software by developers who receive the selection or automatically by an intelligent software development platform. A processing device can train the model by assembling sparse user data based on feedback corresponding to software dependencies to produce a vector of preferences for each user. The processing device can also generate a latent vector of attributes for each software dependency. The processing device can then apply matrix factorization to the vectors to produce a behavior matrix that is used to train the machine learning model.
    Type: Grant
    Filed: August 6, 2019
    Date of Patent: May 4, 2021
    Assignee: Red Hat, Inc.
    Inventors: Avishkar Gupta, Aagam Shah, Sarah Masud
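Illustrative sketch: the described pipeline is standard collaborative filtering: factor the sparse user-preference matrix into per-user and per-dependency latent vectors, then rank unseen dependencies by the reconstructed scores. The toy data, rank, regularization, and the use of plain alternating least squares below are illustrative assumptions, not details taken from the patent:
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy preference matrix: rows = users, cols = software dependencies.
# 0 entries are "no feedback yet" (the sparse part), not true zeros.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)
observed = R > 0                      # mask of known feedback
k, lam = 2, 0.1                       # latent rank and regularization (assumed)

U = rng.normal(scale=0.1, size=(R.shape[0], k))   # user preference vectors
V = rng.normal(scale=0.1, size=(R.shape[1], k))   # dependency attribute vectors

def solve_rows(fixed, target, mask):
    """Least-squares update of one factor while the other is held fixed."""
    out = np.zeros((target.shape[0], fixed.shape[1]))
    for i in range(target.shape[0]):
        idx = mask[i]
        A = fixed[idx].T @ fixed[idx] + lam * np.eye(k)
        b = fixed[idx].T @ target[i, idx]
        out[i] = np.linalg.solve(A, b)
    return out

for _ in range(20):                   # alternate until the fit stabilizes
    U = solve_rows(V, R, observed)
    V = solve_rows(U, R.T, observed.T)

scores = U @ V.T                      # predicted preference for every (user, dependency) pair
print(np.round(scores, 2))            # rank a user's unobserved dependencies by these scores
```
The top-scoring unobserved column in a user's row is the dependency that would be recommended (or auto-added by the development platform).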
  • Patent number: 10986014
    Abstract: A monitoring system detects a deviation in a monitoring metric of a system component of a remote management system that remotely manages image forming apparatuses. When the monitoring system detects a deviation in online device count greater than or equal to a deviation threshold and determines that there is a correlation between the detected deviations in monitoring metrics of multiple system components, the monitoring system sends a failure report indicating that a failure has occurred in the remote management system.
    Type: Grant
    Filed: June 5, 2020
    Date of Patent: April 20, 2021
    Assignee: KYOCERA DOCUMENT SOLUTIONS INC.
    Inventors: Dukil Park, Kazuki Nishikai, Koki Nakajima, Yasuo Nakashima, Satoshi Goshima, Yuichi Obayashi, Takeshi Nakamura
  • Patent number: 10915318
    Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.
    Type: Grant
    Filed: March 4, 2019
    Date of Patent: February 9, 2021
    Assignee: Google LLC
    Inventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps
  • Patent number: 10897605
    Abstract: Apparatuses, systems, and methods related to an image processor formed in an array of memory cells are described. An image processor as described herein is configured to reduce complexity and power consumption and/or increase data access bandwidth by performing image processing in the array of memory cells relative to image processing by a host processor external to the memory array. For instance, one apparatus described herein includes sensor circuitry configured to provide an input vector, as a plurality of bits that corresponds to a plurality of color components for an image pixel, and an image processor formed in an array of memory cells. The image processor is coupled to the sensor circuitry to receive the plurality of bits of the input vector. The image processor is configured to perform a color correction operation in the array by performing matrix multiplication on the input vector and a parameter matrix to determine an output vector that is color corrected.
    Type: Grant
    Filed: August 26, 2019
    Date of Patent: January 19, 2021
    Assignee: Micron Technology, Inc.
    Inventors: Fa-Long Luo, Jaime C. Cummins, Tamara Schmitz
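Illustrative sketch: the color-correction step itself is a small matrix-vector product in which a 3x3 parameter matrix maps a sensor RGB input vector to a corrected RGB output vector; the patent performs this multiply inside the memory array rather than on a host processor. The parameter matrix below is made up for illustration:
```python
import numpy as np

# Illustrative 3x3 color-correction parameter matrix (not from the patent).
ccm = np.array([
    [ 1.6, -0.4, -0.2],
    [-0.3,  1.5, -0.2],
    [-0.1, -0.5,  1.6],
])

def color_correct(pixel_rgb):
    """Output vector = parameter matrix @ input vector, clipped to the 8-bit range."""
    corrected = ccm @ np.asarray(pixel_rgb, dtype=float)
    return np.clip(np.rint(corrected), 0, 255).astype(np.uint8)

print(color_correct([120, 200, 90]))   # [ 94 246  32]
```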
  • Patent number: 10896039
    Abstract: In one embodiment, a matrix operation may be performed on one or more matrix operands. For example, matrix data may be received from a multi-dimensional memory, wherein the matrix data is associated with the one or more matrix operands. The one or more matrix operands may be extracted from the matrix data. A matrix routine associated with the matrix operation may be identified. The matrix routine may be executed on a matrix processor using the one or more matrix operands. A result of the matrix operation may be obtained based on the matrix routine executed by the matrix processor.
    Type: Grant
    Filed: January 31, 2019
    Date of Patent: January 19, 2021
    Assignee: Intel Corporation
    Inventors: Tony L. Werner, Aravind Kalaiah, Vijay Korthikanti, Horace Lau
  • Patent number: 10872130
    Abstract: Based on a Modified Gram-Schmidt (MGS) algorithm, QR decomposition techniques are optimized for parallel structures that provide arithmetic-logic unit (ALU) to ALU connectivity. The techniques utilize a different loop organization, but the dependent functional sequences of the algorithm are unchanged, thereby reducing likelihood of affecting error analysis and/or numerical stability. Some integrated circuit devices (e.g., FPGA) may implement hard floating-point (HFP) circuitry, such as a digital signal processing (DSP) block, distributed memories, and/or flexible internal connectivity, which can support the discussed high performance matrix arithmetic.
    Type: Grant
    Filed: August 31, 2017
    Date of Patent: December 22, 2020
    Assignee: Intel Corporation
    Inventor: Martin Langhammer
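Illustrative sketch: the Modified Gram-Schmidt algorithm the patent builds on orthogonalizes one column at a time, subtracting each projection from the working column immediately, which is what gives it better numerical behavior than classical Gram-Schmidt. Below is the textbook MGS loop in numpy, without the loop re-organization or DSP-block mapping described above:
```python
import numpy as np

def mgs_qr(A):
    """QR decomposition by Modified Gram-Schmidt: A = Q @ R with orthonormal Q."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].copy()
        # Subtract projections onto already-finished columns one at a time,
        # using the updated v each time (the "modified" ordering).
        for j in range(k):
            R[j, k] = Q[:, j] @ v
            v -= R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(v)
        Q[:, k] = v / R[k, k]
    return Q, R

A = np.random.default_rng(1).normal(size=(6, 4))
Q, R = mgs_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(4)))  # True True
```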
  • Patent number: 10832799
    Abstract: Methods, systems and apparatus for detecting patterns in constituents of at least one biological organism are disclosed. In accordance with one method, clusters of the constituents are determined (208) by selecting (210) different subsets of at least one of genes or proteins and identifying (212) the clusters from biological data corresponding to the selected subsets. Here, membership values for the constituents, indicating membership within the clusters, are calculated for use as a basis of an additional cluster determination process (208) to obtain final clusters of constituents. By underpinning the preliminary clustering on different subsets of biological data and formulating the higher-level clustering on the basis of the membership values, the embodiments can enable an evaluation of a large variety of biological data in a practical, accurate and highly efficient manner.
    Type: Grant
    Filed: August 12, 2016
    Date of Patent: November 10, 2020
    Assignee: Koninklijke Philips N.V.
    Inventors: Konstantin Volyanskyy, Nevenka Dimitrova
  • Patent number: 10762164
    Abstract: A computing device and related products are provided. The computing device is configured to perform machine learning calculations. The computing device includes an operation unit, a controller unit, and a storage unit. The storage unit includes a data input/output (I/O) unit, a register, and a cache. The technical solution provided by the present disclosure has the advantages of fast calculation speed and energy savings.
    Type: Grant
    Filed: July 19, 2018
    Date of Patent: September 1, 2020
    Assignee: Cambricon Technologies Corporation Limited
    Inventors: Tianshi Chen, Xiao Zhang, Shaoli Liu, Yunji Chen
  • Patent number: 10762163
    Abstract: In embodiments of probabilistic matrix factorization for automated machine learning, a computing system memory maintains different workflows that each include preprocessing steps for a machine learning model, the machine learning model, and one or more parameters for the machine learning model. The computing system memory additionally maintains different data sets, upon which the different workflows can be trained and tested. A matrix is generated from the different workflows and different data sets, where cells of the matrix are populated with performance metrics that each indicate a measure of performance for a workflow applied to a data set. A low-rank decomposition of the matrix with populated performance metrics is then determined. Based on the low-rank decomposition, an optimum workflow for a new data set can be determined. The optimum workflow can be one of the different workflows or a hybrid of at least two of the different workflows.
    Type: Grant
    Filed: December 5, 2016
    Date of Patent: September 1, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Nicolo Fusi
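Illustrative sketch: the underlying mechanism is matrix completion: only some (workflow, data set) cells hold measured performance, a low-rank reconstruction is fitted to those cells, and the filled-in cells rank candidate workflows for a data set. The sketch below uses simple iterative truncated-SVD imputation rather than the probabilistic factorization of the patent; the matrix values and rank are assumptions:
```python
import numpy as np

# Rows = workflows, cols = data sets; NaN = performance not yet measured.
P = np.array([
    [0.81, 0.70, np.nan, 0.55],
    [0.78, np.nan, 0.60, 0.52],
    [np.nan, 0.66, 0.58, np.nan],
    [0.40, 0.35, 0.30, 0.28],
])
known = ~np.isnan(P)
rank = 2                                   # assumed low rank of the performance matrix

X = np.where(known, P, np.nanmean(P))      # initialize missing cells with the global mean
for _ in range(50):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # low-rank reconstruction
    X[known] = P[known]                        # keep the measured cells fixed

dataset = 2                                # column index of the data set of interest
print("predicted best workflow:", int(np.argmax(X[:, dataset])))
```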
  • Patent number: 10755426
    Abstract: An electronic device comprises circuitry implementing a depth map enhancer. The depth map enhancer obtains an initial depth map corresponding to a scene and an image of the scene. The depth map enhancer generates a refined depth map corresponding to the scene using an optimizer, the initial depth map and the image. The refined depth map includes estimated depth indicators corresponding to at least a first depth-information region, identified based at least in part on a first criterion, of the initial depth map. Input based on the refined depth map is provided to an image processing application.
    Type: Grant
    Filed: May 23, 2018
    Date of Patent: August 25, 2020
    Assignee: Apple Inc.
    Inventors: Mark Norman Lester Jouppi, Michael Wish Tao, Eric Bujold, Stephane Simon Rene Ben Soussan, Volker Roelke, Geoffrey T. Anneheim, Julio Cesar Hernandez Zaragoza, Florian Ciurea
  • Patent number: 10747846
    Abstract: Matrix processing includes: initializing a current matrix based at least in part on an original matrix; iteratively determining a matrix property using a plurality of iteration cycles, including, in an iteration cycle: partitioning the current matrix to obtain a plurality of partitions, wherein the plurality of partitions includes a submatrix; modifying the submatrix based at least in part on other partitions of the plurality of partitions to provide a current matrix for a next iteration; and continuing to iterate until a condition is met. Matrix processing further includes obtaining the matrix property from an iteration result; and outputting the matrix property.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: August 18, 2020
    Assignee: Cyber Atomics, Inc.
    Inventor: Roy Batruni
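Illustrative sketch: the abstract does not name the matrix property or the update rule, but a classic instance of "partition, modify the submatrix from the other partitions, iterate" is computing a determinant through repeated Schur-complement updates, shown below purely to illustrate the shape of such an iteration:
```python
import numpy as np

def det_by_schur(A):
    """Determinant via repeated 1x1 pivot / Schur-complement updates.

    Each cycle partitions the current matrix into [[a, b], [c, D]],
    folds the pivot `a` into the running result, and replaces the current
    matrix with the Schur complement D - c @ b / a. Assumes non-zero
    pivots (e.g., a positive-definite input).
    """
    M = np.array(A, dtype=float)
    result = 1.0
    while M.shape[0] > 1:
        a = M[0, 0]
        b = M[0:1, 1:]          # row partition
        c = M[1:, 0:1]          # column partition
        D = M[1:, 1:]           # trailing submatrix
        result *= a
        M = D - (c @ b) / a     # modify the submatrix using the other partitions
    return result * M[0, 0]

A = np.array([[4., 2., 1.], [2., 5., 3.], [1., 3., 6.]])
print(det_by_schur(A), np.linalg.det(A))   # both ~ 67.0
```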
  • Patent number: 10743026
    Abstract: A video encoding method, a video encoding apparatus, a video decoding method, and a video decoding apparatus are provided. The video encoding method includes producing a fast transform matrix based on a transform matrix which is used for frequency transformation on a block which has a predetermined size; producing a transformed block by transforming the block having the predetermined size by using the fast transform matrix; and performing scaling with respect to the transformed block in order to correct a difference between the transform matrix used for the frequency transformation and the fast transform matrix.
    Type: Grant
    Filed: September 5, 2019
    Date of Patent: August 11, 2020
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Yoon-mi Hong, Woo-jin Han, Min-su Cheon, Jianle Chen
  • Patent number: 10733140
    Abstract: A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising at least one register to hold a varying number of elements. The computer processor may further comprise processing logic configured to operate on the varying number of elements in the vector register file using one or more instructions that produce results with elements of widths different from those of the input elements. The computer processor may be implemented as a monolithic integrated circuit.
    Type: Grant
    Filed: June 1, 2015
    Date of Patent: August 4, 2020
    Assignee: OPTIMUM SEMICONDUCTOR TECHNOLOGIES INC.
    Inventors: Mayan Moudgill, Arthur Joseph Hoane, Paul Hurtley
  • Patent number: 10719323
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: July 21, 2020
    Assignee: Intel Corporation
    Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
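Illustrative sketch: the first compression mode described (pack non-zero elements together and record each element's matrix position in a header) is essentially a positions-plus-values sparse encoding. A numpy model of that mode; the real instruction's tile format, element width, and header layout are not specified here:
```python
import numpy as np

def compress_matrix(M):
    """Pack non-zero elements together; the header records each one's position."""
    M = np.asarray(M)
    mask = M != 0
    header = np.flatnonzero(mask)      # positions of non-zero elements (row-major)
    values = M[mask]                   # packed non-zero values
    return header, values, M.shape

def decompress_matrix(header, values, shape):
    out = np.zeros(shape, dtype=values.dtype)
    out.flat[header] = values
    return out

M = np.array([[0, 7, 0, 0],
              [3, 0, 0, 5],
              [0, 0, 0, 0]])
h, v, shp = compress_matrix(M)
print(h, v)                                                # [1 4 7] [7 3 5]
print(np.array_equal(decompress_matrix(h, v, shp), M))     # True
```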
  • Patent number: 10664552
    Abstract: An apparatus and method for LU decomposition of an input matrix, the input matrix comprising a multitude of elements forming a plurality of rows and columns. In an embodiment, the apparatus comprises a memory including a plurality of memory caches, and a processing unit operatively connected to the memory to transmit data to and receive data from the memory caches. The processing unit comprises a hardware circuit for processing the input matrix column-by-column to decompose the input matrix into a lower triangular matrix L and an upper triangular matrix U, including performing Gaussian eliminations on the columns of the matrix, with partial pivoting of the matrix, and choosing one of the elements of each column as the pivot element for that column while the column is being processed.
    Type: Grant
    Filed: April 1, 2019
    Date of Patent: May 26, 2020
    Assignee: International Business Machines Corporation
    Inventors: Maysam Mir Ahmadi, Sean Wagner
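Illustrative sketch: column-by-column Gaussian elimination with partial pivoting is the textbook procedure the hardware circuit implements; each step picks the largest-magnitude entry of the current column as the pivot, swaps rows, and eliminates below the pivot. A minimal numpy version:
```python
import numpy as np

def lu_partial_pivot(A):
    """Column-by-column Gaussian elimination with partial pivoting: P @ A = L @ U."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):                       # process the matrix column by column
        p = k + np.argmax(np.abs(U[k:, k]))      # choose the pivot element for this column
        if p != k:                               # partial pivoting: swap rows
            U[[k, p], k:] = U[[p, k], k:]
            L[[k, p], :k] = L[[p, k], :k]
            P[[k, p]] = P[[p, k]]
        L[k+1:, k] = U[k+1:, k] / U[k, k]                 # multipliers for this column
        U[k+1:, k:] -= np.outer(L[k+1:, k], U[k, k:])     # eliminate below the pivot
    return P, L, U

A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
P, L, U = lu_partial_pivot(A)
print(np.allclose(P @ A, L @ U))   # True
```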
  • Patent number: 10651951
    Abstract: Methods and apparatus for sub-block based architecture of Cholesky decomposition and channel whitening. In an exemplary embodiment, an apparatus is provided that parallel processes sub-block matrices (R00, R10, and R11) of a covariance matrix (R) to determine a whitening coefficient matrix (W). The apparatus includes a first LDL coefficient calculator that calculates a first whitening matrix W00, lower triangle matrix L00, and diagonal matrix D00 from the sub-block matrix R00, a first matrix calculator that calculates a lower triangle matrix L10 from the sub-block matrix R10 and the matrices L00 and D00, and a second matrix calculator that calculates a matrix X from the matrices D00 and L10.
    Type: Grant
    Filed: December 29, 2018
    Date of Patent: May 12, 2020
    Assignee: Cavium, LLC.
    Inventors: Yuanbin Guo, Hong Jik Kim
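Illustrative sketch: the building block is the LDL^T factorization of the covariance matrix, R = L D L^T, from which the whitening coefficient matrix W = D^(-1/2) L^(-1) satisfies W R W^T = I. A small, non-blocked, real-valued version is shown below; the R00/R10/R11 sub-block partitioning and parallel schedule of the patent are not reproduced:
```python
import numpy as np

def ldl(R):
    """LDL^T factorization of a symmetric positive-definite matrix R."""
    n = R.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = R[j, j] - np.sum(L[j, :j] ** 2 * d[:j])
        for i in range(j + 1, n):
            L[i, j] = (R[i, j] - np.sum(L[i, :j] * L[j, :j] * d[:j])) / d[j]
    return L, d

# Toy covariance matrix (assumed, real-valued for simplicity).
R = np.array([[4., 2., 1.],
              [2., 3., 0.5],
              [1., 0.5, 2.]])
L, d = ldl(R)
W = np.diag(1.0 / np.sqrt(d)) @ np.linalg.inv(L)   # whitening coefficient matrix
print(np.allclose(W @ R @ W.T, np.eye(3)))          # whitened covariance is the identity
```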
  • Patent number: 10635447
    Abstract: Single Instruction, Multiple Data (SIMD) technologies are described. A processing device can include a processor core and a memory. The processor core can receive, from a software application, a request to perform an operation on a first set of variables that includes a first input value and a register value and perform the operation on a second set of variables that includes a second input value and the first register value. The processor core can vectorize the operation on the first set of variables and the second set of variables. The processor core can perform the operation on the first set of variables and the second set of variables in parallel to obtain a first operation value and a second operation value. The processor core can perform a horizontal add operation on the first operation value and the second operation value and write the result to memory.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: April 28, 2020
    Assignee: Intel Corporation
    Inventors: Jun Jin, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10607718
    Abstract: Embodiments of the present invention include methods, systems and computer program products for algebraic phasing of polyploids. Aspects of the invention include receiving a matrix including a set of two or more single-nucleotide polymorphisms (SNPs) for two or more sample organisms. Each row of the matrix is set to a ploidy based on a number of ploidies present in the two or more sample organisms. Each allele in the set of two or more SNPs is represented as a binary number. A set of algebraic rules is received, wherein the set of algebraic rules includes an algebraic phasing algorithm. The set of algebraic rules is then applied to the matrix to determine a haplotype of a parent of the two or more sample organisms.
    Type: Grant
    Filed: May 14, 2019
    Date of Patent: March 31, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Laxmi P. Parida, Filippo Utro
  • Patent number: 10599428
    Abstract: Processing circuitry supports overlapped execution of vector instructions when at least one beat of a first vector instruction is performed in parallel with at least one beat of a second vector instruction. The processing circuitry also supports mixed-scalar-vector instructions for which one of a destination register and one or more source registers is a vector register and another is a scalar register. In a sequence including first and subsequent mixed-scalar-vector instructions, instances of relaxed execution which can potentially lead to uncertain and incorrect results are permitted by the processing circuitry when the instructions are separated by fewer than a predetermined number of intervening instructions. In practice the situations which lead to the uncertain results are very rare and so it is not justified providing relatively expensive dependency checking circuitry for eliminating such cases.
    Type: Grant
    Filed: March 23, 2016
    Date of Patent: March 24, 2020
    Assignee: ARM Limited
    Inventor: Thomas Christopher Grocutt
  • Patent number: 10592241
    Abstract: Aspects for matrix multiplication in a neural network are described herein. The aspects may include a master computation module configured to receive a first matrix and transmit a row vector of the first matrix. In addition, the aspects may include one or more slave computation modules respectively configured to store a column vector of a second matrix, receive the row vector of the first matrix, and multiply the row vector of the first matrix with the stored column vector of the second matrix to generate a result element. Further, the aspects may include an interconnection unit configured to combine the one or more result elements generated respectively by the one or more slave computation modules to generate a row vector of a result matrix and transmit the row vector of the result matrix to the master computation module.
    Type: Grant
    Filed: October 25, 2018
    Date of Patent: March 17, 2020
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Xiao Zhang, Shaoli Liu, Tianshi Chen, Yunji Chen
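Illustrative sketch: the dataflow is an outer decomposition of matrix multiply: the master broadcasts one row of the first matrix, each slave holds one column of the second matrix and emits one dot-product result element, and the interconnection unit gathers those elements into a row of the result. Module boundaries are only simulated below:
```python
import numpy as np

A = np.arange(6, dtype=float).reshape(2, 3)             # first matrix (held by the master)
B = np.arange(12, dtype=float).reshape(3, 4)             # second matrix
slave_columns = [B[:, j] for j in range(B.shape[1])]     # each slave stores one column of B

result_rows = []
for row in A:                                     # master transmits one row at a time
    # Each "slave" multiplies the broadcast row with its stored column -> one result element.
    elements = [row @ col for col in slave_columns]
    result_rows.append(elements)                  # interconnection unit combines the elements

C = np.array(result_rows)
print(np.allclose(C, A @ B))                      # True
```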
  • Patent number: 10579554
    Abstract: Program procedures executed to route a bus, via a processing unit, include a bus information extractor configured to extract bus information, including physical requirements for the bus, from input data; a buffer array generator configured to generate a buffer array in which buffers included in the bus are regularly arranged based on the bus information; a buffer array placer configured to place at least one buffer array in the layout of the integrated circuit based on the bus information; and a wiring procedure configured to generate interconnections connected to buffers included in the at least one buffer array based on the bus information.
    Type: Grant
    Filed: June 8, 2017
    Date of Patent: March 3, 2020
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Byung-yong Kim
  • Patent number: 10558730
    Abstract: A computing method includes: generating first partitioned matrices by partitioning the first matrix by a least common multiple of the M and the N in the row direction and by the N in the column direction; generating second partitioned matrices by partitioning the second matrix by the M in the row direction and by the least common multiple in the column direction; adding a first product of the first partitioned matrices and the second partitioned matrices to a first result matrix; transmitting the first partitioned matrices to computing elements directly connected to that computing element out of other computing elements connected to each other in a torus-like manner in the row direction; transmitting the second partitioned matrices to computing elements directly connected to that computing element out of other computing elements connected to each other in a torus-like manner in the column direction.
    Type: Grant
    Filed: February 13, 2018
    Date of Patent: February 11, 2020
    Assignee: FUJITSU LIMITED
    Inventor: Akihiko Kasagi
  • Patent number: 10534839
    Abstract: A method for matrix by vector multiplication, applied in an artificial neural network system, is disclosed. The method comprises: compressing a plurality of weight values in a weight matrix and indices of an input vector into a compressed main stream; storing M sets of synapse values in M memory devices; and, performing reading and MAC operations according to the M sets of synapse values and the compressed main stream to obtain a number M of output vectors. The step of compressing comprises: dividing the weight matrix into a plurality of N×L blocks; converting entries of a target block and corresponding indices of the input vector into a working block and an index matrix; removing zero entries in the working block; shifting non-zero entries row-by-row to one of their left and right sides in the working block; and, respectively shifting corresponding entries in the index matrix.
    Type: Grant
    Filed: June 25, 2018
    Date of Patent: January 14, 2020
    Assignee: BRITISH CAYMAN ISLANDS INTELLIGO TECHNOLOGY INC.
    Inventors: Pei-Wen Hsieh, Chen-Chu Hsu, Tsung-Liang Chen
  • Patent number: 10534838
    Abstract: Described are embodiments related to bit matrix multiplication in a processor. For example, in some embodiments a processor comprises: decode circuitry to decode an instruction having fields for an opcode, an identifier of a first source bit matrix, an identifier of a second source bit matrix, an identifier of a destination bit matrix, and an immediate; and execution circuitry to execute the decoded instruction to perform a multiplication of a matrix of S-bit elements of the identified first source bit matrix with S-bit elements of the identified second source bit matrix, wherein the multiplication and accumulation operations are selected by the operation selector, and to store a result of the matrix multiplication into the identified destination bit matrix, wherein S indicates a plural bit size.
    Type: Grant
    Filed: September 29, 2017
    Date of Patent: January 14, 2020
    Assignee: Intel Corporation
    Inventors: Dmitry Y. Babokin, Kshitij A. Doshi, Vadim Sukhomlinov
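Illustrative sketch: a bit matrix multiply typically replaces multiplication with AND and accumulation with XOR (GF(2) arithmetic) or OR, selectable by the immediate. The XOR variant on 1-bit elements is shown below; the instruction encoding and S-bit element grouping are not modeled:
```python
import numpy as np

def bit_matmul_xor(A, B):
    """Bit matrix multiply over GF(2): multiply = AND, accumulate = XOR."""
    A = np.asarray(A, dtype=np.uint8)
    B = np.asarray(B, dtype=np.uint8)
    # (A & B) summed mod 2 along the inner dimension is AND with XOR-accumulation.
    return (A[:, :, None] & B[None, :, :]).sum(axis=1) & 1

A = np.array([[1, 0, 1],
              [1, 1, 0]])
B = np.array([[1, 1],
              [0, 1],
              [1, 0]])
print(bit_matmul_xor(A, B))   # [[0 1]
                              #  [1 0]]
```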
  • Patent number: 10521228
    Abstract: The present disclosure provides a data read-write scheduler and a reservation station for vector operations. The data read-write scheduler suspends instruction execution by providing a read instruction cache module and a write instruction cache module and detecting conflicting instructions based on the two modules. After the timing condition is satisfied, the instructions are re-executed, thereby resolving the read-after-write and write-after-read conflicts between instructions and guaranteeing that correct data are provided to the vector operations component. Therefore, the subject disclosure has greater value for promotion and application.
    Type: Grant
    Filed: November 7, 2018
    Date of Patent: December 31, 2019
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Dong Han, Shaoli Liu, Yunji Chen, Tianshi Chen
  • Patent number: 10489063
    Abstract: First elements of a dense vector to be multiplied with first elements of a first row of a sparse array may be determined. The determined first elements of the dense vector may be written into a memory. A dot product for the first elements of the sparse array and the first elements of the dense vector may be calculated in a plurality of increments by multiplying a subset of the first elements of the sparse array and a corresponding subset of the first elements of the dense vector. A sequence number may be updated after each increment is completed to identify a column number and/or a row number of the sparse array for which the dot product calculations have been completed.
    Type: Grant
    Filed: December 19, 2016
    Date of Patent: November 26, 2019
    Assignee: Intel Corporation
    Inventors: Asit K. Mishra, Deborah T. Marr, Edward T. Grochowski
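Illustrative sketch: the computation is a CSR-style sparse-row-times-dense-vector dot product accumulated in fixed-size increments, with a sequence number recording how far along the row the calculation has progressed. The block size and storage layout below are assumptions:
```python
import numpy as np

# Sparse row in CSR-like form: column indices and values of its non-zero elements.
cols = np.array([1, 4, 7, 9, 12])
vals = np.array([2.0, -1.0, 3.0, 0.5, 4.0])
x = np.arange(16, dtype=float)         # dense vector

block = 2                              # elements processed per increment (assumed)
dot, sequence_number = 0.0, 0
for start in range(0, len(cols), block):
    idx = cols[start:start + block]
    dot += vals[start:start + block] @ x[idx]   # one increment of the dot product
    sequence_number = idx[-1]                   # last column completed so far
    print(f"after column {sequence_number}: partial dot = {dot}")

print(np.isclose(dot, vals @ x[cols]))  # matches the one-shot computation
```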
  • Patent number: 10481957
    Abstract: A processor and a task processing method therefor, and a storage medium. The method comprises: a scalar calculation module executing parameter calculation of a current task, and storing a parameter obtained through calculation in a PBUF; when the parameter calculation of the current task is completed, executing a first instruction or second instruction for inter-core synchronization, and storing the first instruction or the second instruction in the PBUF (301); a vector calculation module reading the parameter from the PBUF, storing the read parameter in a shadow register; when the first instruction or the second instruction is read from the PBUF, storing all the modified parameters in the shadow register in a work register within a period (302); and the vector calculation module executing vector calculation of the current task according to the parameter in the work register (303).
    Type: Grant
    Filed: July 1, 2016
    Date of Patent: November 19, 2019
    Assignee: Sanechips Technology Co., Ltd.
    Inventors: Bo Wen, Qingxin Cao
  • Patent number: 10482157
    Abstract: A data compression apparatus includes a memory and a processor configured to: generate compressed matrix data; compare a threshold with an index value calculated for a specific-value data string, that is, a data string obtained by coupling specific values identified from the non-zero element values in each row of the compressed matrix data; specify a given constant as the respective coefficients when the index value is larger than the threshold; calculate the reciprocals of the respective specific values as the respective coefficients when the index value is equal to or smaller than the threshold; and output post-operation matrix data obtained by rounding the products of the respective elements of the compressed matrix data and the respective calculated coefficients, based on the number of significant figures of the decimal part of each corresponding element.
    Type: Grant
    Filed: February 27, 2019
    Date of Patent: November 19, 2019
    Assignee: FUJITSU LIMITED
    Inventor: Makiko Konoshima
  • Patent number: 10455252
    Abstract: A video encoding method, a video encoding apparatus, a video decoding method, and a video decoding apparatus are provided. The video encoding method includes producing a fast transform matrix based on a transform matrix which is used for frequency transformation on a block which has a predetermined size; producing a transformed block by transforming the block having the predetermined size by using the fast transform matrix; and performing scaling with respect to the transformed block in order to correct a difference between the transform matrix used for the frequency transformation and the fast transform matrix.
    Type: Grant
    Filed: July 2, 2018
    Date of Patent: October 22, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Yoon-Mi Hong, Woo-Jin Han, Min-Su Cheon, Jianle Chen
  • Patent number: 10395381
    Abstract: Disclosed techniques relate to forming a block sum of picture elements, employing a vector dot product instruction operating on packed picture elements and a mask, producing a vector of masked horizontal picture-element sums. The block sum is formed from plural horizontal sums via vector single-instruction/multiple-data (SIMD) addition.
    Type: Grant
    Filed: March 4, 2019
    Date of Patent: August 27, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Jayasree Sankaranarayanan, Dipan Kumar Mandal
  • Patent number: 10346507
    Abstract: Embodiments of the present invention are directed to methods and systems for performing block sparse matrix-vector multiplications with improved efficiency through the use of a specific re-ordering of the matrix data such that matrix symmetry can be exploited while simultaneously avoiding atomic memory operations or the need for inefficient memory operations in general. One disclosed method includes reordering the matrix data such that, for any column of non-transpose data, and for any row of transpose data simultaneously processed within a single thread-block on a GPU, all matrix elements update independent elements of the output vector. Using the method, the amount of data required to represent the sparse matrix can be reduced by as much as 50%, thereby doubling the effective performance on the GPU, and doubling the size of the matrix that can be accelerated by the GPU.
    Type: Grant
    Filed: October 26, 2017
    Date of Patent: July 9, 2019
    Assignee: Nvidia Corporation
    Inventor: Steve Rennich
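Illustrative sketch: exploiting symmetry means storing only one triangle and letting every stored off-diagonal element update two output entries, once as-is and once transposed; the patented reordering is about keeping those two updates conflict-free across GPU threads. A small serial numpy version of the two-update idea, ignoring the thread-block scheduling:
```python
import numpy as np

# Lower-triangle-only storage of a symmetric sparse matrix (COO-style).
rows = np.array([0, 1, 2, 2, 3])
cols = np.array([0, 0, 1, 2, 3])
vals = np.array([4.0, 2.0, -1.0, 5.0, 3.0])
x = np.array([1.0, 2.0, 3.0, 4.0])

y = np.zeros_like(x)
for r, c, v in zip(rows, cols, vals):
    y[r] += v * x[c]                  # non-transpose contribution
    if r != c:
        y[c] += v * x[r]              # transpose contribution from the same stored element

# Reference: build the full symmetric matrix and multiply.
A = np.zeros((4, 4))
A[rows, cols] = vals
A = A + A.T - np.diag(np.diag(A))
print(np.allclose(y, A @ x))          # True
```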
  • Patent number: 10310812
    Abstract: Mechanisms are provided for performing a matrix operation. A processor of a data processing system is configured to perform cluster-based matrix reordering of an input matrix. An input matrix, which comprises nodes associated with elements of the matrix, is received. The nodes are clustered into clusters based on numbers of connections with other nodes within and between the clusters, and the clusters are ordered by minimizing a total length of cross cluster connections between nodes of the clusters, to thereby generate a reordered matrix. A lookup table is generated identifying new locations of nodes of the input matrix, in the reordered matrix. A matrix operation is then performed based on the reordered matrix and the lookup table.
    Type: Grant
    Filed: February 6, 2017
    Date of Patent: June 4, 2019
    Assignee: International Business Machines Corporation
    Inventors: Emrah Acar, Rajesh R. Bordawekar, Michele M. Franceschini, Luis A. Lastras-Montano, Ruchir Puri, Haifeng Qian, Livio B. Soares
  • Patent number: 10304008
    Abstract: Systems and methods are disclosed for operating a machine, by receiving training data from one or more sensors; training a machine learning module with the training data by: partitioning a data matrix into smaller submatrices to process in parallel and optimized for each processing node; for each submatrix, performing a greedy search for rank-one solutions; using alternating direction method of multipliers (ADMM) to ensure consistency over different data blocks; and controlling one or more actuators using live data and the learned module during operation.
    Type: Grant
    Filed: March 7, 2016
    Date of Patent: May 28, 2019
    Assignee: NEC Corporation
    Inventors: Renqiang Min, Dongjin Song
  • Patent number: 10275392
    Abstract: A data processing device includes a two-dimensional structure including a plurality of stages in a vertical direction, the stages each including basic units in a horizontal direction such that the number of the basic units is equal to the number of ways. The basic units each include a memory block having a plurality of ports, an address generator for the ports of the memory block, and a calculation unit.
    Type: Grant
    Filed: April 6, 2016
    Date of Patent: April 30, 2019
    Assignee: NATIONAL UNIVERSITY CORPORATION NARA INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Yasuhiko Nakashima, Shinya Takamaeda
  • Patent number: 10248426
    Abstract: Techniques are disclosed for restoring register data in a processor. In one embodiment, a method includes receiving an instruction to flush one or more general purpose registers (GPRs) in a processor. The method also includes determining history buffer entries of a history buffer to be restored to the one or more GPRs. The method includes creating a mask vector that indicates which history buffer entries will be restored to the one or more GPRs. The method further includes restoring the indicated history buffer entries to the one or more GPRs. As each indicated history buffer entry is restored, the method includes updating the mask vector to indicate which history buffer entries have been restored.
    Type: Grant
    Filed: May 24, 2016
    Date of Patent: April 2, 2019
    Assignee: International Business Machines Corporation
    Inventors: Brian D. Barrick, Steven J. Battle, Joshua W. Bowman, Christopher M. Mueller, Dung Q. Nguyen, David R. Terry, Eula Faye Tolentino, Jing Zhang
  • Patent number: 10191749
    Abstract: Single Instruction, Multiple Data (SIMD) technologies are described. A processing device can include a processor core and a memory. The processor core can receive, from a software application, a request to perform an operation on a first set of variables that includes a first input value and a register value and perform the operation on a second set of variables that includes a second input value and the first register value. The processor core can vectorize the operation on the first set of variables and the second set of variables. The processor core can perform the operation on the first set of variables and the second set of variables in parallel to obtain a first operation value and a second operation value. The processor core can perform a horizontal add operation on the first operation value and the second operation value and write the result to memory.
    Type: Grant
    Filed: December 24, 2015
    Date of Patent: January 29, 2019
    Assignee: Intel Corporation
    Inventors: Jun Jin, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10191744
    Abstract: Systems, methods, and apparatuses relating to element sorting of vectors are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction; and an execution unit to execute the decoded instruction to: provide storage for a comparison matrix to store a comparison value for each element of an input vector compared against the other elements of the input vector, perform a comparison operation on elements of the input vector corresponding to storage of comparison values above a main diagonal of the comparison matrix, perform a different operation on elements of the input vector corresponding to storage of comparison values below the main diagonal of the comparison matrix, and store results of the comparison operation and the different operation in the comparison matrix.
    Type: Grant
    Filed: July 1, 2016
    Date of Patent: January 29, 2019
    Assignee: Intel Corporation
    Inventors: Mikhail Plotnikov, Igor Ermolaev
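Illustrative sketch: a comparison matrix turns sorting into counting: an element's sorted position is the number of elements that compare as preceding it, and treating ties differently above and below the main diagonal keeps equal elements in their original order. A numpy sketch of that idea:
```python
import numpy as np

def sort_by_comparison_matrix(v):
    v = np.asarray(v)
    n = len(v)
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    # Above the diagonal use "<", below it use "<=": equal elements keep
    # their original order, so every rank is distinct.
    cmp = np.where(j > i, v[j] < v[i], (j < i) & (v[j] <= v[i]))
    rank = cmp.sum(axis=1)            # how many elements precede v[i]
    out = np.empty_like(v)
    out[rank] = v                     # scatter each element to its sorted position
    return out

v = np.array([30, 10, 20, 10])
print(sort_by_comparison_matrix(v))   # [10 10 20 30]
```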
  • Patent number: 10169239
    Abstract: A prefetch request having a priority assigned thereto is obtained, based on executing a prefetch instruction included within a program. Based on obtaining the prefetch request, a determination is made as to whether the prefetch request may be placed on a prefetch queue. This determination includes determining whether the prefetch queue is full; checking, based on determining the prefetch queue is full, whether the priority of the prefetch request is considered a high priority; determining, based on the checking indicating the priority of the prefetch request is considered a high priority, whether another prefetch request on the prefetch queue may be removed; removing the other prefetch request from the prefetch queue, based on determining the other prefetch request may be removed; and adding the prefetch request to the prefetch queue, based on removing the other prefetch request.
    Type: Grant
    Filed: July 20, 2016
    Date of Patent: January 1, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Michael K. Gschwind, Christian Jacobi, Anthony Saporito, Chung-Lung K. Shum, Timothy J. Slegel
  • Patent number: 10162752
    Abstract: A method for storing data at contiguous memory addresses includes, at a single-instruction-multiple-data (SIMD) processor, executing a parallel-prefix valid count instruction to determine a first offset of a first data vector and to determine a second offset of a second data vector that includes valid data and invalid data. The second offset is based on the first offset and a number of positions in the first data vector that are associated with valid data. The method also includes storing first valid data from the first data vector at a first memory address of a memory and storing second valid data from the second data vector at a particular memory address of the memory. The first memory address is based on the first offset and the particular memory address is based on the second offset.
    Type: Grant
    Filed: September 22, 2016
    Date of Patent: December 25, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Eric Mahurin, David Hoyle
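Illustrative sketch: the offsets are an exclusive prefix sum of per-vector valid counts, so vector k's valid elements land immediately after the valid elements of all earlier vectors. A numpy model of the compaction the instruction accelerates; the vector width and invalid-lane marker are assumptions:
```python
import numpy as np

VEC = 4                                    # SIMD vector width (assumed)
NA = -1                                    # marker for invalid lanes (assumed)

data = np.array([[ 3, NA,  7, NA],         # first data vector
                 [NA,  5, NA,  9],         # second data vector
                 [ 1,  2, NA, NA]])
valid = data != NA

counts = valid.sum(axis=1)                 # valid-element count per vector
offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))   # "parallel-prefix valid count"

out = np.zeros(counts.sum(), dtype=data.dtype)
for row, m, off in zip(data, valid, offsets):
    out[off:off + m.sum()] = row[m]        # store each vector's valid lanes contiguously

print(offsets, out)                        # [0 2 4] [3 7 5 9 1 2]
```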
  • Patent number: 10146740
    Abstract: A computer implemented method is provided for processing sparse data. A sparse data set is received. A modified sparse data set is calculated by replacing all nonzero values in the sparse data set with a common positive integer. The modified sparse data set is transposed to create a transposed data set. A covariance matrix is calculated by multiplying the transposed data set by the modified sparse data set. A tree of a predefined depth is generated by assigning columns of the sparse data set to right and left nodes based on co-occurrence with a first anchor column and a second anchor column. The first anchor column and the second anchor column are determined based on the covariance matrix.
    Type: Grant
    Filed: March 8, 2017
    Date of Patent: December 4, 2018
    Assignee: Symantec Corporation
    Inventors: Nikolaos Vasiloglou, Andrew B. Gardner
  • Patent number: 10097834
    Abstract: A method of encoding image data, including: frequency-transforming input image data to generate an array of frequency-transformed input image coefficients by a matrix-multiplication process, according to a maximum dynamic range of the transformed data and using transform matrices having a data precision; and selecting the maximum dynamic range and/or the data precision of the transform matrices according to the bit depth of the input image data.
    Type: Grant
    Filed: April 4, 2014
    Date of Patent: October 9, 2018
    Assignee: Sony Corporation
    Inventors: David Berry, James Alexander Gamei, Nicholas Ian Saunders, Karl James Sharman
  • Patent number: 10042814
    Abstract: A device, system and method for assigning values to elements in a first register, where each data field in a first register corresponds to a data element to be written into a second register, and where for each data field in the first register, a first value may indicate that the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element has been written into the second register, reading the values of each of the data fields in the first register, and for each data field in the first register having the first value, gathering the corresponding data element and writing the corresponding data element into the second register, and changing the value of the data field in the first register from the first value to the second value. Other embodiments are described and claimed.
    Type: Grant
    Filed: November 14, 2014
    Date of Patent: August 7, 2018
    Assignee: Intel Corporation
    Inventors: Eric Sprangle, Anwar Rohillah, Robert Cavin, Andrew T. Forsyth, Michael Abrash
  • Patent number: 10038918
    Abstract: A video encoding method, a video encoding apparatus, a video decoding method, and a video decoding apparatus are provided. The video encoding method includes producing a fast transform matrix based on a transform matrix which is used for frequency transformation on a block which has a predetermined size; producing a transformed block by transforming the block having the predetermined size by using the fast transform matrix; and performing scaling with respect to the transformed block in order to correct a difference between the transform matrix used for the frequency transformation and the fast transform matrix.
    Type: Grant
    Filed: September 11, 2017
    Date of Patent: July 31, 2018
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Yoon-Mi Hong, Woo-Jin Han, Min-Su Cheon, Jianle Chen
  • Patent number: 9984041
    Abstract: A batched Cholesky decomposition method, system, and non-transitory computer readable medium for a Graphics Processing Unit (GPU) including at least a first problem and a second problem, include mirroring a second problem matrix of the second problem to a first problem matrix of the first problem, combining the first problem matrix and the mirrored second problem matrix into a single problem matrix, and allocating data read to a thread and to the first problem and the second problem, respectively.
    Type: Grant
    Filed: June 30, 2016
    Date of Patent: May 29, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Minsik Cho, David Shing-ki Kung, Ruchir Puri
  • Patent number: 9934195
    Abstract: A multicore processor is achieved by a processor assembly comprising a first processor having a first core and at least a first and a second unit, each selected from the group of vector execution units, memory units and accelerators, said first core and first and second units being interconnected by a first network, and a second processor having a second core, wherein the first core is arranged to enable the second core to control at least one of the units in the first processor. Each processor generally comprises a combination of execution units, memory units and accelerators, which may be controlled and/or accessed by units in the other processor.
    Type: Grant
    Filed: November 28, 2012
    Date of Patent: April 3, 2018
    Assignee: Mediatek Sweden AB
    Inventors: Anders Nilsson, Eric Tell
  • Patent number: 9870338
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed compression and repeat in response to a single vector packed compression and repeat instruction that includes a first and second source vector register operand, a destination vector register operand, and an opcode are described.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: January 16, 2018
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm
  • Patent number: 9858079
    Abstract: A method and system are described for generating reference tables in object code which specify the addresses of branches, routines called, and data references used by routines in the code. In a suitably equipped processing system, the reference tables can be passed to a memory management processor which can open the appropriate memory pages to expedite the retrieval of data referenced in the execution pipeline. The disclosed method and system create such reference tables at the beginning of each routine so that the table can be passed to the memory management processor in a suitably equipped processor. Resulting object code also allows processors lacking a suitable memory management processor to skip the reference table, preserving upward compatibility.
    Type: Grant
    Filed: October 19, 2015
    Date of Patent: January 2, 2018
    Assignee: Micron Technology, Inc.
    Inventor: Dean A. Klein
  • Patent number: 9846581
    Abstract: A clock-less asynchronous processor comprising a plurality of parallel asynchronous processing logic circuits, each processing logic circuit configured to generate an instruction execution result. The processor comprises an asynchronous instruction dispatch unit coupled to each processing logic circuit, the instruction dispatch unit configured to receive multiple instructions from memory and dispatch individual instructions to each of the processing logic circuits. The processor comprises a crossbar coupled to an output of each processing logic circuit and to the dispatch unit, the crossbar configured to store the instruction execution results.
    Type: Grant
    Filed: September 8, 2014
    Date of Patent: December 19, 2017
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Tao Huang, Yiqun Ge, Qifan Zhang, Wuxian Shi, Wen Tong