Matrix Array Patents (Class 708/520)
  • Patent number: 8812576
    Abstract: Circuitry for performing QR decomposition of an input matrix includes multiplication/addition circuitry for performing multiplication and addition/subtraction operations on a plurality of inputs, division/square-root circuitry for performing division and square-root operations on an output of the multiplication/addition circuitry, a first memory for storing the input matrix, a second memory for storing a selected vector of the input matrix, and a selector for inputting to the multiplication/addition circuitry any one or more of a vector of the input matrix, the selected vector, and an output of the division/square-root circuitry. On respective successive passes, a respective vector of the input matrix is read from a first memory into a second memory, and elements of a respective vector of an R matrix of the QR decomposition are computed and the respective vector of the input matrix in the first memory is replaced with the respective vector of the R matrix.
    Type: Grant
    Filed: September 12, 2011
    Date of Patent: August 19, 2014
    Assignee: Altera Corporation
    Inventor: Volker Mauer
  • Patent number: 8805912
    Abstract: When a Cholesky decomposition or a modified Cholesky decomposition is performed on a sparse symmetric positive definite matrix using a shared memory parallel computer, the discrete space in a problem presented by the linear simultaneous equations expressed by the sparse matrix is recursively sectioned into two sectioned areas and a sectional plane between the areas. The sectioning operation is stopped when the number of nodes configuring the sectional plane reaches the width of a super node. Each time the recursively halving process is performed, a number is sequentially assigned to the node in the sectioned area in order from a farther node from the sectional plane. The node in the sectional plane is numbered after assigning a number to the sectioned area each time the recursively halving process is performed.
    Type: Grant
    Filed: June 21, 2011
    Date of Patent: August 12, 2014
    Assignee: Fujitsu Limited
    Inventor: Makoto Nakanishi
  • Patent number: 8799345
    Abstract: A new approach for applying the multiple signal classification (MUSIC) method for high spectral resolution signal detection is described. The new approach uses a lower order covariance matrix, or, alternately, an autocorrelation matrix, to calculate only the number of eigenvalues and associated eigenvectors actually needed to solve for the number of signals sought.
    Type: Grant
    Filed: August 24, 2009
    Date of Patent: August 5, 2014
    Assignee: The United States of America as represented by the Secretary of the Air Force
    Inventors: Lihyeh Liou, David M. Lin, James B. Tsui
  • Patent number: 8782115
    Abstract: A matrix decomposition circuit is described. In one implementation, the matrix decomposition circuit includes a processing element to process a plurality of processing cells and a scheduler coupled to the processing element, where the scheduler instructs the processing element to process only required processing cells of the plurality of processing cells. In one specific implementation, the required processing cells are processing cells with non-zero inputs.
    Type: Grant
    Filed: April 18, 2008
    Date of Patent: July 15, 2014
    Assignee: Altera Corporation
    Inventor: Kulwinder Dhanoa
  • Patent number: 8775496
    Abstract: Approaches for Cholesky decomposition of a matrix are described. A first circuit is configured to generate an inverse square root of an input value. A second circuit is configured to generate a product of a value output by the first circuit and provided at a first input and a value provided at a second input. A third circuit is configured to generate a difference between a value provided at the first input and a value provided at the second input of the third circuit. The first input of the third circuit is coupled to the output of the second circuit. A control circuit is configured to iteratively distribute a plurality of values of the matrix and the outputs of the first, second, and third circuits to the inputs of the first, second, and third circuits such that the Cholesky decomposition of the matrix is output by the third circuit.
    Type: Grant
    Filed: July 29, 2011
    Date of Patent: July 8, 2014
    Assignee: Xilinx, Inc.
    Inventors: Kaushik Barman, Raghavendar M. Rao
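The three operations named in the abstract above (inverse square root, product, difference) are enough to carry out a column-by-column Cholesky factorization. The following is a minimal software sketch of that idea, not the patented circuit; the function name `cholesky_inv_sqrt` and the order in which values are fed to the three operations are assumptions made for illustration.

```python
import math

def cholesky_inv_sqrt(a):
    """Return lower-triangular L with L * L^T == a (a symmetric positive definite).

    Hypothetical helper: uses only an inverse square root, products, and
    differences, mirroring the three operations listed in the abstract.
    """
    n = len(a)
    w = [row[:] for row in a]                 # working copy; lower triangle is updated
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv_root = 1.0 / math.sqrt(w[j][j])   # inverse square root of the pivot
        for i in range(j, n):
            low[i][j] = w[i][j] * inv_root    # product: scale column j
        for i in range(j + 1, n):
            for k in range(j + 1, i + 1):
                w[i][k] -= low[i][j] * low[k][j]   # difference: update trailing block
    return low
```

Multiplying the pivot by its own inverse square root yields the diagonal entry, so no separate square root or division is needed.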
  • Patent number: 8775495
    Abstract: The present invention involves a sparse matrix processing system and method which uses sparse matrices that are compressed to reduce memory traffic and improve performance of computations using sparse matrices.
    Type: Grant
    Filed: February 12, 2007
    Date of Patent: July 8, 2014
    Assignee: Indiana University Research and Technology
    Inventors: Andrew Lumsdaine, Jeremiah Willcock
  • Publication number: 20140188969
    Abstract: An algorithm that maintains the symmetry of a symmetric bit matrix stored in computer memory without having to process all of the elements of a transpose column by considering only the elements changed in a row. The algorithm operates on groups of bits forming rows of the matrix rather than processing the individual bit elements of the matrix. Instead of checking whether each bit needs to be modified, the algorithm toggles only the column bits that are the transpose elements of modified row elements, thereby taking advantage of the existing symmetry to eliminate unnecessary conditional operations. As a result, the algorithm modifies the matrix on a row-by-row basis and makes changes to only those column bits that correspond to modified row elements without having to check the value of the transpose column elements that do not require modification.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Applicant: LSI CORPORATION
    Inventors: Deepti P. Chotai, Shankar T. More
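The row-oriented symmetry update described in the abstract above can be sketched in a few lines. This is an illustrative sketch only: it assumes each row is stored as a Python integer whose bit j holds element (i, j), and the function name `set_row_symmetric` is hypothetical.

```python
def set_row_symmetric(rows, i, new_row):
    """Overwrite row i of a symmetric bit matrix and restore symmetry.

    rows is a list of ints; bit j of rows[i] is element (i, j).
    Only the transpose bits of elements that actually changed are toggled.
    """
    changed = rows[i] ^ new_row          # bits of row i that flip
    rows[i] = new_row
    while changed:
        low_bit = changed & -changed     # isolate the lowest set bit
        j = low_bit.bit_length() - 1
        if j != i:                       # a diagonal element is its own transpose
            rows[j] ^= 1 << i            # toggle element (j, i) without testing it
        changed ^= low_bit


def is_symmetric(rows):
    """Check that element (i, j) equals element (j, i) for every pair."""
    n = len(rows)
    return all(((rows[i] >> j) & 1) == ((rows[j] >> i) & 1)
               for i in range(n) for j in range(n))
```

Because the matrix is symmetric before the update, a changed row bit implies the transpose bit must flip too, so an unconditional toggle is sufficient.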
  • Patent number: 8762443
    Abstract: Matrix operations circuitry for performing operations on submatrices of an input matrix includes a first working memory in which individual ones of the submatrices are operated on. The first working memory has a first submatrix size. The matrix operations circuitry also includes a second working memory in which a collection of the submatrices, that have been operated on in the first working memory, is operated on. The second working memory has an optimum burst size, and the first submatrix size is matched to the optimum burst size.
    Type: Grant
    Filed: November 15, 2011
    Date of Patent: June 24, 2014
    Assignee: Altera Corporation
    Inventor: Brian L. Kurtz
  • Publication number: 20140164466
    Abstract: A data transformation device defines a first square submatrix of an m order (m≥2) including elements (n, n) in the matrixes A and F being detA?1 and A=GHnHn−1 . . . H1=GF, calculates a first element in the matrix Hi based on elements in a lowest order row of the first square submatrix, defines a second square submatrix of a (m+1) order, and calculates a second element in the matrix Hi based on elements in a lowest order row of the second square submatrix and the first element. The data transformation device calculates all elements in the matrix Hi by iterating processing on the second element until the second square submatrix becomes an n order and calculates elements of the matrix G using elements of matrix A and matrix H1. Then, variable transformation is performed to solve a linear system including n variables and n equations.
    Type: Application
    Filed: December 11, 2013
    Publication date: June 12, 2014
    Inventor: Isamu RYU
  • Publication number: 20140149480
    Abstract: A system, method, and computer program product are provided for transposing a matrix. In use, a matrix is identified. Additionally, the matrix is transposed utilizing row-wise operations and column-wise operations, where the row-wise operations and the column-wise operations are performed independently.
    Type: Application
    Filed: October 24, 2013
    Publication date: May 29, 2014
    Applicant: NVIDIA Corporation
    Inventors: Bryan Christopher Catanzaro, Manjunath Kudlur
  • Patent number: 8719323
    Abstract: A method for efficient state transition matrix based LFSR computations is disclosed. A polynomial associated with a linear feedback shift register is defined. This polynomial is used to generate a single step state transition matrix. The single step state transition matrix is then modified into a more general k-step state transition matrix. The resultant combined matrix is reduced in size and can be multiplied by a state input vector, ultimately producing a plurality of next state-input vectors thereby providing improved efficiency in computing a LFSR.
    Type: Grant
    Filed: October 22, 2010
    Date of Patent: May 6, 2014
    Assignee: LSI Corporation
    Inventor: Meng-Lin Yu
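The step from a single-step to a k-step state transition matrix in the abstract above amounts to raising the matrix to the k-th power over GF(2). The sketch below illustrates only that part, for an autonomous Fibonacci-style LFSR; the tap convention, the function names, and the omission of the input vector and of the size reduction are all assumptions made for illustration.

```python
def step_matrix(taps, n):
    """Single-step transition matrix of an n-bit LFSR over GF(2).

    Assumed convention: new bit 0 is the XOR of the state bits listed in
    taps, and every other bit i takes the old value of bit i - 1.
    """
    t = [[0] * n for _ in range(n)]
    for j in taps:
        t[0][j] = 1
    for i in range(1, n):
        t[i][i - 1] = 1
    return t


def mat_mul_gf2(a, b):
    """Matrix product with AND as multiply and XOR as add."""
    n = len(a)
    return [[sum(a[i][k] & b[k][j] for k in range(n)) & 1
             for j in range(n)] for i in range(n)]


def mat_vec_gf2(m, v):
    """Matrix-vector product over GF(2)."""
    return [sum(x & y for x, y in zip(row, v)) & 1 for row in m]


def k_step_matrix(t, k):
    """Return t raised to the k-th power over GF(2) by repeated squaring."""
    n = len(t)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    base = t
    while k:
        if k & 1:
            result = mat_mul_gf2(result, base)
        base = mat_mul_gf2(base, base)
        k >>= 1
    return result
```

One multiplication by the k-step matrix then advances the register by k clock cycles, which is the source of the efficiency gain the abstract refers to.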
  • Patent number: 8700688
    Abstract: A data processing system 2 includes an instruction decoder 22 responsive to polynomial divide instructions DIVL.PN to generate control signals that control processing circuitry 26 to perform a polynomial division operation. The denominator polynomial is represented by a denominator value stored within a register with an assumption that the highest degree term of the polynomial always has a coefficient of “1” such that this coefficient need not be stored within the register storing the denominator value and accordingly the denominator polynomial may have a degree one higher than would be possible with the bit space within the register storing the denominator value alone. The polynomial divide instruction returns a quotient value and a remainder value respectively representing the quotient polynomial and the remainder polynomial.
    Type: Grant
    Filed: February 23, 2009
    Date of Patent: April 15, 2014
    Assignee: U-Blox AG
    Inventors: Dominic H Symes, Daniel Kershaw, Martinus C Wezelenburg
  • Publication number: 20140095569
    Abstract: An orthogonal code matrix generation method includes: establishing an N×N orthogonal code matrix, wherein an inner product of every two rows of the orthogonal code matrix is 0, and each column of the orthogonal code matrix has a summation of elements equal to a same value, wherein N is a power of 4; and using the N×N orthogonal code matrix as a basic unit to establish a target orthogonal code matrix. An orthogonal code matrix generation circuit includes: an N×N orthogonal code matrix generator, arranged for establishing an N×N orthogonal code matrix, wherein an inner product of every two rows of the orthogonal code matrix is 0, each column of the orthogonal code matrix has a summation of elements equal to a same value; and a target orthogonal code matrix generator, arranged for using the N×N orthogonal code matrix as a basic unit to establish a target orthogonal code matrix.
    Type: Application
    Filed: January 3, 2013
    Publication date: April 3, 2014
    Applicant: Raydium Semiconductor Corporation
    Inventors: Shih-Lun Huang, Kai-Ming Liu
  • Patent number: 8687008
    Abstract: A latency tolerant system for executing video processing operations. The system includes a host interface for implementing communication between the video processor and a host CPU, a scalar execution unit coupled to the host interface and configured to execute scalar video processing operations, and a vector execution unit coupled to the host interface and configured to execute vector video processing operations. A command FIFO is included for enabling the vector execution unit to operate on a demand driven basis by accessing the memory command FIFO. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A DMA engine is built into the memory interface for implementing DMA transfers between a plurality of different memory locations and for loading the command FIFO with data and instructions for the vector execution unit.
    Type: Grant
    Filed: November 4, 2005
    Date of Patent: April 1, 2014
    Assignee: NVIDIA Corporation
    Inventors: Ashish Karandikar, Shirish Gadre, Stephen D. Lew
  • Patent number: 8676874
    Abstract: A computer system retrieves a slice of sparse matrix data, which includes multiple rows that each includes multiple elements. The computer system identifies one or more non-zero values stored in one or more of the rows. Each identified non-zero value corresponds to a different row, and also corresponds to an element location within the corresponding row. In turn, the computer system stores each of the identified non-zero values and corresponding element locations within a packet at predefined fields corresponding to the different rows.
    Type: Grant
    Filed: December 6, 2010
    Date of Patent: March 18, 2014
    Assignee: International Business Machines Corporation
    Inventor: Gordon Clyde Fossum
  • Patent number: 8645440
    Abstract: A method for multidimensional scaling (MDS) of a data set comprising a plurality of data elements is provided, wherein each data element is identified by its coordinates, the method comprising the steps of: (i) applying an iterative optimization technique, such as SMACOF, a predetermined amount of times on a coordinates vector, said coordinates vector representing the coordinates of a plurality of said data elements, and obtaining a modified coordinates vector; (ii) applying a vector extrapolation technique, such as Minimal Polynomial Extrapolation (MPE) or reduced Rank Extrapolation (RRE) on said modified coordinates vector obtaining a further modified coordinates vector; and (iii) repeating steps (i) and (ii) until one or more predefined conditions are met.
    Type: Grant
    Filed: June 10, 2008
    Date of Patent: February 4, 2014
    Inventors: Guy Rosman, Alexander Bronstein, Michael Bronstein, Ron Kimmel
  • Patent number: 8626815
    Abstract: In a matrix multiplication in which each element of the resultant matrix is the dot product of a row of a first matrix and a column of a second matrix, each row and column can be broken into manageable blocks, with each block loaded in turn to compute a smaller dot product, and then the results can be added together to obtain the desired row-column dot product. The earliest results for each dot product are saved for a number of clock cycles equal to the number of portions into which each row or column is divided. The results are then added to provide an element of the resultant matrix. To avoid repeated loading and unloading of the same data, all multiplications involving a particular row-block can be performed upon loading that row-block, with the results cached until other multiplications for the resultant elements that use the cached results are complete.
    Type: Grant
    Filed: March 3, 2009
    Date of Patent: January 7, 2014
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
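The blocked row-column dot product described in the abstract above can be sketched in software as follows. This is a hedged illustration, not the patented circuit: the function name `blocked_matmul` and the `block` parameter are hypothetical, and the running sums in `c` stand in for the cached partial results.

```python
def blocked_matmul(a, b, block):
    """Multiply a (rows x inner) by b (inner x cols) one row-block at a time.

    Each row-block of a is loaded once and used for every column of b;
    the partial dot products accumulate in c until all blocks are done.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0] * cols for _ in range(rows)]
    for k0 in range(0, inner, block):
        k1 = min(k0 + block, inner)
        for i in range(rows):
            row_block = a[i][k0:k1]                  # load this row-block once
            for j in range(cols):
                partial = sum(x * b[k][j]
                              for x, k in zip(row_block, range(k0, k1)))
                c[i][j] += partial                   # add the smaller dot product
    return c
```

For example, `blocked_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]], 2)` splits each length-3 row into a block of two elements and a block of one.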
  • Patent number: 8620984
    Abstract: A first systolic array receives an input set of time division multiplexed matrices from a plurality of channel matrices. In a first mode, the first systolic array performs triangularization on the input matrices, producing a first set of matrices, and in a second mode performs back-substitution on the first set, producing a second set of matrices. In a first mode, a second systolic array performs left multiplication on the second set of matrices with the input set of matrices, producing a third set of matrices. In a second mode, the second systolic array performs cross diagonal transposition on the third set of matrices, producing a fourth set of matrices, and performs right multiplication on the second set of matrices with the fourth set of matrices. The first systolic array switches from the first mode to the second mode after the triangularization, and the second systolic array switches from the first mode to the second mode after the left multiplication.
    Type: Grant
    Filed: November 23, 2009
    Date of Patent: December 31, 2013
    Assignee: Xilinx, Inc.
    Inventors: Raied N. Mazahreh, Hai-Jo Tarn, Raghavendar M. Rao
  • Patent number: 8620976
    Abstract: A machine-implemented method for computerized digital signal processing including obtaining a digital signal from data storage or from conversion of an analog signal, and determining, from the digital signal, one or more measuring matrices. Each measuring matrix has a plurality of cells, and each cell has an amplitude corresponding to the signal energy in a frequency bin for a time slice. Cells in each measuring matrix having maximum amplitudes along a time slice and/or frequency bin are identified as maximum cells. Maxima that coincide in time and frequency are identified and a correlated maxima matrix, called a “Precision Measuring Matrix” is constructed showing the coinciding maxima and the adjacent marked maxima are linked into partial chains.
    Type: Grant
    Filed: May 11, 2011
    Date of Patent: December 31, 2013
    Assignee: Paul Reed Smith Guitars Limited Partnership
    Inventors: Paul Reed Smith, Frederick M. Slay, Ernestine M. Smith
  • Patent number: 8612507
    Abstract: A computing device includes: a deciding unit which, in computation of values of nodes on a lattice in a direction where a value of m representing a horizontal axis coordinate of the lattice increases, decides dummy nodes to be added to m=n−1, so as to enable values of nodes on m=n to be calculated by adding the dummy nodes to m=n−1 and executing a vector operation through the use of the SIMD function by using values of nodes on m=n−1 and values of the added dummy nodes; an adding unit adding the dummy nodes decided by the deciding unit to m=n−1; and a calculating unit calculating the values of the nodes present on m=n by executing the vector operation through the use of the SIMD function by using the values of the nodes on m=n−1 and the values of the dummy nodes added by the adding unit.
    Type: Grant
    Filed: April 16, 2010
    Date of Patent: December 17, 2013
    Assignee: NS Solutions Corporation
    Inventor: Hiroki Takeshita
  • Patent number: 8589468
    Abstract: The present invention enables efficient matrix multiplication operations on parallel processing devices. One embodiment is a method for mapping CTAs to result matrix tiles for matrix multiplication operations. Another embodiment is a second method for mapping CTAs to result tiles. Yet other embodiments are methods for mapping the individual threads of a CTA to the elements of a tile for result tile computations, source tile copy operations, and source tile copy and transpose operations. The present invention advantageously enables result matrix elements to be computed on a tile-by-tile basis using multiple CTAs executing concurrently on different streaming multiprocessors, enables source tiles to be copied to local memory to reduce the number of accesses from the global memory when computing a result tile, and enables coalesced read operations from the global memory as well as write operations to the local memory without bank conflicts.
    Type: Grant
    Filed: September 3, 2010
    Date of Patent: November 19, 2013
    Assignee: NVIDIA Corporation
    Inventors: Norbert Juffa, Radoslav Danilak
  • Patent number: 8583719
    Abstract: An arithmetic operation apparatus includes: a branch node set detection unit to detect a set of branch nodes for each parallel level; a subtree memory storage area allocation unit to allocate an arithmetic result of a column vector to a memory storage area selected on a basis of a predetermined selection rule from a plurality of memory storage areas; and a node memory storage area allocation unit to allocate an arithmetic result of a column vector to a memory storage area selected on a basis of a predetermined selecting rule from a plurality of memory storage areas.
    Type: Grant
    Filed: February 1, 2010
    Date of Patent: November 12, 2013
    Assignee: Fujitsu Limited
    Inventor: Makoto Nakanishi
  • Patent number: 8577949
    Abstract: A system for a conjugate gradient iterative linear solver that calculates the solution to a matrix equation comprises a plurality of gamma processing elements, a plurality of direction vector processing elements, a plurality of x-vector processing elements, an alpha processing element, and a beta processing element. The gamma processing elements may receive an A-matrix and a direction vector, and may calculate a q-vector and a gamma scalar. The direction vector processing elements may receive a beta scalar and a residual vector, and may calculate the direction vector. The x-vector processing elements may receive an alpha scalar, the direction vector, and the q-vector, and may calculate an x-vector and the residual vector. The alpha processing element may receive the gamma scalar and a delta scalar, and may calculate the alpha scalar. The beta processing element may receive the residual vector, and may calculate the delta scalar and the beta scalar.
    Type: Grant
    Filed: July 7, 2009
    Date of Patent: November 5, 2013
    Assignee: L-3 Communications Integrated Systems, L.P.
    Inventors: Matthew P. DeLaquil, Deepak Prasanna, Antone L. Kusmanoff
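The scalars and vectors named in the abstract above (q-vector, gamma, alpha, delta, beta, direction vector, residual) correspond to the quantities of the standard conjugate gradient iteration. The sketch below is a plain sequential version that reuses those names for readability; how the work is split across processing elements is not modeled, and the function name, tolerance, and iteration cap are assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def conjugate_gradient(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for symmetric positive definite a (list of rows)."""
    x = [0.0] * len(b)
    r = [float(v) for v in b]      # residual b - a x, with x starting at zero
    d = list(r)                    # initial direction vector
    delta = dot(r, r)
    for _ in range(max_iter):
        if delta < tol * tol:      # residual norm below tolerance
            break
        q = [dot(row, d) for row in a]                 # q-vector: a times d
        gamma = dot(d, q)                              # gamma scalar
        alpha = delta / gamma                          # alpha scalar
        x = [xi + alpha * di for xi, di in zip(x, d)]  # x-vector update
        r = [ri - alpha * qi for ri, qi in zip(r, q)]  # residual update
        new_delta = dot(r, r)                          # delta scalar
        beta = new_delta / delta                       # beta scalar
        d = [ri + beta * di for ri, di in zip(r, d)]   # direction update
        delta = new_delta
    return x
```

Calling `conjugate_gradient([[4, 1], [1, 3]], [1, 2])` solves a 2×2 system whose exact solution is (1/11, 7/11).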
  • Patent number: 8554820
    Abstract: A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. By reversing the visitation order, the mechanism eliminates a block load at the corner turns. In accordance with the illustrative embodiment, a corner return is referred to as a “bounce” corner turn and results in a serpentine patterned processing order of the matrix blocks. The mechanism allows the data processing system to perform a block matrix multiplication operation with a maximum of three block transfers per time step. Therefore, the mechanism reduces maximum throughput and increases performance. In addition, the mechanism also reduces the number of multi-buffered local store buffers.
    Type: Grant
    Filed: April 20, 2012
    Date of Patent: October 8, 2013
    Assignee: International Business Machines Corporation
    Inventors: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
  • Publication number: 20130262548
    Abstract: A matrix calculation unit may include a matrix operation unit and a converting unit. The matrix operation unit may include functions to perform a matrix operation of a first size with respect to data stored in a memory, and to perform a matrix operation of a second size with respect to the data stored in the memory, where the second size is enlarged from the first size. The converting unit may convert in at least one direction in the memory between a data array suited for the matrix operation of the first size and a data array suited for the matrix operation of the second size.
    Type: Application
    Filed: February 27, 2013
    Publication date: October 3, 2013
    Applicants: FUJITSU SEMICONDUCTOR LIMITED, FUJITSU LIMITED
    Inventors: Yi GE, Hiroshi HATANO, Kazuo HORIO
  • Patent number: 8543633
    Abstract: A modified Gram-Schmidt QR decomposition core implemented in a single field programmable gate array (FPGA) comprises a converter configured to convert a complex fixed point input to a complex floating point input, dual port memory to hold complex entries of an input matrix, normalizer programmable logic module (PLM) to compute a normalization of a column vector. A second PLM performs complex, floating point multiplication on two input matrix columns. A scheduler diverts control of the QRD processing to the normalizer PLM or the second PLM. A top level state machine communicates with scheduler and monitors processing in normalizer PLM and second PLM and communicates the completion of operations to scheduler. A complex divider computes final column for output matrix Q using floating point arithmetic. Multiplexer outputs computed values as elements of output matrix Q or R. Complex floating point operations are performed in a parallel pipelined implementation reducing latencies.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: September 24, 2013
    Assignee: Lockheed Martin Corporation
    Inventor: Luke A. Miller
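For readers unfamiliar with the modified Gram-Schmidt procedure that the abstract above implements in hardware, here is a minimal real-valued software sketch. It assumes a full-column-rank input and leaves out the complex arithmetic, fixed-to-floating-point conversion, and pipelining the abstract describes; the function name `mgs_qr` is hypothetical.

```python
import math

def mgs_qr(a):
    """Return (q, r) with q * r == a using modified Gram-Schmidt.

    a is an m x n list of rows with linearly independent columns.
    q is m x n with orthonormal columns; r is n x n upper triangular.
    """
    m, n = len(a), len(a[0])
    v = [[a[i][j] for i in range(m)] for j in range(n)]   # columns of a
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        norm = math.sqrt(sum(x * x for x in v[j]))        # normalization step
        r[j][j] = norm
        qj = [x / norm for x in v[j]]                     # divide column by its norm
        q_cols.append(qj)
        for k in range(j + 1, n):
            r[j][k] = sum(x * y for x, y in zip(qj, v[k]))        # column product
            v[k] = [y - r[j][k] * x for x, y in zip(qj, v[k])]    # remove component
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

The "modified" variant subtracts each projection from the remaining columns immediately, which is generally better behaved numerically than the classical ordering.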
  • Patent number: 8539016
    Abstract: Circuitry speeds up the QR decomposition of a matrix. The circuitry can be provided in a fixed logic device, or can be configured into a programmable integrated circuit device such as a programmable logic device. This implementation performs Gram-Schmidt orthogonalization with no dependencies between iterations. QR decomposition of a matrix can be performed by processing entire columns at once as a vector operation. Data dependencies within and between matrix columns are removed, as later functions dependent on an earlier result may be generated from partial results somewhere in the datapath, rather than from an earlier completed result. Different passes through the matrix are timed so that different computations requiring the same functional units arrive at different time slots. After the Q matrix has been calculated, the R matrix may be calculated from the Q matrix by taking its transpose and multiplying the transpose by the original input matrix.
    Type: Grant
    Filed: February 9, 2010
    Date of Patent: September 17, 2013
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Patent number: 8533251
    Abstract: A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. By reversing the visitation order, the mechanism eliminates a block load at the corner turns. In accordance with the illustrative embodiment, a corner return is referred to as a “bounce” corner turn and results in a serpentine patterned processing order of the matrix blocks. The mechanism allows the data processing system to perform a block matrix multiplication operation with a maximum of three block transfers per time step. Therefore, the mechanism reduces maximum throughput and increases performance. In addition, the mechanism also reduces the number of multi-buffered local store buffers.
    Type: Grant
    Filed: May 23, 2008
    Date of Patent: September 10, 2013
    Assignee: International Business Machines Corporation
    Inventors: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
  • Patent number: 8527571
    Abstract: A method (and structure) for performing a matrix subroutine, includes storing data for a matrix subroutine call in a computer memory in an increment block size that is based on a cache size.
    Type: Grant
    Filed: December 22, 2008
    Date of Patent: September 3, 2013
    Assignee: International Business Machines Corporation
    Inventors: Fred Gehrung Gustavson, John A. Gunnels
  • Patent number: 8521799
    Abstract: Disclosed are a row-vector norm comparison method and a row-vector norm comparison apparatus for an inverse matrix. A row-vector norm comparison apparatus includes: an input matrix processing module that receives and combines constituent elements of a matrix; a cofactor operation module that multiplexes the combination result of the constituent elements to calculate factors constituting an adjoint matrix; a square calculation module that squares the calculated factors; a summation module that selects a predetermined number of factors among the squared factors and sums the selected factors to calculate the norms of row vectors in an inverse matrix; and a norm comparison module that outputs a comparison result of the calculated norms of the row vectors.
    Type: Grant
    Filed: June 30, 2008
    Date of Patent: August 27, 2013
    Assignees: Samsung Electronics Co., Ltd., Electronics and Telecommunications Research Institute
    Inventors: Young Ha Lee, Seung Jae Bahng, Youn-Ok Park
  • Patent number: 8510364
    Abstract: Methods for matrix processing and devices therefor are described. A systolic array in an integrated circuit is coupled to receive a first matrix as input; and is capable of operating in two modes, namely a triangularization mode and a back-substitution mode. The systolic array, when in a triangularization mode, is coupled to triangularize the first matrix to provide a second matrix. When in a back-substitution mode, the systolic array is coupled to invert the second matrix.
    Type: Grant
    Filed: September 1, 2009
    Date of Patent: August 13, 2013
    Assignee: Xilinx, Inc.
    Inventors: Raghavendar M. Rao, Christopher H. Dick
  • Patent number: 8499022
    Abstract: Combining multiple clusterings arises in various important data mining scenarios. However, finding a consensus clustering from multiple clusterings is a challenging task because there is no explicit correspondence between the classes from different clusterings. Provided is a framework based on soft correspondence to directly address the correspondence problem in combining multiple clusterings. Under this framework, an algorithm iteratively computes the consensus clustering and correspondence matrices using multiplicative updating rules. This algorithm provides a final consensus clustering as well as correspondence matrices that give intuitive interpretation of the relations between the consensus clustering and each clustering from clustering ensembles. Extensive experimental evaluations demonstrate the effectiveness and potential of this framework as well as the algorithm for discovering a consensus clustering from multiple clusterings.
    Type: Grant
    Filed: May 21, 2012
    Date of Patent: July 30, 2013
    Assignee: The Research Foundation of State University of New York
    Inventors: Bo Long, Zhongfei Mark Zhang
  • Patent number: 8498949
    Abstract: Supervised nonnegative matrix factorization (SNMF) generates a descriptive part-based representation of data, based on the concept of nonnegative matrix factorization (NMF) aided by the discriminative concept of graph embedding. An iterative procedure that optimizes suggested formulation based on Pareto optimization is presented. The present formulation removes any dependence on combined optimization schemes. Analytical and empirical evidence is presented to show that SNMF has advantages over popular subspace learning techniques as well as current state-of-the-art techniques.
    Type: Grant
    Filed: August 11, 2010
    Date of Patent: July 30, 2013
    Assignee: Seiko Epson Corporation
    Inventors: Seung-il Huh, Mithun Das Gupta, Jing Xiao
  • Patent number: 8484275
    Abstract: There is provided a method for generating a table for reordering the output of a Fourier transform, the Fourier transform being performed on a predefined number of input samples, the method comprising performing one or more decomposition stages on a sequence corresponding in number to the predefined number of input samples to form a representation of the output of the Fourier transform; wherein at least one of the decomposition stages comprises a composite operation that is equivalent to two or more operations; and rearranging the representation of the output of the Fourier transform to generate a reordering table.
    Type: Grant
    Filed: December 7, 2007
    Date of Patent: July 9, 2013
    Assignee: Altera Corporation
    Inventors: Martin Langhammer, Neil Kenneth Thorne
  • Patent number: 8473540
    Abstract: A decoder, such as for example an MMSE MIMO decoder, and a method for decoding are described. An input channel matrix is obtained, and an extended channel matrix of the input channel matrix is generated. The extended channel matrix is triangularized to provide a triangularized matrix, and the triangularized matrix is inverted to provide an inverted triangular matrix. A left matrix multiplication result matrix associated with multiplication of the input channel matrix and the inverted triangular matrix is generated, and a weight matrix from the left matrix multiplication result matrix and the inverted triangular matrix is generated. A received symbols matrix is obtained, and a weighted estimation is generated and output using the weight matrix and the received symbols matrix to provide an estimate of a transmit symbols matrix for output of estimated data symbols.
    Type: Grant
    Filed: September 1, 2009
    Date of Patent: June 25, 2013
    Assignee: Xilinx, Inc.
    Inventors: Raghavendar M. Rao, Christopher H. Dick
  • Patent number: 8473539
    Abstract: Nulling a cell of a complex matrix is described. A complex matrix and a modified Givens rotation matrix are obtained for multiplication by a processing unit, such as a systolic array or a CPU, for example, for the nulling of the cell to provide a modified form of the complex matrix. The modified Givens rotation matrix includes complex numbers c*, c, −s, and s*, wherein the complex number s* is the complex conjugate of the complex number s, and wherein the complex number c* is the complex conjugate of the complex number c. The complex numbers c and s are associated with complex numbers of the complex matrix including the cell to be nulled. The modified form is then output by the processing unit. The modified Givens rotation matrix may be implemented as a systolic array or otherwise used for processing complex numbers or matrices.
    Type: Grant
    Filed: September 1, 2009
    Date of Patent: June 25, 2013
    Assignee: Xilinx, Inc.
    Inventors: Raghavendar M. Rao, Christopher H. Dick
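    A minimal sketch of nulling one cell with a rotation built from c*, s*, −s and c follows. It assumes the rotation is laid out as [[c*, s*], [−s, c]] and that c and s are the two affected entries divided by their joint magnitude; the abstract does not state either detail, and the function names are hypothetical.

```python
import math

def modified_givens(a, b):
    """Return (c, s) so that [[c*, s*], [-s, c]] applied to (a, b) nulls b."""
    r = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    if r == 0.0:
        return 1 + 0j, 0j  # nothing to null; use the identity rotation
    return a / r, b / r    # assumed normalization, not taken from the abstract

def apply_rotation(c, s, top, bottom):
    """Multiply two matrix rows by the rotation [[c*, s*], [-s, c]]."""
    new_top = [c.conjugate() * t + s.conjugate() * u for t, u in zip(top, bottom)]
    new_bottom = [-s * t + c * u for t, u in zip(top, bottom)]
    return new_top, new_bottom
```

    With this choice the leading entry of the upper row becomes the real magnitude of the original pair, and the leading entry of the lower row becomes zero.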
  • Publication number: 20130159372
    Abstract: Embodiments relate to dynamic programming. An aspect includes representing a dynamic programming problem as a matrix of cells, each cell representing an intermediate score to be calculated. Another aspect includes providing a mapping assigning cells of the matrix to elements of a result container data structure, and storing cells of the matrix to elements of the result container data structure in accordance with the mapping. Another aspect includes calculating intermediate scores of all cells of the matrix, whereby intermediate scores of some of the cells of the matrix are stored to a respectively assigned element of the result container data structure in accordance with the mapping. Another aspect includes during the calculation of the intermediate scores, dynamically updating the assignment of cells and elements in the mapping and assembling a final result of the dynamic programming problem from the intermediate scores stored in the result container data structure.
    Type: Application
    Filed: November 30, 2012
    Publication date: June 20, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: International Business Machines Corporation
  • Publication number: 20130159373
    Abstract: A sparse matrix used in the least-squares method is divided into small matrices in accordance with the number of elements of observation. An observation ID is assigned to each element of observation, a parameter ID is assigned to each parameter, and the IDs are associated with parameters of elements as ID mapping. A system determines positions of nonzero elements in accordance with whether or not ID mapping exists, the correspondence between observation IDs and parameter IDs, and the positions of the small matrices, and selects a storage scheme for each small matrix based thereon. The system selects a storage scheme in accordance with conditions, such as whether or not a target element is a diagonal element, whether or not a term decided without ID mapping exists, and whether or not the same ID mapping is referred to.
    Type: Application
    Filed: December 18, 2012
    Publication date: June 20, 2013
    Applicant: International Business Machines Corporation
    Inventor: International Business Machines Corporation
  • Patent number: 8463838
    Abstract: A windowed optical calculation architecture and process that efficiently performs high speed multi-element multiply and accumulates on a digital data stream. A data point from a digital data stream is impressed onto an optical source to create an optical value. The optical value is split into a number of branches equaling the number of elements used in the calculation. In each branch, the optical value is modulated to reflect the coefficients in the calculation. Then, depending upon the branch, the optical value is delayed depending on its position in the calculation, with optical values at the beginning of the calculation being delayed longer than optical values at the end of the calculation. The outputs from the branches are coupled together to perform an optical sum, and passed to detection/analog-digital conversion circuitry to convert the optical result to a digital result.
    Type: Grant
    Filed: October 28, 2009
    Date of Patent: June 11, 2013
    Assignee: Lockheed Martin Corporation
    Inventor: Brian L Ulhorn
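    The split, scale, delay and sum structure described above can be modeled numerically as follows. This is a software sketch of the arithmetic only, assuming one branch per coefficient and a delay of one sample period per position; the name `windowed_mac` is hypothetical.

```python
def windowed_mac(samples, coeffs):
    """Model each branch as (coefficient, delay) and sum the branch outputs."""
    k = len(coeffs)
    # The first element of the window waits longest so all terms align in time.
    delays = [k - 1 - i for i in range(k)]
    out = []
    for t in range(k - 1, len(samples)):
        out.append(sum(c * samples[t - d] for c, d in zip(coeffs, delays)))
    return out
```

    For example, `windowed_mac([1, 2, 3, 4], [1, 10, 100])` yields one result per full window of three samples.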
  • Publication number: 20130138712
    Abstract: A first systolic array receives an input set of time division multiplexed matrices from a plurality of channel matrices. In a first mode, the first systolic array performs triangularization on the input matrices, producing a first set of matrices, and in a second mode performs back-substitution on the first set, producing a second set of matrices. In a first mode, a second systolic array performs left multiplication on the second set of matrices with the input set of matrices, producing a third set of matrices. In a second mode, the second systolic array performs cross diagonal transposition on the third set of matrices, producing a fourth set of matrices, and performs right multiplication on the second set of matrices with the fourth set of matrices. The first systolic array switches from the first mode to the second mode after the triangularization, and the second systolic array switches from the first mode to the second mode after the left multiplication.
    Type: Application
    Filed: January 28, 2013
    Publication date: May 30, 2013
    Applicant: XILINX, INC.
    Inventor: XILINX, INC.
  • Publication number: 20130124593
    Abstract: A method for quantifying the intrinsic data transfer rate of algorithms is provided. The method includes the steps of: detecting whether or not a datum is reused; providing a dataflow graph G including n vertices and m edges, and a Laplacian matrix L having i×j elements L(i,j) when the datum is not reused, wherein each of the vertices represents one of an operation and a datum, each of the edges represents a data transfer, and v_i is the i-th vertex; and using the Laplacian matrix L to estimate a maximum quantity of the intrinsic data transfer rate.
    Type: Application
    Filed: July 20, 2011
    Publication date: May 16, 2013
    Applicant: NATIONAL CHENG KUNG UNIVERSITY
    Inventors: Gwo Giun Lee, He-Yuan Lin
  • Patent number: 8443031
    Abstract: A systolic array for Cholesky decomposition of an N×N matrix is described. A plurality of processing cells, including a corner cell, N−1 boundary cells, and (N²−3N+2)/2 internal cells, are arranged into N−1 rows and N columns of processing cells. Each row of processing cells is configured to calculate elements of a respective column of a lower triangular output matrix. Each processing cell of each row is configured to determine a value of a respective element of the lower triangular output matrix using a value of an element calculated in a previous processing cell of the row.
    Type: Grant
    Filed: July 19, 2010
    Date of Patent: May 14, 2013
    Assignee: Xilinx, Inc.
    Inventor: Raghavendar M. Rao
  • Patent number: 8443032
    Abstract: A multiplication circuit generates a product of a matrix and a first scalar when in matrix mode and a product of a second scalar and a third scalar when in scalar mode. The multiplication circuit comprises a sub-product generator, an accumulator and an adder. The adder is configured to sum outputs of the accumulator to generate the product of the second scalar and the third scalar when in scalar mode. The sub-product generator generates sub-products of the matrix and the first scalar when in matrix mode and sub-products of the second scalar and the third scalar when in scalar mode. The accumulator is configured to generate the product of the matrix and the first scalar by providing save of the multiplication operation of the outputs from the sub-product generator.
    Type: Grant
    Filed: March 27, 2008
    Date of Patent: May 14, 2013
    Assignee: National Tsing Hua University
    Inventors: Chen Hsing Wang, Chieh Lin Chuang, Cheng Wen Wu
  • Patent number: 8433741
    Abstract: A system for signature prediction and feature-level fusion of a target according to various aspects of the present invention includes a first sensing modality for providing a measured data set. The system further includes a processor receiving the measured data set and generating a first k-orthogonal spanning tree constructed from k orthogonal minimal spanning trees having no edge shared between the k minimal spanning trees to define a first data manifold. A method for signature prediction and feature-level fusion of a target according to various aspects of the present invention includes generating a first manifold by developing a connected graph of data from a first sensing modality using a first k-orthogonal spanning tree, generating a second manifold by developing a second connected graph of data from a second sensing modality using a second k-orthogonal spanning tree, and aligning the first manifold and the second manifold to generate a joint-signature manifold in a common embedding space.
    Type: Grant
    Filed: June 5, 2008
    Date of Patent: April 30, 2013
    Assignee: Raytheon Company
    Inventors: Donald E. Waagen, Samantha S. Livingston, Nitesh N. Shah
  • Patent number: 8396914
    Abstract: Circuitry speeds up the Cholesky decomposition of a matrix. The circuitry can be provided in a fixed logic device, or can be configured into a programmable integrated circuit device such as a programmable logic device. The circuitry implements the following equation: $l_{ij} = \dfrac{a_{ij} - \langle L_i, L_j \rangle}{\sqrt{a_{jj} - \langle L_j, L_j \rangle}}$. When any $l_{ij}$ term is calculated this way, the latency in calculating the $l_{jj}$ term in the denominator has little or no effect on the $l_{ij}$ term calculation. And if the calculations are properly pipelined, once the pipeline is filled, a new term can be output on each clock cycle or every few clock cycles.
    Type: Grant
    Filed: September 11, 2009
    Date of Patent: March 12, 2013
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
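    A software sketch of the same idea: each entry is computed as (a_ij minus the inner product of rows i and j of L) divided by the square root of (a_jj minus the inner product of row j of L with itself), so the denominator is formed directly from a_jj rather than read back from a finished diagonal term. The sketch assumes a real symmetric positive definite input and says nothing about the pipelining of the circuitry; `cholesky_lower` is a hypothetical name.

```python
import math

def cholesky_lower(a):
    """Return lower triangular L with L * L^T == a for real SPD input a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Denominator term: a_jj - <L_j, L_j> over the columns computed so far.
        djj = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if djj <= 0:
            raise ValueError("matrix is not positive definite")
        inv_sqrt = 1.0 / math.sqrt(djj)
        for i in range(j, n):
            # Numerator term: a_ij - <L_i, L_j>.
            dot = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - dot) * inv_sqrt
    return l
```

    The diagonal case i = j falls out of the same expression, since a value divided by its own square root is that square root.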
  • Patent number: 8392487
    Abstract: A matrix processor and processing method, the processor including a data encoder for receiving an input data stream; a data controller coupled to the data encoder for arranging the input data in an operand matrix, at least one processing unit for processing the data in matrix form by Boolean matrix-matrix multiplication with a selected operator matrix, and an output control module coupled to the processing unit for outputting desired results therefrom.
    Type: Grant
    Filed: March 31, 2008
    Date of Patent: March 5, 2013
    Assignee: Compass Electro-Optical Systems Ltd
    Inventors: Michael Mesh, Michael Laor, Alexander Zeltser
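    For reference, Boolean matrix-matrix multiplication replaces multiply with AND and add with OR. The sketch below shows that arithmetic on small 0/1 matrices; it is an illustration of the operation only, not of the processor architecture, and `bool_matmul` is a hypothetical name.

```python
def bool_matmul(a, b):
    """Boolean product: out[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    cols = list(zip(*b))  # columns of b, so each output cell pairs a row with a column
    return [[int(any(x and y for x, y in zip(row, col))) for col in cols]
            for row in a]
```

    Multiplying by a permutation-style operator matrix, for instance, reorders the rows or columns of the operand.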
  • Patent number: 8392692
    Abstract: In one embodiment, the present invention determines index values corresponding to bits of a binary vector that have a value of 1. During each clock cycle, a masking technique is applied to M sub-vector index values, where each sub-vector index value corresponds to a different bit of a sub-vector of the binary vector. The masking technique is applied such that (i) the sub-vector index values that correspond to bits having a value of 0 are zeroed out and (ii) the sub-vector index values that correspond to the bits having a value of 1 are left unchanged. The masked sub-vector index values are sorted, and index values are calculated based on the masked sub-vector index values. The index values generated are then distributed uniformly to a number M of index memories such that the M index memories store substantially the same number of index values.
    Type: Grant
    Filed: December 12, 2008
    Date of Patent: March 5, 2013
    Assignee: LSI Corporation
    Inventor: Kiran Gunnam
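    The mask, sort and distribute steps can be sketched in software as below. The sketch assumes 1-based index values (so that a zeroed-out entry is never confused with a valid index) and a simple round-robin distribution across the M memories; neither detail is stated in the abstract, and `set_bit_indices` is a hypothetical name.

```python
def set_bit_indices(bits, m):
    """Collect 1-based indices of set bits, spread evenly over m index memories."""
    memories = [[] for _ in range(m)]
    written = 0
    for start in range(0, len(bits), m):  # one sub-vector per "clock cycle"
        sub = bits[start:start + m]
        # Masking: indices of 0 bits become 0, indices of 1 bits are unchanged.
        masked = [(start + i + 1) * b for i, b in enumerate(sub)]
        # Sorting groups the zeroed entries together so they can be dropped.
        kept = [v for v in sorted(masked) if v]
        for v in kept:
            memories[written % m].append(v)  # round-robin keeps memories balanced
            written += 1
    return memories
```

    Because each new index goes to the next memory in turn, the memory sizes never differ by more than one.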
  • Patent number: 8380778
    Abstract: A system, method, and computer program product are provided for assigning elements of a matrix to processing threads. In use, a matrix is received to be processed by a parallel processing architecture. Such parallel processing architecture includes a plurality of processors each capable of processing a plurality of threads. Elements of the matrix are assigned to each of the threads for processing, utilizing an algorithm that increases a contiguousness of the elements being processed by each thread.
    Type: Grant
    Filed: October 25, 2007
    Date of Patent: February 19, 2013
    Assignee: NVIDIA Corporation
    Inventors: William N. Bell, Michael J. Garland
  • Patent number: 8364739
    Abstract: Techniques for optimizing sparse matrix-vector multiplication (SpMV) on a graphics processing unit (GPU) are provided. The techniques include receiving a sparse matrix-vector multiplication, analyzing the sparse matrix-vector multiplication to identify one or more optimizations, wherein analyzing the sparse matrix-vector multiplication to identify one or more optimizations comprises analyzing a non-zero pattern for one or more optimizations and determining whether the sparse matrix-vector multiplication is to be reused across computation, optimizing the sparse matrix-vector multiplication, wherein optimizing the sparse matrix-vector multiplication comprises optimizing global memory access, optimizing shared memory access and exploiting reuse and parallelism, and outputting an optimized sparse matrix-vector multiplication.
    Type: Grant
    Filed: September 30, 2009
    Date of Patent: January 29, 2013
    Assignee: International Business Machines Corporation
    Inventors: Muthu M. Baskaran, Rajesh J. Bordawekar
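    As a baseline for what is being optimized, the sketch below performs sparse matrix-vector multiplication over a compressed sparse row (CSR) layout on the CPU. The CSR format and the name `csr_spmv` are assumptions made for illustration; the abstract does not name a storage format, and none of the GPU memory optimizations are modeled here.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x where A is stored as CSR (values, col_idx, row_ptr)."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        # Only the stored nonzeros of row r are visited.
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

    The irregular reads of `x[col_idx[k]]` are the kind of access pattern that memory-access optimizations on a GPU would target.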
  • Patent number: 8341205
    Abstract: The present invention uses a computer analysis system of a fast singular value decomposition to overcome the bottleneck of a traditional singular value decomposition that takes much computing time for decomposing a huge number of objects, and the invention can also process a matrix in any form without being limited to symmetric matrices only. The decomposition and subgroup concept of the fast singular value decomposition, together with the decomposition of a variance matrix and the adjustment of an average vector of a column vector, is used to optimize the singular value decomposition and improve the overall computing speed of the computer analysis system.
    Type: Grant
    Filed: July 2, 2008
    Date of Patent: December 25, 2012
    Assignee: Everspeed Technology Limited
    Inventor: Jengnan Tzeng