Systolic Patents (Class 708/522)
  • Patent number: 11500961
    Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply unit. Each weight shift register is connected to only one chain and each cell is connected to only one weight shift register. A weight matrix register per cell is configured to store a weight input received from a weight shift register. A multiply unit is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.
    Type: Grant
    Filed: March 26, 2020
    Date of Patent: November 15, 2022
    Assignee: Google LLC
    Inventors: Andrew Everett Phelps, Norman Paul Jouppi
  • Patent number: 11314844
    Abstract: A singular value decomposition (SVD) is computed of a first matrix to define a left matrix, a diagonal matrix, and a right matrix. The left matrix, the diagonal matrix, and the right matrix are updated using an arrowhead matrix structure defined from the diagonal matrix and by adding a next observation vector to a last row of the first matrix. The updated left matrix, the updated diagonal matrix, and the updated right matrix are updated using a diagonal-plus-rank-one (DPR1) matrix structure defined from the updated diagonal matrix and by removing an observation vector from a first row of the first matrix. Eigenpairs of the DPR1 matrix are computed based on whether a value computed from the updated left matrix is positive or negative. The left matrix updated in (C), the diagonal matrix updated in (C), and the right matrix updated in (C) are output.
    Type: Grant
    Filed: October 19, 2021
    Date of Patent: April 26, 2022
    Assignee: SAS Institute Inc.
    Inventors: Hansi Jiang, Arin Chaudhuri
  • Patent number: 9448970
    Abstract: Computerized singular value decomposition of an input complex matrix. A real-value matrix representation of the input complex matrix is provided to a singular value decomposition module, which correctly obtains a singular value representation of the real-value matrix representation. However, the result is not provided in a form for convenient conversion back into a valid singular value decomposition solution for the original input complex matrix, as the upper left half and lower right half of the diagonal of the diagonal matrix are not identical. A correction module corrects by formulating a corrected diagonal matrix that represents the value of the diagonal of the first diagonal matrix, but shuffled so that the upper left half of the diagonal of the second diagonal matrix is the same as the lower right half of the diagonal of the second diagonal matrix. Corrected unitary matrices may also be formed.
    Type: Grant
    Filed: June 14, 2013
    Date of Patent: September 20, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Chun Sun, Sudarshan Raghunathan, Parry Jones Reginald Husbands, Tong Wen
  • Patent number: 9047240
    Abstract: A first systolic array receives an input set of time division multiplexed matrices from a plurality of channel matrices. In a first mode, the first systolic array performs triangularization on the input matrices, producing a first set of matrices, and in a second mode performs back-substitution on the first set, producing a second set of matrices. In a first mode, a second systolic array performs left multiplication on the second set of matrices with the input set of matrices, producing a third set of matrices. In a second mode, the second systolic array performs cross diagonal transposition on the third set of matrices, producing a fourth set of matrices, and performs right multiplication on the second set of matrices with the fourth set of matrices. The first systolic array switches from the first mode to the second mode after the triangularization, and the second systolic array switches from the first mode to the second mode after the left multiplication.
    Type: Grant
    Filed: January 28, 2013
    Date of Patent: June 2, 2015
    Assignee: XILINX, INC.
    Inventors: Raied N. Mazahreh, Hai-Jo Tarn, Raghavendar M. Rao
  • Patent number: 9047241
    Abstract: A first systolic array receives an input set of time division multiplexed matrices from a plurality of channel matrices. In a first mode, the first systolic array performs triangularization on the input matrices, producing a first set of matrices, and in a second mode performs back-substitution on the first set, producing a second set of matrices. In a first mode, a second systolic array performs left multiplication on the second set of matrices with the input set of matrices, producing a third set of matrices. In a second mode, the second systolic array performs cross diagonal transposition on the third set of matrices, producing a fourth set of matrices, and performs right multiplication on the second set of matrices with the fourth set of matrices. The first systolic array switches from the first mode to the second mode after the triangularization, and the second systolic array switches from the first mode to the second mode after the left multiplication.
    Type: Grant
    Filed: January 28, 2013
    Date of Patent: June 2, 2015
    Assignee: XILINX, INC.
    Inventors: Raied N. Mazahreh, Hai-Jo Tarn, Raghavendar M. Rao
  • Patent number: 8824603
    Abstract: A method and a system are provided for Coordinate Rotation Digital Computer (CORDIC) based matrix inversion of input digital signal streams from multiple antennas using a bi-directional ring-bus architecture. The bi-directional ring bus includes a first ring bus having signals flow in a clockwise direction, and a second ring bus having signals flow in a counter-clockwise direction. An I/O controller is coupled to the first and the second ring bus, respectively. A plurality of processing elements (PEs) is provided, where each of the plurality of PEs is coupled to the first and the second ring bus, respectively, and wherein each of the plurality of PEs includes at least one CORDIC core for performing CORDIC iterations on the plurality of input digital stream signals to produce inversed matrix signals.
    Type: Grant
    Filed: March 1, 2013
    Date of Patent: September 2, 2014
    Assignee: Futurewei Technologies, Inc.
    Inventors: Yiqun Ge, Qifan Zhang, Peter Man Kin Sinn
  • Patent number: 8706787
    Abstract: Provided two CORDIC processors, each including: two input ports representing real and imaginary input ports; and two output ports representing real and imaginary output ports; wherein real and imaginary parts of a first input signal are applied to the imaginary input ports of the first and second CORDIC processors; real and imaginary parts of a second input signal are applied to the real input ports of the first and second CORDIC processors; the first and second CORDIC processors rotate the respective input signals applied thereto by 45 degrees in the clockwise direction; respective data from the real output ports of said first and second CORDIC processors constitute real and imaginary parts of a first output signal; and respective data from the imaginary output ports of said first and second CORDIC processors constitute real part and imaginary part of a second output signal.
    Type: Grant
    Filed: September 26, 2007
    Date of Patent: April 22, 2014
    Assignee: NEC Corporation
    Inventor: James Awuor Oduor Okello
  • Patent number: 8650238
    Abstract: In a digital system with more than one clock source, lack of synchronization between the clock sources may cause overflow or underflow in sample buffers, also called sample slipping. Sample slipping may lead to undesirable artifacts in the processed signal due to discontinuities introduced by the addition or removal of extra samples. To smooth out discontinuities caused by sample slipping, samples are filtered when a buffer overflow condition occurs, and the samples are interpolated to produce additional samples when a buffer underflow condition occurs. The interpolated samples may also be filtered. The filtering and interpolation operations can be readily implemented without adding significant burden to the computational complexity of a real-time digital system.
    Type: Grant
    Filed: November 28, 2007
    Date of Patent: February 11, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Dinesh Ramakrishnan, Song Wang, Eddie L. T. Choy, Samir Kumar Gupta
  • Patent number: 8620984
    Abstract: A first systolic array receives an input set of time division multiplexed matrices from a plurality of channel matrices. In a first mode, the first systolic array performs triangularization on the input matrices, producing a first set of matrices, and in a second mode performs back-substitution on the first set, producing a second set of matrices. In a first mode, a second systolic array performs left multiplication on the second set of matrices with the input set of matrices, producing a third set of matrices. In a second mode, the second systolic array performs cross diagonal transposition on the third set of matrices, producing a fourth set of matrices, and performs right multiplication on the second set of matrices with the fourth set of matrices. The first systolic array switches from the first mode to the second mode after the triangularization, and the second systolic array switches from the first mode to the second mode after the left multiplication.
    Type: Grant
    Filed: November 23, 2009
    Date of Patent: December 31, 2013
    Assignee: Xilinx, Inc.
    Inventors: Raied N. Mazahreh, Hai-Jo Tarn, Raghavendar M. Rao
  • Patent number: 8589465
    Abstract: Digital signal processing (“DSP”) block circuitry on an integrated circuit (“IC”) is adapted for use (e.g., in multiple instances of the DSP block circuitry on the IC) for implementing finite-impulse-response (“FIR”) digital filters in systolic form. Each DSP block may include (1) first and second multiplier circuitry and (2) adder circuitry for adding (a) outputs of the multipliers and (b) signals chained in from a first other instance of the DSP block circuitry. Systolic delay circuitry is provided for either the outputs of the first multiplier (upstream from the adder) or at least one of the sets of inputs to the first multiplier. Additional systolic delay circuitry is provided for outputs of the adder, which are chained out to a second other instance of the DSP block circuitry.
    Type: Grant
    Filed: May 8, 2013
    Date of Patent: November 19, 2013
    Assignee: Altera Corporation
    Inventors: Suleyman Sirri Demirsoy, Hyun Yi
  • Patent number: 8589467
    Abstract: A linear systolic array is added to the lower side of a trapezoid systolic array created by combining a triangular systolic array and a square systolic array. In order to make the connection among the cells fixed, the intermediate result output from each row of the trapezoid systolic array to a lower row is shifted in phase with respect to the intermediate result of the complex MFA algorithm, the phase shift is absorbed by the next row, and the phase shift in the intermediate result output from the last row of the trapezoid systolic array is corrected by the linear systolic array. Each cell is implemented by a CORDIC circuit that processes vector angle computation, vector rotation, division, and multiply-and-accumulate with a constant delay.
    Type: Grant
    Filed: November 21, 2008
    Date of Patent: November 19, 2013
    Assignee: NEC Corporation
    Inventor: Katsutoshi Seki
  • Patent number: 8510364
    Abstract: Methods for matrix processing and devices therefor are described. A systolic array in an integrated circuit is coupled to receive a first matrix as input; and is capable of operating in two modes, namely a triangularization mode and a back-substitution mode. The systolic array, when in a triangularization mode, is coupled to triangularize the first matrix to provide a second matrix. When in a back-substitution mode, the systolic array is coupled to invert the second matrix.
    Type: Grant
    Filed: September 1, 2009
    Date of Patent: August 13, 2013
    Assignee: Xilinx, Inc.
    Inventors: Raghavendar M. Rao, Christopher H. Dick
  • Patent number: 8473540
    Abstract: A decoder, such as for example an MMSE MIMO decoder, and a method for decoding are described. An input channel matrix is obtained, and an extended channel matrix of the input channel matrix is generated. The extended channel matrix is triangularized to provide a triangularized matrix, and the triangularized matrix is inverted to provide an inverted triangular matrix. A left matrix multiplication result matrix associated with multiplication of the input channel matrix and the inverted triangular matrix is generated, and a weight matrix from the left matrix multiplication result matrix and the inverted triangular matrix is generated. A received symbols matrix is obtained, and a weighted estimation is generated and output using the weight matrix and the received symbols matrix to provide an estimate of a transmit symbols matrix for output of estimated data symbols.
    Type: Grant
    Filed: September 1, 2009
    Date of Patent: June 25, 2013
    Assignee: Xilinx, Inc.
    Inventors: Raghavendar M. Rao, Christopher H. Dick
  • Patent number: 8473539
    Abstract: Nulling a cell of a complex matrix is described. A complex matrix and a modified Givens rotation matrix are obtained for multiplication by a processing unit, such as a systolic array or a CPU, for example, for the nulling of the cell to provide a modified form of the complex matrix. The modified Givens rotation matrix includes complex numbers c*, c, −s, and s*, wherein the complex number s* is the complex conjugate of the complex number s, and wherein the complex number c* is the complex conjugate of the complex number c. The complex numbers c and s are associated with complex numbers of the complex matrix including the cell to be nulled. The modified form is then output by the processing unit. The modified Givens rotation matrix may be implemented as a systolic array or otherwise used for processing complex numbers or matrices.
    Type: Grant
    Filed: September 1, 2009
    Date of Patent: June 25, 2013
    Assignee: Xilinx, Inc.
    Inventors: Raghavendar M. Rao, Christopher H. Dick
  • Patent number: 8458243
    Abstract: Digital signal processing (“DSP”) block circuitry on an integrated circuit (“IC”) is adapted for use (e.g., in multiple instances of the DSP block circuitry on the IC) for implementing finite-impulse-response (“FIR”) digital filters in systolic form. Each DSP block may include (1) first and second multiplier circuitry and (2) adder circuitry for adding (a) outputs of the multipliers and (b) signals chained in from a first other instance of the DSP block circuitry. Systolic delay circuitry is provided for either the outputs of the first multiplier (upstream from the adder) or at least one of the sets of inputs to the first multiplier. Additional systolic delay circuitry is provided for outputs of the adder, which are chained out to a second other instance of the DSP block circuitry.
    Type: Grant
    Filed: March 3, 2010
    Date of Patent: June 4, 2013
    Assignee: Altera Corporation
    Inventors: Suleyman Sirri Demirsoy, Hyun Yi
  • Patent number: 8417758
    Abstract: A method, machine-readable medium, and systolic array for left matrix multiplication of a first matrix and a second matrix are described. The first matrix is a triangular matrix, and a cross-diagonal transpose of the first matrix is loaded into a triangular array of cells in an integrated circuit. A cross-diagonal transpose of the second matrix is input into the triangular array of cells for multiplication with the cross-diagonal transpose of the first matrix to produce an interim result. The interim result is cross-diagonally transposed to provide a left matrix multiplication result, which is stored or otherwise output.
    Type: Grant
    Filed: September 1, 2009
    Date of Patent: April 9, 2013
    Assignee: Xilinx, Inc.
    Inventors: Raghavendar M. Rao, Christopher H. Dick
  • Publication number: 20100250640
    Abstract: A linear systolic array is added to the lower side of a trapezoid systolic array created by combining a triangular systolic array and a square systolic array. In order to make the connection among the cells fixed, the intermediate result output from each row of the trapezoid systolic array to a lower row is shifted in phase with respect to the intermediate result of the complex MFA algorithm, the phase shift is absorbed by the next row, and the phase shift in the intermediate result output from the last row of the trapezoid systolic array is corrected by the linear systolic array. Each cell is implemented by a CORDIC circuit that processes vector angle computation, vector rotation, division, and multiply-and-accumulate with a constant delay.
    Type: Application
    Filed: November 21, 2008
    Publication date: September 30, 2010
    Inventor: Katsutoshi Seki
  • Publication number: 20090204658
    Abstract: A decimal calculation apparatus, which performs multidigit decimal calculation with the number of calculation digits set in a calculation instruction, includes a multidigit memory section which stores values with greater numbers of digits than the number of digits of a predetermined digit unit in a plurality of memory areas, a calculation-instruction memory section which stores the calculation instruction having the number of calculation digits and a type of calculation set therein, and a decimal calculation section which performs decimal calculation of sequentially calculating numerical values of corresponding digit units respectively stored in the plurality of memory areas of the multidigit memory section, digit unit by digit unit in the number of calculation digits set in the calculation instruction stored in calculation-instruction memory section, in decimal calculation according to type of calculation set in the calculation instruction stored in calculation-instruction memory section, and sequentially wr
    Type: Application
    Filed: April 22, 2009
    Publication date: August 13, 2009
    Applicant: Casio Computer Co., Ltd.
    Inventors: Hisashi ITO, Tetsuichi NAKAE
  • Patent number: 5948053
    Abstract: A digital signal processor has an arithmetic operation device that carries out arithmetic operations. The arithmetic operation device has a plurality of elementary arithmetic operation units. A signal path-forming device forms signal paths for inputting and outputting signals to and from the elementary arithmetic operations units, according to a predetermined program. The arithmetic operation device carries out processing of a digital signal input to the digital signal processor after the signal paths have been formed by the signal path-forming device.
    Type: Grant
    Filed: August 29, 1997
    Date of Patent: September 7, 1999
    Assignee: Yamaha Corporation
    Inventor: Ryo Kamiya
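
Illustrative sketch for patent 11500961 (matrix multiply unit with two weight shift chains per column). The abstract does not say how cells are assigned to the two chains, so this sketch assumes one chain serves the even-numbered rows of a column and the other serves the odd-numbered rows. The function names `load_column`, `column_output`, and `matvec` are hypothetical and are not taken from the patent. It is a software model of the data movement only, not the claimed hardware.

```python
def load_column(weights):
    """Shift one column of weights into place through two chains.

    Assumption (not from the abstract): chain 0 feeds even rows,
    chain 1 feeds odd rows. Returns (per-cell weights, shift cycles).
    """
    chains = [list(weights[0::2]), list(weights[1::2])]
    shift_regs = [[None] * len(c) for c in chains]
    cycles = max(len(c) for c in chains)
    for t in range(cycles):
        for k, chain in enumerate(chains):
            m = len(chain)
            if t < m:
                # Feed the deepest value first and shift the chain down one step.
                shift_regs[k] = [chain[m - 1 - t]] + shift_regs[k][:-1]
    # Each cell copies its weight from the single shift register it is wired to.
    cell_weights = [shift_regs[i % 2][i // 2] for i in range(len(weights))]
    return cell_weights, cycles


def column_output(cell_weights, vector):
    """Each cell multiplies its stored weight by its vector input;
    the partial sum is passed down the column."""
    partial = 0
    for w, x in zip(cell_weights, vector):
        partial += w * x
    return partial


def matvec(weight_matrix, vector):
    """Compute vector @ weight_matrix one column at a time."""
    n_cols = len(weight_matrix[0])
    out = []
    for j in range(n_cols):
        column = [row[j] for row in weight_matrix]
        cell_weights, _ = load_column(column)
        out.append(column_output(cell_weights, vector))
    return out
```

Under the even/odd assumption, a column of three cells is loaded in two shift cycles rather than three, which is the kind of saving a second chain per column would give in this model.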
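
Illustrative sketch for patent 8473539 (nulling a cell of a complex matrix). The abstract names the four entries c*, c, −s, and s* but not their positions, so the layout [[c*, s*], [−s, c]] used here is an assumption, as are the choices c = a/r and s = b/r with r = sqrt(|a|² + |b|²). The function names `modified_givens` and `apply_rotation` are hypothetical.

```python
import math


def modified_givens(a, b):
    """Build a 2x2 rotation [[c*, s*], [-s, c]] that nulls b against a.

    Assumed layout and scaling; the patent abstract does not fix them.
    """
    r = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    if r == 0:
        return [[1, 0], [0, 1]]  # nothing to null
    c = complex(a) / r
    s = complex(b) / r
    return [[c.conjugate(), s.conjugate()], [-s, c]]


def apply_rotation(q, a, b):
    """Multiply the 2x2 matrix q by the column vector (a, b)."""
    top = q[0][0] * a + q[0][1] * b
    bottom = q[1][0] * a + q[1][1] * b
    return top, bottom
```

With this layout the top entry becomes (|a|² + |b|²)/r = r, which is real, and the bottom entry becomes (−b·a + a·b)/r = 0, so the targeted cell is nulled up to rounding error.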
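
Illustrative sketch for the CORDIC-based entries (patents 8824603, 8706787, and 8589467). None of the abstracts gives iteration details, so this is the textbook rotation-mode CORDIC in floating point, not any patent's specific circuit. The function name `cordic_rotate` and the default of 32 iterations are assumptions.

```python
import math


def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) counter-clockwise by `angle` radians using
    shift-and-add style micro-rotations (valid for |angle| up to about 1.74).
    """
    z = angle
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        step = 2.0 ** -i
        # Micro-rotation by +/- atan(2^-i); hardware would use shifts here.
        x, y = x - d * y * step, y + d * x * step
        z -= d * math.atan(step)
        gain *= math.sqrt(1.0 + step * step)
    # Undo the accumulated CORDIC gain so the vector length is preserved.
    return x / gain, y / gain
```

A clockwise rotation by 45 degrees, as mentioned in the abstract of patent 8706787, corresponds to calling `cordic_rotate(x, y, -math.pi / 4)`.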
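
Illustrative sketch for patent 8417758 (left matrix multiplication via cross-diagonal transposes). It checks the identity that makes such a scheme possible: the cross-diagonal transpose of a product equals the product of the cross-diagonal transposes in reverse order. The abstract does not state the operand order inside the array, so computing the interim result as B^X times A^X is an assumption, and the function names `cross_transpose`, `matmul`, and `left_multiply` are hypothetical.

```python
def cross_transpose(m):
    """Reflect a matrix across its anti-diagonal (cross-diagonal transpose)."""
    n_rows, n_cols = len(m), len(m[0])
    return [[m[n_rows - 1 - j][n_cols - 1 - i] for j in range(n_rows)]
            for i in range(n_cols)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def left_multiply(a, b):
    """Compute a @ b using only cross-diagonally transposed operands.

    Assumed operand order: interim = b^X @ a^X, then transpose back.
    """
    interim = matmul(cross_transpose(b), cross_transpose(a))
    return cross_transpose(interim)
```

Note that the cross-diagonal transpose of an upper triangular matrix is again upper triangular, which is consistent with loading it into a triangular array of cells.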