Systolic Patents (Class 708/522)
-
Patent number: 12197888Abstract: Integrated circuits with dot product circuitry are provided. The dot product circuitry may be configured to generate partial products of different ranks based on the inputs. The partial products may be organized into corresponding groups based on their ranks. Each group of partial products having the same rank can then be compressed using a compressor/reduction tree. At least some of the compressed partial product values may be shifted between the different groups to maintain the proper offset. Each partial product may have an associated one's to two's complement conversion bit. The conversion bits of the various partial product groups can be separately aggregated and then injected into the compressor tree at one or more locations.Type: GrantFiled: December 23, 2019Date of Patent: January 14, 2025Assignee: Altera CorporationInventor: Martin Langhammer
-
Patent number: 12072952Abstract: A processing device is provided which comprises memory configured to store data and a processor. The processor comprises a plurality of MACs configured to perform matrix multiplication of elements of a first matrix and elements of a second matrix. The processor also comprises a plurality of logic devices configured to sum values of bits of product exponents values of the elements of the first matrix and second matrix and determine keep bit values for product exponents values to be kept for matrix multiplication. The processor also comprises a plurality of multiplexor arrays each configured to receive bits of the elements of the first matrix and the second matrix and the keep bit values and provide data for selecting which elements of the first matrix and the second matrix values are provided to the MACs for matrix multiplication.Type: GrantFiled: March 26, 2021Date of Patent: August 27, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Swapnil P. Sakharshete, Pramod Vasant Argade, Maxim V. Kazakov, Alexander M. Potapov
-
Patent number: 11989259Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply unit. Each weight shift register is connected to only one chain and each cell is connected to only one weight shift register. A weight matrix register per cell is configured to store a weight input received from a weight shift register. A multiply unit is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.Type: GrantFiled: November 10, 2022Date of Patent: May 21, 2024Assignee: Google LLCInventors: Andrew Everett Phelps, Norman Paul Jouppi
-
Patent number: 11500961Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply unit. Each weight shift register is connected to only one chain and each cell is connected to only one weight shift register. A weight matrix register per cell is configured to store a weight input received from a weight shift register. A multiply unit is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.Type: GrantFiled: March 26, 2020Date of Patent: November 15, 2022Assignee: Google LLCInventors: Andrew Everett Phelps, Norman Paul Jouppi
-
Patent number: 11314844Abstract: A singular value decomposition (SVD) is computed of a first matrix to define a left matrix, a diagonal matrix, and a right matrix. The left matrix, the diagonal matrix, and the right matrix are updated using an arrowhead matrix structure defined from the diagonal matrix and by adding a next observation vector to a last row of the first matrix. The updated left matrix, the updated diagonal matrix, and the updated right matrix are updated using a diagonal-plus-rank-one (DPR1) matrix structure defined from the updated diagonal matrix and by removing an observation vector from a first row of the first matrix. Eigenpairs of the DPR1 matrix are computed based on whether a value computed from the updated left matrix is positive or negative. The left matrix updated in (C), the diagonal matrix updated in (C), and the right matrix updated in (C) are output.Type: GrantFiled: October 19, 2021Date of Patent: April 26, 2022Assignee: SAS Institute Inc.Inventors: Hansi Jiang, Arin Chaudhuri
-
Patent number: 9448970Abstract: Computerized singular value decomposition of an input complex matrix. A real-value matrix representation of the input complex matrix is provided to a singular value decomposition module, which correctly obtains a singular value representation of the real-value matrix representation. However, the result is not provided in a form for convenient conversion back into a valid singular value decomposition solution for the original input complex matrix, as the upper left half and lower right half of the diagonal of the diagonal matrix are not identical. A correction module corrects by formulating a corrected diagonal matrix that represents the value of the diagonal of the first diagonal matrix, but shuffled so that the upper left half of the diagonal of the second diagonal matrix is the same as the lower right half of the diagonal of the second diagonal matrix. Corrected unitary matrices may also be formed.Type: GrantFiled: June 14, 2013Date of Patent: September 20, 2016Assignee: Microsoft Technology Licensing, LLCInventors: Chun Sun, Sudarshan Raghunathan, Parry Jones Reginald Husbands, Tong Wen
-
Patent number: 9047240Abstract: A first systolic array receives an input set of time division multiplexed matrices from a plurality of channel matrices. In a first mode, the first systolic array performs triangularization on the input matrices, producing a first set of matrices, and in a second mode performs back-substitution on the first set, producing a second set of matrices. In a first mode, a second systolic array performs left multiplication on the second set of matrices with the input set of matrices, producing a third set of matrices. In a second mode, the second systolic array performs cross diagonal transposition on the third set of matrices, producing a fourth set of matrices, and performs right multiplication on the second set of matrices with the fourth set of matrices. The first systolic array switches from the first mode to the second mode after the triangularization, and the second systolic array switches from the first mode to the second mode after the left multiplication.Type: GrantFiled: January 28, 2013Date of Patent: June 2, 2015Assignee: XILINX, INC.Inventors: Raied N. Mazahreh, Hai-Jo Tarn, Raghavendar M. Rao
-
Patent number: 9047241Abstract: A first systolic array receives an input set of time division multiplexed matrices from a plurality of channel matrices. In a first mode, the first systolic array performs triangularization on the input matrices, producing a first set of matrices, and in a second mode performs back-substitution on the first set, producing a second set of matrices. In a first mode, a second systolic array performs left multiplication on the second set of matrices with the input set of matrices, producing a third set of matrices. In a second mode, the second systolic array performs cross diagonal transposition on the third set of matrices, producing a fourth set of matrices, and performs right multiplication on the second set of matrices with the fourth set of matrices. The first systolic array switches from the first mode to the second mode after the triangularization, and the second systolic array switches from the first mode to the second mode after the left multiplication.Type: GrantFiled: January 28, 2013Date of Patent: June 2, 2015Assignee: XILINX, INC.Inventors: Raied N. Mazahreh, Hai-Jo Tarn, Raghavendar M. Rao
-
Patent number: 8824603Abstract: A method and a system is provided for Coordinate Rotation Digital Computer (CORDIC) based matrix inversion of input digital signal streams from multiple antennas using an bi-directional ring-bus architecture. The bi-directional ring bus includes a first ring bus having signals flow in a clockwise direction, and a second ring bus having signals flow in a counter-clockwise direction. An I/O controller is coupled to the first and the second ring bus, respectively. A plurality of processing elements (PEs), where each of the plurality of PEs is coupled to the first and the second ring bus, respectively, wherein each of the plurality of PEs includes at least one CORDIC core for performing CORDIC iterations on the plurality of input digital stream signals to produce inversed matrix signals.Type: GrantFiled: March 1, 2013Date of Patent: September 2, 2014Assignee: Futurewei Technologies, Inc.Inventors: Yiqun Ge, Qifan Zhang, Peter Man Kin Sinn
-
Patent number: 8706787Abstract: Provided two CORDIC processors, each including: two input ports representing real and imaginary input ports; and two output ports representing real and imaginary output ports; wherein real and imaginary parts of a first input signal are applied to the imaginary input ports of the first and second CORDIC processors; real and imaginary parts of a second input signal are applied to the real input ports of the first and second CORDIC processors; the first and second CORDIC processors rotate the respective input signals applied thereto by 45 degrees in the clockwise direction; respective data from the real output ports of said first and second CORDIC processors constitute real and imaginary parts of a first output signal; and respective data from the imaginary output ports of said first and second CORDIC processors constitute real part and imaginary part of a second output signal.Type: GrantFiled: September 26, 2007Date of Patent: April 22, 2014Assignee: NEC CorporationInventor: James Awuor Oduor Okello
-
Patent number: 8650238Abstract: In a digital system with more than one clock source, lack of synchronization between the clock sources may cause overflow or underflow in sample buffers, also called sample slipping. Sample slipping may lead to undesirable artifacts in the processed signal due to discontinuities introduced by the addition or removal of extra samples. To smooth out discontinuities caused by sample slipping, samples are filtered to when a buffer overflow condition occurs, and the samples are interpolated to produce additional samples when a buffer underflow condition occurs. The interpolated samples may also be filtered. The filtering and interpolation operations can be readily implemented without adding significant burden to the computational complexity of a real-time digital system.Type: GrantFiled: November 28, 2007Date of Patent: February 11, 2014Assignee: QUALCOMM IncorporatedInventors: Dinesh Ramakrishnan, Song Wang, Eddie L. T. Choy, Samir Kumar Gupta
-
Patent number: 8620984Abstract: A first systolic array receives an input set of time division multiplexed matrices from a plurality of channel matrices. In a first mode, the first systolic array performs triangularization on the input matrices, producing a first set of matrices, and in a second mode performs back-substitution on the first set, producing a second set of matrices. In a first mode, a second systolic array performs left multiplication on the second set of matrices with the input set of matrices, producing a third set of matrices. In a second mode, the second systolic array performs cross diagonal transposition on the third set of matrices, producing a fourth set of matrices, and performs right multiplication on the second set of matrices with the fourth set of matrices. The first systolic array switches from the first mode to the second mode after the triangularization, and the second systolic array switches from the first mode to the second mode after the left multiplication.Type: GrantFiled: November 23, 2009Date of Patent: December 31, 2013Assignee: Xilinx, Inc.Inventors: Raied N. Mazahreh, Hai-Jo Tarn, Raghavendar M. Rao
-
Patent number: 8589467Abstract: A linear systolic array is added to the lower side of a trapezoid systolic array created by combining a triangular systolic array and a square systolic array. In order to make the connection among the cells fixed, the intermediate result output from each row of the trapezoid systolic array to a lower row is shifted in phase with respect to the intermediate result of the complex MFA algorithm, the phase shift is absorbed by the next row, and the phase shift in the intermediate result output from the last row of the trapezoid systolic array is corrected by the linear systolic array. Each cell is implemented by a CORDIC circuit that processes vector angle computation, vector rotation, division, and multiply-and-accumulate with a constant delay.Type: GrantFiled: November 21, 2008Date of Patent: November 19, 2013Assignee: NEC CorporationInventor: Katsutoshi Seki
-
Patent number: 8589465Abstract: Digital signal processing (“DSP”) block circuitry on an integrated circuit (“IC”) is adapted for use (e.g., in multiple instances of the DSP block circuitry on the IC) for implementing finite-impulse-response (“FIR”) digital filters in systolic form. Each DSP block may include (1) first and second multiplier circuitry and (2) adder circuitry for adding (a) outputs of the multipliers and (b) signals chained in from a first other instance of the DSP block circuitry. Systolic delay circuitry is provided for either the outputs of the first multiplier (upstream from the adder) or at least one of the sets of inputs to the first multiplier. Additional systolic delay circuitry is provided for outputs of the adder, which are chained out to a second other instance of the DSP block circuitry.Type: GrantFiled: May 8, 2013Date of Patent: November 19, 2013Assignee: Altera CorporationInventors: Suleyman Sirri Demirsoy, Hyun Yi
-
Patent number: 8510364Abstract: Methods for matrix processing and devices therefor are described. A systolic array in an integrated circuit is coupled to receive a first matrix as input; and is capable of operating in two modes, namely a triangularization mode and a back-substitution mode. The systolic array, when in a triangularization mode, is coupled to triangularize the first matrix to provide a second matrix. When in a back-substitution mode, the systolic array is coupled to invert the second matrix.Type: GrantFiled: September 1, 2009Date of Patent: August 13, 2013Assignee: Xilinx, Inc.Inventors: Raghavendar M. Rao, Christopher H. Dick
-
Patent number: 8473540Abstract: A decoder, such as for example an MMSE MIMO decoder, and a method for decoding are described. An input channel matrix is obtained, and an extended channel matrix of the input channel matrix is generated. The extended channel matrix is triangularized to provide a triangularized matrix, and the triangularized matrix is inverted to provide an inverted triangular matrix. A left matrix multiplication result matrix associated with multiplication of the input channel matrix and the inverted triangular matrix is generated, and a weight matrix from the left matrix multiplication result matrix and the inverted triangular matrix is generated. A received symbols matrix is obtained, and a weighted estimation is generated and output using the weight matrix and the received symbols matrix to provide an estimate of a transmit symbols matrix for output of estimated data symbols.Type: GrantFiled: September 1, 2009Date of Patent: June 25, 2013Assignee: Xilinx, Inc.Inventors: Raghavendar M. Rao, Christopher H. Dick
-
Patent number: 8473539Abstract: Nulling a cell of a complex matrix is described. A complex matrix and a modified Givens rotation matrix are obtained for multiplication by a processing unit, such as a systolic array or a CPU, for example, for the nulling of the cell to provide a modified form of the complex matrix. The modified Givens rotation matrix includes complex numbers c*, c, ?s, and s*, wherein the complex number s* is the complex conjugate of the complex number s, and wherein the complex number c* is the complex conjugate of the complex number c. The complex numbers c and s are associated with complex numbers of the complex matrix including the cell to be nulled. The modified form is then output by the processing unit. The modified Givens rotation matrix may be implemented as a systolic array or otherwise used for processing complex numbers or matrices.Type: GrantFiled: September 1, 2009Date of Patent: June 25, 2013Assignee: Xilinx, Inc.Inventors: Raghavendar M. Rao, Christopher H. Dick
-
Patent number: 8458243Abstract: Digital signal processing (“DSP”) block circuitry on an integrated circuit (“IC”) is adapted for use (e.g., in multiple instances of the DSP block circuitry on the IC) for implementing finite-impulse-response (“FIR”) digital filters in systolic form. Each DSP block may include (1) first and second multiplier circuitry and (2) adder circuitry for adding (a) outputs of the multipliers and (b) signals chained in from a first other instance of the DSP block circuitry. Systolic delay circuitry is provided for either the outputs of the first multiplier (upstream from the adder) or at least one of the sets of inputs to the first multiplier. Additional systolic delay circuitry is provided for outputs of the adder, which are chained out to a second other instance of the DSP block circuitry.Type: GrantFiled: March 3, 2010Date of Patent: June 4, 2013Assignee: Altera CorporationInventors: Suleyman Sirri Demirsoy, Hyun Yi
-
Patent number: 8417758Abstract: A method, machine-readable medium, and systolic array for left matrix multiplication of a first matrix and a second matrix are described. The first matrix is a triangular matrix, and a cross-diagonal transpose of the first matrix is loaded into a triangular array of cells in an integrated circuit. A cross-diagonal transpose of the second matrix is input into the triangular array of cells for multiplication with the cross-diagonal transpose of the first matrix to produce an interim result. The interim result is cross-diagonally transposed to provide a left matrix multiplication result, which is stored or otherwise output.Type: GrantFiled: September 1, 2009Date of Patent: April 9, 2013Assignee: Xilinx, Inc.Inventors: Raghavendar M. Rao, Christopher H. Dick
-
Publication number: 20100250640Abstract: A linear systolic array is added to the lower side of a trapezoid systolic array created by combining a triangular systolic array and a square systolic array. In order to make the connection among the cells fixed, the intermediate result output from each row of the trapezoid systolic array to a lower row is shifted in phase with respect to the intermediate result of the complex MFA algorithm, the phase shift is absorbed by the next row, and the phase shift in the intermediate result output from the last row of the trapezoid systolic array is corrected by the linear systolic array. Each cell is implemented by a CORDIC circuit that processes vector angle computation, vector rotation, division, and multiply-and-accumulate with a constant delay.Type: ApplicationFiled: November 21, 2008Publication date: September 30, 2010Inventor: Katsutoshi Seki
-
Publication number: 20090204658Abstract: A decimal calculation apparatus, which performs multidigit decimal calculation with the number of calculation digits set in a calculation instruction, includes a multidigit memory section which stores values with greater numbers of digits than the number of digits of a predetermined digit unit in a plurality of memory areas, a calculation-instruction memory section which stores the calculation instruction having the number of calculation digits and a type of calculation set therein, and a decimal calculation section which performs decimal calculation of sequentially calculating numerical values of corresponding digit units respectively stored in the plurality of memory areas of the multidigit memory section, digit unit by digit unit in the number of calculation digits set in the calculation instruction stored in calculation-instruction memory section, in decimal calculation according to type of calculation set in the calculation instruction stored in calculation-instruction memory section, and sequentially wrType: ApplicationFiled: April 22, 2009Publication date: August 13, 2009Applicant: Casio Computer Co., Ltd.Inventors: Hisashi ITO, Tetsuichi NAKAE
-
Patent number: 5948053Abstract: A digital signal processor has an arithmetic operation device that carries out arithmetic operations. The arithmetic operation device has a plurality of elementary arithmetic operation units. A signal path-forming device forms signal paths for inputting and outputting signals to and from the elementary arithmetic operations units, according to a predetermined program. The arithmetic operation device carries out processing of a digital signal input to the digital signal processor after the signal paths have been formed by the signal path-forming device.Type: GrantFiled: August 29, 1997Date of Patent: September 7, 1999Assignee: Yamaha CorporationInventor: Ryo Kamiya