Systolic Patents (Class 708/522)

Rank-based dot product circuitry

Patent number: 12197888

Abstract: Integrated circuits with dot product circuitry are provided. The dot product circuitry may be configured to generate partial products of different ranks based on the inputs. The partial products may be organized into corresponding groups based on their ranks. Each group of partial products having the same rank can then be compressed using a compressor/reduction tree. At least some of the compressed partial product values may be shifted between the different groups to maintain the proper offset. Each partial product may have an associated one's to two's complement conversion bit. The conversion bits of the various partial product groups can be separately aggregated and then injected into the compressor tree at one or more locations.

Type: Grant

Filed: December 23, 2019

Date of Patent: January 14, 2025

Assignee: Altera Corporation

Inventor: Martin Langhammer
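
The rank grouping described in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions: it takes "rank" to mean the sum of the chunk indices of the two operand slices that form a partial product, it handles unsigned inputs only (so the one's to two's complement conversion bits are not modeled), and the function name rank_grouped_dot and its parameters are hypothetical, not taken from the patent.

```python
def rank_grouped_dot(a, b, chunk_bits=4, chunks=2):
    """Dot product of unsigned integer vectors, summing partial products per rank.

    Assumption: each operand is split into `chunks` slices of `chunk_bits` bits,
    and the partial product of slice i of x with slice j of y has rank i + j.
    """
    mask = (1 << chunk_bits) - 1
    groups = [0] * (2 * chunks - 1)  # one accumulator per rank
    for x, y in zip(a, b):
        for i in range(chunks):
            for j in range(chunks):
                xi = (x >> (i * chunk_bits)) & mask
                yj = (y >> (j * chunk_bits)) & mask
                groups[i + j] += xi * yj  # same-rank products are reduced together
    # Shift each rank group to its proper offset before the final addition.
    return sum(g << (r * chunk_bits) for r, g in enumerate(groups))
```

In this model the per-rank accumulators stand in for the compressor trees, and the final shift plays the role of keeping the groups at the proper offset.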
Data compressor for approximation of matrices for matrix multiply operations

Patent number: 12072952

Abstract: A processing device is provided which comprises memory configured to store data and a processor. The processor comprises a plurality of MACs configured to perform matrix multiplication of elements of a first matrix and elements of a second matrix. The processor also comprises a plurality of logic devices configured to sum values of bits of product exponents values of the elements of the first matrix and second matrix and determine keep bit values for product exponents values to be kept for matrix multiplication. The processor also comprises a plurality of multiplexor arrays each configured to receive bits of the elements of the first matrix and the second matrix and the keep bit values and provide data for selecting which elements of the first matrix and the second matrix values are provided to the MACs for matrix multiplication.

Type: Grant

Filed: March 26, 2021

Date of Patent: August 27, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Swapnil P. Sakharshete, Pramod Vasant Argade, Maxim V. Kazakov, Alexander M. Potapov
Low latency matrix multiply unit

Patent number: 11989259

Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply unit. Each weight shift register is connected to only one chain and each cell is connected to only one weight shift register. A weight matrix register per cell is configured to store a weight input received from a weight shift register. A multiply unit is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.

Type: Grant

Filed: November 10, 2022

Date of Patent: May 21, 2024

Assignee: Google LLC

Inventors: Andrew Everett Phelps, Norman Paul Jouppi
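
One way to picture the two weight chains per column described in the abstract above is the following software sketch. It assumes, purely for illustration, that one chain feeds the upper half of a column and the other feeds the lower half, so both halves load in parallel; the abstract does not state how cells are divided between chains. The names shift_in, load_column, and column_output are hypothetical.

```python
def shift_in(values):
    """Shift values into a chain of registers, one value per cycle.

    The value destined for the deepest register enters first.
    Returns the final register contents and the number of cycles used.
    """
    regs = [0] * len(values)
    cycles = 0
    for v in reversed(values):
        regs = [v] + regs[:-1]  # every register passes its value one step down
        cycles += 1
    return regs, cycles


def load_column(weights):
    """Load one column of weights using two chains that run in parallel.

    Assumption: chain A serves the upper half, chain B the lower half.
    """
    half = (len(weights) + 1) // 2
    top, top_cycles = shift_in(weights[:half])
    bottom, bottom_cycles = shift_in(weights[half:])
    return top + bottom, max(top_cycles, bottom_cycles)


def column_output(weight_regs, vector):
    """Each cell multiplies its stored weight by a vector input; the column sums them."""
    return sum(w * x for w, x in zip(weight_regs, vector))
```

Under this assumption a column of four cells is loaded in two cycles rather than four, which is the kind of latency reduction the title refers to.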
Low latency matrix multiply unit

Patent number: 11500961

Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply unit. Each weight shift register is connected to only one chain and each cell is connected to only one weight shift register. A weight matrix register per cell is configured to store a weight input received from a weight shift register. A multiply unit is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.

Type: Grant

Filed: March 26, 2020

Date of Patent: November 15, 2022

Assignee: Google LLC

Inventors: Andrew Everett Phelps, Norman Paul Jouppi
Incremental singular value decomposition in support of machine learning

Patent number: 11314844

Abstract: A singular value decomposition (SVD) is computed of a first matrix to define a left matrix, a diagonal matrix, and a right matrix. The left matrix, the diagonal matrix, and the right matrix are updated using an arrowhead matrix structure defined from the diagonal matrix and by adding a next observation vector to a last row of the first matrix. The updated left matrix, the updated diagonal matrix, and the updated right matrix are updated using a diagonal-plus-rank-one (DPR1) matrix structure defined from the updated diagonal matrix and by removing an observation vector from a first row of the first matrix. Eigenpairs of the DPR1 matrix are computed based on whether a value computed from the updated left matrix is positive or negative. The left matrix updated in (C), the diagonal matrix updated in (C), and the right matrix updated in (C) are output.

Type: Grant

Filed: October 19, 2021

Date of Patent: April 26, 2022

Assignee: SAS Institute Inc.

Inventors: Hansi Jiang, Arin Chaudhuri
Singular value decomposition of complex matrix

Patent number: 9448970

Abstract: Computerized singular value decomposition of an input complex matrix. A real-value matrix representation of the input complex matrix is provided to a singular value decomposition module, which correctly obtains a singular value representation of the real-value matrix representation. However, the result is not provided in a form for convenient conversion back into a valid singular value decomposition solution for the original input complex matrix, as the upper left half and lower right half of the diagonal of the diagonal matrix are not identical. A correction module corrects by formulating a corrected diagonal matrix that represents the value of the diagonal of the first diagonal matrix, but shuffled so that the upper left half of the diagonal of the second diagonal matrix is the same as the lower right half of the diagonal of the second diagonal matrix. Corrected unitary matrices may also be formed.

Type: Grant

Filed: June 14, 2013

Date of Patent: September 20, 2016

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chun Sun, Sudarshan Raghunathan, Parry Jones Reginald Husbands, Tong Wen
Minimum mean square error processing

Patent number: 9047240

Abstract: A first systolic array receives an input set of time division multiplexed matrices from a plurality of channel matrices. In a first mode, the first systolic array performs triangularization on the input matrices, producing a first set of matrices, and in a second mode performs back-substitution on the first set, producing a second set of matrices. In a first mode, a second systolic array performs left multiplication on the second set of matrices with the input set of matrices, producing a third set of matrices. In a second mode, the second systolic array performs cross diagonal transposition on the third set of matrices, producing a fourth set of matrices, and performs right multiplication on the second set of matrices with the fourth set of matrices. The first systolic array switches from the first mode to the second mode after the triangularization, and the second systolic array switches from the first mode to the second mode after the left multiplication.

Type: Grant

Filed: January 28, 2013

Date of Patent: June 2, 2015

Assignee: Xilinx, Inc.

Inventors: Raied N. Mazahreh, Hai-Jo Tarn, Raghavendar M. Rao
Minimum mean square error processing

Patent number: 9047241

Abstract: A first systolic array receives an input set of time division multiplexed matrices from a plurality of channel matrices. In a first mode, the first systolic array performs triangularization on the input matrices, producing a first set of matrices, and in a second mode performs back-substitution on the first set, producing a second set of matrices. In a first mode, a second systolic array performs left multiplication on the second set of matrices with the input set of matrices, producing a third set of matrices. In a second mode, the second systolic array performs cross diagonal transposition on the third set of matrices, producing a fourth set of matrices, and performs right multiplication on the second set of matrices with the fourth set of matrices. The first systolic array switches from the first mode to the second mode after the triangularization, and the second systolic array switches from the first mode to the second mode after the left multiplication.

Type: Grant

Filed: January 28, 2013

Date of Patent: June 2, 2015

Assignee: Xilinx, Inc.

Inventors: Raied N. Mazahreh, Hai-Jo Tarn, Raghavendar M. Rao
Bi-directional ring-bus architecture for CORDIC-based matrix inversion

Patent number: 8824603

Abstract: A method and a system are provided for Coordinate Rotation Digital Computer (CORDIC) based matrix inversion of input digital signal streams from multiple antennas using a bi-directional ring-bus architecture. The bi-directional ring bus includes a first ring bus having signals flow in a clockwise direction, and a second ring bus having signals flow in a counter-clockwise direction. An I/O controller is coupled to the first and the second ring bus, respectively. A plurality of processing elements (PEs) is provided, where each of the plurality of PEs is coupled to the first and the second ring bus, respectively, and includes at least one CORDIC core for performing CORDIC iterations on the plurality of input digital stream signals to produce inverted matrix signals.

Type: Grant

Filed: March 1, 2013

Date of Patent: September 2, 2014

Assignee: Futurewei Technologies, Inc.

Inventors: Yiqun Ge, Qifan Zhang, Peter Man Kin Sinn
CORDIC-based FFT and IFFT apparatus and method

Patent number: 8706787

Abstract: Provided are two CORDIC processors, each including: two input ports representing real and imaginary input ports; and two output ports representing real and imaginary output ports; wherein real and imaginary parts of a first input signal are applied to the imaginary input ports of the first and second CORDIC processors; real and imaginary parts of a second input signal are applied to the real input ports of the first and second CORDIC processors; the first and second CORDIC processors rotate the respective input signals applied thereto by 45 degrees in the clockwise direction; respective data from the real output ports of said first and second CORDIC processors constitute real and imaginary parts of a first output signal; and respective data from the imaginary output ports of said first and second CORDIC processors constitute real part and imaginary part of a second output signal.

Type: Grant

Filed: September 26, 2007

Date of Patent: April 22, 2014

Assignee: NEC Corporation

Inventor: James Awuor Oduor Okello
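
The port wiring in the abstract above amounts to a radix-2 butterfly scaled by 1/sqrt(2), which a short numerical sketch can make concrete. The sketch assumes the real port is the x coordinate and the imaginary port is the y coordinate, and it uses an exact floating-point rotation in place of iterative CORDIC micro-rotations. The names rotate_cw_45 and butterfly are hypothetical.

```python
import math


def rotate_cw_45(x, y):
    """Rotate the vector (x, y) by 45 degrees clockwise.

    Stand-in for a CORDIC processor: real port = x, imaginary port = y (assumption).
    """
    k = math.sqrt(0.5)  # cos(45 deg) = sin(45 deg)
    return (x + y) * k, (y - x) * k


def butterfly(a, b):
    """Wire two rotators as in the abstract: a is the first input, b the second."""
    # Processor 1 handles the real parts: b on the real port, a on the imaginary port.
    r1, i1 = rotate_cw_45(b.real, a.real)
    # Processor 2 handles the imaginary parts in the same way.
    r2, i2 = rotate_cw_45(b.imag, a.imag)
    first = complex(r1, r2)   # real output ports  -> (a + b) / sqrt(2)
    second = complex(i1, i2)  # imaginary output ports -> (a - b) / sqrt(2)
    return first, second
```

With this wiring the first output is the scaled sum and the second output is the scaled difference of the two inputs, the basic operation of an FFT stage.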
Resolving buffer underflow/overflow in a digital system

Patent number: 8650238

Abstract: In a digital system with more than one clock source, lack of synchronization between the clock sources may cause overflow or underflow in sample buffers, also called sample slipping. Sample slipping may lead to undesirable artifacts in the processed signal due to discontinuities introduced by the addition or removal of extra samples. To smooth out discontinuities caused by sample slipping, samples are filtered when a buffer overflow condition occurs, and the samples are interpolated to produce additional samples when a buffer underflow condition occurs. The interpolated samples may also be filtered. The filtering and interpolation operations can be readily implemented without adding significant burden to the computational complexity of a real-time digital system.

Type: Grant

Filed: November 28, 2007

Date of Patent: February 11, 2014

Assignee: QUALCOMM Incorporated

Inventors: Dinesh Ramakrishnan, Song Wang, Eddie L. T. Choy, Samir Kumar Gupta
Minimum mean square error processing

Patent number: 8620984

Abstract: A first systolic array receives an input set of time division multiplexed matrices from a plurality of channel matrices. In a first mode, the first systolic array performs triangularization on the input matrices, producing a first set of matrices, and in a second mode performs back-substitution on the first set, producing a second set of matrices. In a first mode, a second systolic array performs left multiplication on the second set of matrices with the input set of matrices, producing a third set of matrices. In a second mode, the second systolic array performs cross diagonal transposition on the third set of matrices, producing a fourth set of matrices, and performs right multiplication on the second set of matrices with the fourth set of matrices. The first systolic array switches from the first mode to the second mode after the triangularization, and the second systolic array switches from the first mode to the second mode after the left multiplication.

Type: Grant

Filed: November 23, 2009

Date of Patent: December 31, 2013

Assignee: Xilinx, Inc.

Inventors: Raied N. Mazahreh, Hai-Jo Tarn, Raghavendar M. Rao
Systolic array and calculation method

Patent number: 8589467

Abstract: A linear systolic array is added to the lower side of a trapezoid systolic array created by combining a triangular systolic array and a square systolic array. In order to make the connection among the cells fixed, the intermediate result output from each row of the trapezoid systolic array to a lower row is shifted in phase with respect to the intermediate result of the complex MFA algorithm, the phase shift is absorbed by the next row, and the phase shift in the intermediate result output from the last row of the trapezoid systolic array is corrected by the linear systolic array. Each cell is implemented by a CORDIC circuit that processes vector angle computation, vector rotation, division, and multiply-and-accumulate with a constant delay.

Type: Grant

Filed: November 21, 2008

Date of Patent: November 19, 2013

Assignee: NEC Corporation

Inventor: Katsutoshi Seki
Digital signal processing circuit blocks with support for systolic finite-impulse-response digital filtering

Patent number: 8589465

Abstract: Digital signal processing (“DSP”) block circuitry on an integrated circuit (“IC”) is adapted for use (e.g., in multiple instances of the DSP block circuitry on the IC) for implementing finite-impulse-response (“FIR”) digital filters in systolic form. Each DSP block may include (1) first and second multiplier circuitry and (2) adder circuitry for adding (a) outputs of the multipliers and (b) signals chained in from a first other instance of the DSP block circuitry. Systolic delay circuitry is provided for either the outputs of the first multiplier (upstream from the adder) or at least one of the sets of inputs to the first multiplier. Additional systolic delay circuitry is provided for outputs of the adder, which are chained out to a second other instance of the DSP block circuitry.

Type: Grant

Filed: May 8, 2013

Date of Patent: November 19, 2013

Assignee: Altera Corporation

Inventors: Suleyman Sirri Demirsoy, Hyun Yi
Systolic array for matrix triangularization and back-substitution

Patent number: 8510364

Abstract: Methods for matrix processing and devices therefor are described. A systolic array in an integrated circuit is coupled to receive a first matrix as input; and is capable of operating in two modes, namely a triangularization mode and a back-substitution mode. The systolic array, when in a triangularization mode, is coupled to triangularize the first matrix to provide a second matrix. When in a back-substitution mode, the systolic array is coupled to invert the second matrix.

Type: Grant

Filed: September 1, 2009

Date of Patent: August 13, 2013

Assignee: Xilinx, Inc.

Inventors: Raghavendar M. Rao, Christopher H. Dick
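
The back-substitution mode mentioned in the abstract above inverts a triangular matrix. A plain sequential sketch of that computation, assuming an upper triangular input with a nonzero diagonal, is given below; it does not model the systolic dataflow, and the name invert_upper_triangular is hypothetical.

```python
def invert_upper_triangular(r):
    """Invert an upper triangular matrix by back-substitution.

    Assumes r is square, upper triangular, and has a nonzero diagonal.
    """
    n = len(r)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / r[j][j]
        # Work upward from the diagonal, reusing entries already solved in column j.
        for i in range(j - 1, -1, -1):
            acc = sum(r[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = -acc / r[i][i]
    return inv
```

For example, inverting [[2, 1], [0, 4]] this way gives [[0.5, -0.125], [0.0, 0.25]].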
Decoder and process therefor

Patent number: 8473540

Abstract: A decoder, such as for example an MMSE MIMO decoder, and a method for decoding are described. An input channel matrix is obtained, and an extended channel matrix of the input channel matrix is generated. The extended channel matrix is triangularized to provide a triangularized matrix, and the triangularized matrix is inverted to provide an inverted triangular matrix. A left matrix multiplication result matrix associated with multiplication of the input channel matrix and the inverted triangular matrix is generated, and a weight matrix from the left matrix multiplication result matrix and the inverted triangular matrix is generated. A received symbols matrix is obtained, and a weighted estimation is generated and output using the weight matrix and the received symbols matrix to provide an estimate of a transmit symbols matrix for output of estimated data symbols.

Type: Grant

Filed: September 1, 2009

Date of Patent: June 25, 2013

Assignee: Xilinx, Inc.

Inventors: Raghavendar M. Rao, Christopher H. Dick
Modified givens rotation for matrices with complex numbers

Patent number: 8473539

Abstract: Nulling a cell of a complex matrix is described. A complex matrix and a modified Givens rotation matrix are obtained for multiplication by a processing unit, such as a systolic array or a CPU, for example, for the nulling of the cell to provide a modified form of the complex matrix. The modified Givens rotation matrix includes complex numbers c*, c, −s, and s*, wherein the complex number s* is the complex conjugate of the complex number s, and wherein the complex number c* is the complex conjugate of the complex number c. The complex numbers c and s are associated with complex numbers of the complex matrix including the cell to be nulled. The modified form is then output by the processing unit. The modified Givens rotation matrix may be implemented as a systolic array or otherwise used for processing complex numbers or matrices.

Type: Grant

Filed: September 1, 2009

Date of Patent: June 25, 2013

Assignee: Xilinx, Inc.

Inventors: Raghavendar M. Rao, Christopher H. Dick
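
A rotation built from c*, s*, -s, and c can null one entry of a complex column, as the abstract above describes. The sketch below assumes one particular arrangement, [[c*, s*], [-s, c]] with c = a/r, s = b/r, and r = sqrt(|a|^2 + |b|^2); the abstract lists the four entries but not their positions, so this layout is an assumption, and the names modified_givens and apply_rotation are hypothetical.

```python
import math


def modified_givens(a, b):
    """Build a 2x2 rotation that nulls b against a (assumed layout [[c*, s*], [-s, c]])."""
    a, b = complex(a), complex(b)
    r = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    if r == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]  # nothing to rotate
    c, s = a / r, b / r
    return [[c.conjugate(), s.conjugate()], [-s, c]]


def apply_rotation(q, a, b):
    """Multiply the 2x2 matrix q by the column vector (a, b)."""
    return q[0][0] * a + q[0][1] * b, q[1][0] * a + q[1][1] * b
```

With this layout the first output equals r, a real number, and the second output is zero up to rounding, so the nulled column keeps a real leading entry.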
Digital signal processing circuit blocks with support for systolic finite-impulse-response digital filtering

Patent number: 8458243

Abstract: Digital signal processing (“DSP”) block circuitry on an integrated circuit (“IC”) is adapted for use (e.g., in multiple instances of the DSP block circuitry on the IC) for implementing finite-impulse-response (“FIR”) digital filters in systolic form. Each DSP block may include (1) first and second multiplier circuitry and (2) adder circuitry for adding (a) outputs of the multipliers and (b) signals chained in from a first other instance of the DSP block circuitry. Systolic delay circuitry is provided for either the outputs of the first multiplier (upstream from the adder) or at least one of the sets of inputs to the first multiplier. Additional systolic delay circuitry is provided for outputs of the adder, which are chained out to a second other instance of the DSP block circuitry.

Type: Grant

Filed: March 3, 2010

Date of Patent: June 4, 2013

Assignee: Altera Corporation

Inventors: Suleyman Sirri Demirsoy, Hyun Yi
Left and right matrix multiplication using a systolic array

Patent number: 8417758

Abstract: A method, machine-readable medium, and systolic array for left matrix multiplication of a first matrix and a second matrix are described. The first matrix is a triangular matrix, and a cross-diagonal transpose of the first matrix is loaded into a triangular array of cells in an integrated circuit. A cross-diagonal transpose of the second matrix is input into the triangular array of cells for multiplication with the cross-diagonal transpose of the first matrix to produce an interim result. The interim result is cross-diagonally transposed to provide a left matrix multiplication result, which is stored or otherwise output.

Type: Grant

Filed: September 1, 2009

Date of Patent: April 9, 2013

Assignee: Xilinx, Inc.

Inventors: Raghavendar M. Rao, Christopher H. Dick
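
The procedure in the abstract above rests on a property of the cross-diagonal transpose (reflection about the anti-diagonal): it reverses the order of a product, just as the ordinary transpose does. The sketch below checks that A·B equals the cross-diagonal transpose of (B^X · A^X) for square matrices; which operand the patent calls "first" in the product is an assumption here, and the function names are hypothetical.

```python
def cross_diagonal_transpose(m):
    """Reflect a square matrix about its anti-diagonal."""
    n = len(m)
    return [[m[n - 1 - j][n - 1 - i] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def multiply_via_cross_transpose(a, b):
    """Compute a @ b from the cross-diagonal transposes of a and b."""
    interim = matmul(cross_diagonal_transpose(b), cross_diagonal_transpose(a))
    return cross_diagonal_transpose(interim)
```

An upper triangular matrix stays upper triangular under this transpose, which is consistent with loading it into a triangular array of cells.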
SYSTOLIC ARRAY AND CALCULATION METHOD

Publication number: 20100250640

Abstract: A linear systolic array is added to the lower side of a trapezoid systolic array created by combining a triangular systolic array and a square systolic array. In order to make the connection among the cells fixed, the intermediate result output from each row of the trapezoid systolic array to a lower row is shifted in phase with respect to the intermediate result of the complex MFA algorithm, the phase shift is absorbed by the next row, and the phase shift in the intermediate result output from the last row of the trapezoid systolic array is corrected by the linear systolic array. Each cell is implemented by a CORDIC circuit that processes vector angle computation, vector rotation, division, and multiply-and-accumulate with a constant delay.

Type: Application

Filed: November 21, 2008

Publication date: September 30, 2010

Inventor: Katsutoshi Seki
DECIMAL COMPUTING APPARATUS, ELECTRONIC DEVICE CONNECTABLE DECIMAL COMPUTING APPARATUS, ARITHMETIC OPERATION APPARATUS, ARITHMETIC OPERATION CONTROL APPARATUS, AND PROGRAM-RECORDED RECORDING MEDIUM

Publication number: 20090204658

Abstract: A decimal calculation apparatus, which performs multidigit decimal calculation with the number of calculation digits set in a calculation instruction, includes a multidigit memory section which stores values with greater numbers of digits than the number of digits of a predetermined digit unit in a plurality of memory areas, a calculation-instruction memory section which stores the calculation instruction having the number of calculation digits and a type of calculation set therein, and a decimal calculation section which performs decimal calculation of sequentially calculating numerical values of corresponding digit units respectively stored in the plurality of memory areas of the multidigit memory section, digit unit by digit unit in the number of calculation digits set in the calculation instruction stored in calculation-instruction memory section, in decimal calculation according to type of calculation set in the calculation instruction stored in calculation-instruction memory section, and sequentially wr

Type: Application

Filed: April 22, 2009

Publication date: August 13, 2009

Applicant: Casio Computer Co., Ltd.

Inventors: Hisashi ITO, Tetsuichi NAKAE
Digital signal processor architecture using signal paths to carry out arithmetic operations

Patent number: 5948053

Abstract: A digital signal processor has an arithmetic operation device that carries out arithmetic operations. The arithmetic operation device has a plurality of elementary arithmetic operation units. A signal path-forming device forms signal paths for inputting and outputting signals to and from the elementary arithmetic operations units, according to a predetermined program. The arithmetic operation device carries out processing of a digital signal input to the digital signal processor after the signal paths have been formed by the signal path-forming device.

Type: Grant

Filed: August 29, 1997

Date of Patent: September 7, 1999

Assignee: Yamaha Corporation

Inventor: Ryo Kamiya