SUPERCONDUCTING CIRCUIT FOR HIGH-SPEED LOOKUP TABLE
A high-speed lookup table is designed using Rapid Single Flux Quantum (RSFQ) logic elements and fabricated using superconducting integrated circuits. The lookup table is composed of an address decoder and a programmable read-only memory array (PROM). The memory array has rapid parallel pipelined readout and slower serial reprogramming of memory contents. The memory cells are constructed using standard non-destructive reset-set flip-flops (RSN cells) and data flip-flops (DFF cells). An n-bit address decoder is implemented in the same technology and closely integrated with the memory array to achieve high-speed operation as a lookup table. The circuit architecture is scalable to large two-dimensional data arrays.
The present application is a continuation of U.S. patent application Ser. No. 12/258,682, filed Oct. 27, 2008, now U.S. Pat. No. 7,903,456, which is a continuation of U.S. patent application Ser. No. 11/360,749, filed Feb. 23, 2006, now U.S. Pat. No. 7,443,719, the entirety of which are expressly incorporated herein by reference.
GOVERNMENT CONTRACT
Research leading to this invention was supported in part by US Army Contract W15P7T-04-C-K417 and US Navy Contract N00039-04-C-2134.
FIELD OF THE INVENTION
This invention relates to superconducting integrated circuits, specifically the development of a fast superconducting lookup-table memory array, which may be applied to ultrafast digital signal processing.
BACKGROUND OF THE INVENTION
Ultrafast superconducting digital circuits are based on Josephson junctions integrated together according to rapid single flux quantum (RSFQ) logic, as originally developed and described by K. K. Likharev and V. K. Semenov (1991). Fast memory circuits in the same technology are also required for most non-trivial digital applications. One class of memory array is the random-access memory (RAM), which is particularly important for digital computing applications. Such applications require equally fast data writing and data retrieval. This is in contrast to many digital signal processing applications, in which the memory contents need to be read out quickly but updated only rarely, requiring a programmable read-only memory (PROM) instead of a RAM. A particular application of interest is a circuit for real-time digital predistortion of radio-frequency (RF) signals, where the predistortion parameters would be maintained in a digital lookup table.
Several circuits have been proposed for superconductor RAM, such as the Ballistic RAM circuit invented by Herr (U.S. Pat. No. 6,836,141). However, such a circuit does not help one design a fast PROM, which has a completely different architecture. There have been no prior publications or patents describing a PROM-type RSFQ memory array or lookup table.
The article by Bunyk et al., entitled "RSFQ Microprocessor: New Design Approaches," in IEEE Transactions on Applied Superconductivity, Vol. 7, No. 2, June 1997, pp. 2697-2704, utilizes an RSFQ data processing pipeline architecture, similar to but distinct from that used in the code matching network of the present invention.
SUMMARY OF THE INVENTION
A digital lookup table takes a digital input X and provides a digital output Y such that Y=F(X), for any function F that is programmed into the memory array. The output values for each input value are stored in memory, and are recalled as needed. The circuit of the present invention comprises an address decoder and a programmable read-only memory array (PROM). The memory array has rapid parallel pipelined readout and slower serial reprogramming of memory contents. The memory cells are constructed using standard RSFQ elements, the non-destructive reset-set flip-flops (RSN cells) and data flip-flops (DFF cells). An n-bit address decoder is implemented in the same technology and closely integrated with the memory array to achieve high-speed operation as a lookup table. A prototype section of a lookup table based on the invention, with a 3×4 address decoder and a 4×3 memory matrix, has been designed and fabricated on a 5 mm × 5 mm niobium superconducting integrated circuit, for target operation at a clock frequency of 20 GHz, which can result in a target rate of 20 Gwords/sec. The circuit architecture is scalable to large two-dimensional data arrays.
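The relation Y=F(X) can be illustrated in software with a minimal sketch. It models only the logical behavior (rare programming, frequent readout), not the superconducting circuit. The function names `program_lut` and `read_lut`, the bit widths, and the example function are all hypothetical choices made for illustration.

```python
from typing import Callable, List


def program_lut(f: Callable[[int], int], n_bits: int, m_bits: int) -> List[int]:
    """Precompute F(x) for every n-bit input, keeping the low m_bits of each result.

    This stands in for the slow, rarely performed programming step.
    """
    mask = (1 << m_bits) - 1
    return [f(x) & mask for x in range(1 << n_bits)]


def read_lut(table: List[int], x: int) -> int:
    """Recall the stored output word for input x (the fast readout path)."""
    return table[x]


# Hypothetical example: squares of 3-bit inputs, stored as 4-bit words.
table = program_lut(lambda x: x * x, n_bits=3, m_bits=4)
y = read_lut(table, 3)
```

Because every output is computed ahead of time, readout cost does not depend on how expensive F is to evaluate, which is the property that makes a lookup table attractive for applications such as predistortion.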
The contents of the flip-flops of row A1 are passed in pipeline fashion to the corresponding flip-flops in row A2 and then to row A3 and then down the remainder of the rows to the last row AN. Thus, a digital word input on the input lines Di propagates in pipeline fashion through the address decoder. One should note that, in the example shown, the selection of regular and complementary outputs for each flip-flop in a row is set to produce no output on the output line Ai when the digital word for which it is programmed is applied to that row of D flip-flops.
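The row-by-row propagation and the choice between regular and complementary outputs can be sketched as a small behavioral simulation. It assumes that each flip-flop contributes its regular output where the programmed code bit is 0 and its complementary output where the code bit is 1, so that a matching word produces no pulse on any line of its row. The names `row_outputs`, `step`, and `run`, and the 2-bit codes used below, are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Word = Tuple[int, ...]


def row_outputs(word: Word, code: Word) -> List[int]:
    """Per-bit outputs of one decoder row.

    Regular output (bit itself) where the code bit is 0, complementary
    output (inverted bit) where it is 1: all zeros exactly on a match.
    """
    return [b ^ c for b, c in zip(word, code)]


def step(pipeline: List[Optional[Word]], new_word: Optional[Word],
         codes: Sequence[Word]) -> Tuple[List[Optional[Word]], List[bool]]:
    """Advance one clock cycle: shift every word down one row, then test each row."""
    pipeline = [new_word] + pipeline[:-1]
    matches = [w is not None and not any(row_outputs(w, c))
               for w, c in zip(pipeline, codes)]
    return pipeline, matches


def run(words: Sequence[Word], codes: Sequence[Word]) -> List[Tuple[int, int]]:
    """Feed one word per cycle; return (cycle, row) for every match that occurs."""
    pipeline: List[Optional[Word]] = [None] * len(codes)
    events: List[Tuple[int, int]] = []
    stream: List[Optional[Word]] = list(words) + [None] * len(codes)
    for cycle, w in enumerate(stream, start=1):
        pipeline, matches = step(pipeline, w, codes)
        events += [(cycle, row) for row, m in enumerate(matches) if m]
    return events


codes = [(0, 0), (0, 1), (1, 0), (1, 1)]      # hypothetical row programming
events = run([(1, 0), (0, 0)], codes)
```

In this run the second word matches row 0 one cycle after it enters, while the first word has to travel to row 2 before it matches, so the match events occur at different depths of the pipeline.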
The address decoder works on the premise of code matching. When the input address finds its match in a row of the code-matching part, the signaling logic for that row sends a “Read” pulse to the memory shown in
Each cell in the code-matching matrix performs two functions: (1) it produces an output to the signaling logic part, and (2) it allows synchronous data-flow down the column to the cell in the next row. Recognizing that as far as the output signaling is concerned, each DFFC, in this hard-wired configuration, works either as a DFF or as a NOT (clocked inverter) but never both, one can simplify the circuit complexity by choosing only one of them for each cell. Logically, this scheme is more complex because one has to account for inversions in the data flow-down path. One can do this by configuring the code-matching matrix column-by-column by placing a NOT cell to change the value (0-to-1 and 1-to-0) and a DFF cell when no change is needed. One column of such an arrangement (also corresponding to column D4 of
It is also possible to design the code matching network to produce a logic 1 when a match occurs. The signaling logic used when the network produces a logic 0 on an output line for a code match differs from the signaling logic used when the network produces a logic 1 on an output line for a code match.
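The column-by-column configuration with DFF and NOT cells, and the effect of choosing either match polarity, can be sketched as follows. The sketch assumes each cell's single output feeds both the signaling logic and the cell in the next row, so a cell's output equals the original input bit XORed with the number of inversions above and including it. The names `configure_column` and `column_outputs` and the `match_value` parameter are hypothetical.

```python
from typing import List, Sequence


def configure_column(code_bits: Sequence[int], match_value: int = 0) -> List[str]:
    """Choose 'DFF' or 'NOT' for each row of one code-matching column.

    code_bits[i] is the address bit that row i is programmed to match.
    match_value is the logic level a cell should emit on a match
    (0 or 1); both polarities are treated as design options here.
    """
    cells: List[str] = []
    parity = 0  # net inversion applied so far to the original input bit
    for c in code_bits:
        target = c ^ match_value  # inversion parity needed at this row
        if target != parity:
            cells.append("NOT")   # value must change: 0-to-1 or 1-to-0
            parity ^= 1
        else:
            cells.append("DFF")   # no change needed
    return cells


def column_outputs(x: int, cells: Sequence[str]) -> List[int]:
    """Propagate input bit x down the column and record each cell's output."""
    outs: List[int] = []
    v = x
    for cell in cells:
        if cell == "NOT":
            v ^= 1
        outs.append(v)
    return outs


cells = configure_column([0, 1, 1, 0])          # match signalled by 0
outs_for_1 = column_outputs(1, cells)           # rows programmed with 1 emit 0
```

With `match_value=0`, a row matches when every cell in it emits 0; with `match_value=1`, a row matches when every cell emits 1. That difference in what must be detected per row is why the two polarities call for different signaling logic.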
To the upper left of the memory array shown in
The type D flip-flops of a given row then transfer (vertically as shown) their contents to the next type D flip-flop in the column which then transfers its contents to the next type D flip-flop in the column and so on down to the output of the final type D flip-flop for a column which is applied to an output bus.
As discussed in conjunction with
The code matching elements shown in
As shown in
Both the input n-bit words and the output from the rows of the memory array operate in pipeline fashion. Specifically, with each clock cycle, digital words originally input on the input lines of the address decoder Di are propagated sequentially through the rows of the address decoder in a continuous fashion. Similarly, the outputs of rows of the memory array which are selected by the output lines Ai of the address decoder are propagated in sequential fashion down the columns of D flip-flops until they reach the output bus which serves as the output of the lookup table.
Note that a digital word input at the input of the address decoder may take several clock cycles before it finds a match in the address decoder, which will then trigger the output of the corresponding row of the memory array. When it does, the output from the memory cells of that row is then applied to the D flip-flops and continues down in pipeline fashion to the output bus. As a result, each digital word applied to the input lines Di, as it traverses the address decoder in pipeline fashion, will activate one of the output lines, which will result in transfer of the contents of a row of the memory array into the corresponding D flip-flops for passing down the memory array pipeline to the output bus. Even though the output for a particular digital word might actually be selected subsequent to selection of a different digital word, the overall ordering of the output words on the output bus will be strictly in sequential order corresponding to the input order of the n-bit digital words applied to the input. With 4-bit input numbers (0 to 15), the total throughput delay in all cases will be 18 clock cycles (τ).
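The claim that output order follows input order, even though matches happen at different depths, can be checked with a simplified behavioral model. It assumes decoder row i is programmed with address i, that a match loads memory row i into the readout flip-flop of the same row, and that both pipelines advance one row per clock cycle. The function name `simulate_lut` and the memory contents are hypothetical, and the model omits any fixed overhead stages, so it does not attempt to reproduce the 18-cycle figure quoted above.

```python
from typing import List, Optional, Tuple


def simulate_lut(inputs: List[int], memory: List[int]) -> List[Tuple[int, int, int]]:
    """Return (input_cycle, output_cycle, output_word) for each input address.

    Assumption: decoder row i matches address i and reads out memory[i].
    """
    n_rows = len(memory)
    dec: List[Optional[Tuple[int, int]]] = [None] * n_rows  # (address, input_cycle)
    out: List[Optional[Tuple[int, int]]] = [None] * n_rows  # (data, input_cycle)
    results: List[Tuple[int, int, int]] = []
    stream: List[Optional[int]] = list(inputs) + [None] * (2 * n_rows)
    for cycle, x in enumerate(stream):
        # Readout pipeline: the last row drives the output bus, the rest shift down.
        leaving = out[-1]
        if leaving is not None:
            results.append((leaving[1], cycle, leaving[0]))
        out = [None] + out[:-1]
        # Address decoder: every word moves down one row, a new word enters row 0.
        dec = [None if x is None else (x, cycle)] + dec[:-1]
        # Code matching: a match in row i loads memory row i into readout row i.
        for row, slot in enumerate(dec):
            if slot is not None and slot[0] == row:
                out[row] = (memory[row], slot[1])
    return results


memory = [10, 11, 12, 13]               # hypothetical contents, one word per row
results = simulate_lut([3, 0, 2], memory)
latencies = {out_c - in_c for in_c, out_c, _ in results}
```

In this model a word that matches in row i spends i cycles in the decoder and the remaining rows in the readout pipeline, so the two delays always sum to the same total. Address 0 is selected before address 3 here, yet the words still leave in input order.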
While various embodiments of the present invention have been illustrated herein in detail, it should be apparent that modifications and adaptations to those embodiments may occur to those skilled in the art without departing from the scope of the present invention as set forth in the following claims.
Claims
1. A pipelined multi-bit processor, comprising:
- an input port configured to receive a multibit digital value;
- a processing network comprising a pipeline of successive processing stages employing superconducting elements, wherein the digital value is transformed in dependence on pipeline logic and passed to succeeding stages of the pipeline in dependence on a clock cycle; and
- at least one output port, configured to present an output in dependence on the received digital value, the clock cycle, the pipeline logic and a respective stage of the pipeline with which the output port is associated.
2. The pipelined multi-bit processor according to claim 1, wherein the pipeline logic implements a code matching network, configured to generate a plurality of output signals corresponding to the multibit digital value, further comprising a plurality of code-matching cells organized into rows and columns, each cell comprising a clocked rapid-single-flux-quantum device, having a column associated with each bit of the multibit digital value, and a row associated with a respective output port, wherein the pipeline logic is configured to provide a variable time delay between a receipt of the multibit digital value at the input port, and presentation of the output at the output port, of an integral number of clock cycles, a value of the integral number varying in dependence on the multibit digital value.
3. The pipelined multi-bit processor according to claim 2, further comprising a pipelined memory array, configured to receive the output from the output port, and to produce a memory output representing a memory contents of at least one memory cell at an address of the pipelined memory array defined by the multibit digital input, the memory output having a total time delay between receipt of the multibit digital value and production of the memory output of an integral number of clock cycles that is independent of the multibit digital value.
4. The pipelined multi-bit processor according to claim 2, wherein each code-matching cell comprises a clocked data flip-flop having a regular output and complementary output, wherein either the regular output or the complementary output of the data flip-flop is connected to a respective network output line, depending on a bit value of a respective bit of the multibit digital value.
5. The pipelined multi-bit processor according to claim 2, wherein each code-matching cell comprises either a clocked data flip-flop or a clocked inverter, whereby the output of the code-matching cell is connected to the input of the succeeding row, as well as to a respective network output line.
6. The pipelined multi-bit processor according to claim 1, wherein the transformed multibit digital value is passed to succeeding stages.
7. The pipelined multi-bit processor according to claim 1, wherein the multibit digital value is passed to succeeding stages in a non-transformed state.
8. A pipelined processing method, comprising:
- receiving a multibit digital value;
- processing the received multibit digital value with a processing network comprising a pipeline of successive processing stages employing superconducting elements, wherein the digital value is transformed in dependence on pipeline logic and passed to succeeding stages of the pipeline in dependence on a clock cycle; and
- presenting an output in dependence on the received digital value, the clock cycle, the pipeline logic and a respective stage of the pipeline with which the output port is associated,
- wherein the pipeline logic implements a code matching network, generating a plurality of output signals corresponding to the multibit digital value, further comprising a plurality of code-matching cells organized into rows and columns, each cell comprising a clocked rapid-single-flux-quantum device, having a column associated with each bit of the multibit digital value, and a row associated with a respective output port, wherein the pipeline logic provides a variable time delay between a receipt of the multibit digital value at the input port, and presentation of the output at the output port, of an integral number of clock cycles, a value of the integral number varying in dependence on the multibit digital value,
- and the output port is received by a pipelined memory array, which produces a memory output representing a memory contents of at least one memory cell at an address of the pipelined memory array defined by the multibit digital input,
- the memory output having a total time delay between receipt of the multibit digital value and production of the memory output of an integral number of clock cycles that is independent of the multibit digital value.
9. The method according to claim 8, wherein each code-matching cell comprises a clocked data flip-flop having a regular output and complementary output, wherein either the regular output or the complementary output of the data flip-flop is connected to a respective network output line, depending on a bit value of a respective bit of the multibit digital value.
10. The method according to claim 8, wherein each code-matching cell comprises either a clocked data flip-flop or a clocked inverter, whereby the output of the code-matching cell is connected to the input of the succeeding row, as well as to a respective network output line.
11. The method according to claim 8, wherein the transformed multibit digital value is passed to succeeding stages.
12. The method according to claim 8, wherein the multibit digital value is passed to succeeding stages in a non-transformed state.
13. A pipelined multi-bit processor comprising a code matching network, configured to generate a plurality of output signals corresponding to a multibit digital input value received at an input port, comprising a plurality of code-matching cells organized into rows and columns, each cell comprising a clocked rapid-single-flux-quantum device, having a row associated with a respective multibit digital value, and columns of the row which together define an output value, configured to selectively provide a variable time delay of an integral number of clock cycles between a receipt of the multibit digital value at the input port, and presentation of the output at an output port, a value of the integral number selectively varying in dependence on the multibit digital value.
14. The pipelined multi-bit processor according to claim 13, further comprising a pipelined memory array, configured to receive the output from the output, and to produce a memory output representing a memory contents of at least one memory cell corresponding to the output, the memory output having a total time delay between receipt of the multibit digital value and production of the memory output of an integral number of clock cycles that is independent of the multibit digital value.
15. The pipelined multi-bit processor according to claim 13, wherein each code-matching cell comprises a clocked data flip-flop having a regular output and complementary output, wherein either the regular output or the complementary output of the data flip-flop is connected to a respective network output line, depending on a bit value of a respective bit of the multibit digital value.
16. The pipelined multi-bit processor according to claim 13, wherein each code-matching cell comprises either a clocked data flip-flop or a clocked inverter, whereby the output of the code-matching cell is connected to the input of a succeeding row, as well as to a respective network output line.
Type: Application
Filed: Mar 8, 2011
Publication Date: Jul 7, 2011
Applicant: HYPRES, INC. (Elmsford, NY)
Inventors: Alex F. Kirichenko (Pleasantville, NY), Timur V. Filippov (Mahopac, NY), Deepnarayan Gupta (Hawthorne, NY)
Application Number: 13/043,272
International Classification: G06F 15/00 (20060101); G06F 9/30 (20060101);