Delay tolerant asynchronous interface (DANI)
A Delay-tolerant Asynchronous Interface (DANI) is typically used to make the clock domains for reusable silicon intellectual property (IP) cores completely independent of each other. In fact, a DANI-wrapped IP core usually appears to its environment as if it were clockless. This property is necessary to address the variability in data transmission time between source and destination. This variability is a result of the increasing lack of predictability in today's leading-edge manufacturing processes. A DANI wrapper can be applied to the IP core that is the source of data to be transmitted, or it can be applied to the IP core that is the destination of that data. The transmission time over the route between source and destination may vary by more than a single clock period.
This application claims the benefit of U.S. Provisional Application No. 61/701,704, filed Sep. 16, 2012, which is hereby incorporated by reference in its entirety.
TECHNICAL FIELD
The present disclosure relates generally to the design of computer and communication systems; and in particular, but not limited to, delay-tolerant asynchronous interfaces that provide a reliable communications interface between systems, such as, but not limited to synchronous cores on an integrated circuit chip.
BACKGROUND
The semiconductor industry continues to decrease the minimum feature size of transistors and thereby increase the density of transistors on an integrated circuit (IC). Today, billion-transistor circuits are being produced and much higher densities are forecast for the years to come. However, it has become increasingly difficult to meet timing constraints throughout an integrated circuit that has but a single clock domain. A globally-asynchronous, locally-synchronous (GALS) approach has been gaining in popularity to overcome this difficult architectural problem. The GALS approach is to partition a system design into decoupled clock-independent modules that can be designed to meet their individual requirements. These independent modules can then be coupled using an asynchronous interconnect network or an asynchronous network-on-chip (ANoC), which improves reliability by using delay-tolerant connection modules to simplify the timing of clock-domain crossings. However, the complexity of such interconnect networks (measured in terms of the number of different ways control signals traverse such an interconnect network) grows exponentially instead of linearly as the number of independent control network elements used in implementing the interconnect network is increased. Therefore, providing a reliable interconnect network becomes problematic without a methodology to control this increased complexity.
The appended claims set forth the features of one or more embodiments with particularity. The embodiment(s), together with its advantages, may be best understood from the following detailed description taken in conjunction with the accompanying drawings of which:
1. Overview
Disclosed are, inter alia, methods, apparatus, computer-storage media, mechanisms, and means associated with a delay-tolerant asynchronous interface. One embodiment includes an integrated circuit, comprising: a source wrapper providing an asynchronous sending interface to a sending system on the integrated circuit, with the asynchronous sending interface producing a write clock output signal and a data output signal; a destination wrapper providing an asynchronous receiving interface to a receiving system on the integrated circuit, with the asynchronous receiving interface receiving a write clock input signal and a data input signal; and signal paths on the integrated circuit communicatively coupling the write clock output signal and the write clock input signal, and the data output signal and the data input signal, with the signal paths providing said received write clock input and data input signals with the relative timing produced between said write clock output and data output signals.
In one embodiment, the destination wrapper includes an asynchronous first-in, first-out queue (aFIFO) providing an intermediate storage of information received on the data input signal lines from a first clock domain with timing corresponding to the write clock input signal and provided to the receiving system operating in a different clock domain that is timed by a read clock received from the receiving system. In one embodiment, the destination wrapper uses a unary code, not a Gray code, to determine locations within the aFIFO. In one embodiment, the destination wrapper produces token-based flow control information provided to the source wrapper over a flow control signal path for controlling sending of information from the source wrapper to the destination wrapper. In one embodiment, each of the sending and receiving systems is synchronous.
2. Description
A Delay-tolerant Asynchronous Interface (DANI) is typically used to make the clock domains for reusable silicon intellectual property (IP) cores completely independent of each other. In fact, a DANI-wrapped IP core usually appears to its environment as if it were clockless. This property is necessary to address the variability in data transmission-time between source and destination. This variability is a result of the lack of predictability of the properties of transistors and their interconnections in today's leading-edge, integrated-circuit manufacturing processes. The term “asynchronous” is used in referring to the wrappers because they provide a non-synchronous interface between sending and receiving systems. One embodiment employs dual clocking of components in the asynchronous interfaces.
A DANI wrapper can be applied to the IP core that is the source of data to be transmitted, or it can be applied to the IP core that is the destination of that data. The transmission time over the route between source and destination may vary, both within and among integrated circuits, and may be more than a single clock period in duration. The source of data may be synchronous and the destination for that data may also be synchronous, but may be operating at a different clock frequency and/or phase. However, this invention also applies if the source, the destination, or both have an irregular clock and/or are asynchronous.
There are many possible embodiments of a DANI. Note, the term “one embodiment” is used herein to reference a particular embodiment, wherein each reference to “one embodiment” may refer to a different embodiment, and the use of the term repeatedly herein in describing associated features, elements and/or limitations does not establish a cumulative set of associated features, elements and/or limitations that each and every embodiment must include, although an embodiment typically may include all these features, elements and/or limitations. Also, the same figure numbers used in different figures typically refer to the same thing in each figure; and the last two digits of a three-digit reference number typically correspond to the same thing in different embodiments.
One embodiment is expressed as a hierarchical set of block diagrams. At the top level there are two alternative cases:
- DANI without flow control. A wrapper for the destination IP core that can be used when the source clock frequency is never greater than the destination clock frequency. A trivial wrapper for the source may also be included.
- DANI with flow control. Wrappers applied to both source and destination IP cores that can be used no matter the relationship between source and destination clock frequencies.
Section 1 reviews the case without flow control. The flow-control case in Section 2 then requires only a few additional ideas. Section 3 reviews some synchronization issues. Section 4 discusses some practical issues related to signal integrity. Section 5 reminds the reader of the vast number of embodiments of the teachings described herein.
1. DANI without Flow Control.
Specifically referring to
There are several source-synchronous write clock 131 embodiments, such as, but not limited to, those using two-phase or four-phase clocking. Typically, signal integrity issues will dictate which of them should be used for a particular integrated circuit. Two-phase embodiments transmit the clock at half the frequency of source clock 111, on either one or two wires. These two-phase embodiments are more complicated at the destination than four-phase embodiments. Therefore, we delay their discussion until Section 4 and assume here the four-phase option that sets write clock 131 equal in frequency to source clock 111.
Destination wrapper and receiving system 150 of
Destination control 170 of destination wrapper 160 provides, based on write clock 131, enabling signals (read enable 172 and write enable 171) for reading and writing the appropriate w-bit wide register of an asynchronous FIFO 180 (aFIFO) of depth d (meaning it can store d different words of w-bits wide). The source-synchronous write clock 131 drives the writing process at the aFIFO 180 while the destination's read clock 191 drives the reading process. The
One embodiment includes multiple instances of the source control 130 and source data register 140 within the source wrapper 120. Similarly, one embodiment includes multiple instances of the destination control 170 and the aFIFO 180 within the destination wrapper.
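By way of illustration only, the behavior of an aFIFO of depth d that is written on one clock and read on another can be sketched in software. The sketch below is a purely behavioral model under stated assumptions: the class name `AFifoModel`, its methods, and the use of integer pointers are hypothetical, it does not model metastability or synchronization, and it assumes the no-flow-control case in which the reader keeps up with the writer.

```python
class AFifoModel:
    """Behavioral model of a depth-d FIFO with independent write and read pointers.

    Hypothetical illustration only; real hardware uses registers clocked in
    two different domains and must synchronize the write pointer.
    """

    def __init__(self, depth):
        self.depth = depth
        self.slots = [None] * depth
        self.head = 0  # next slot to write, advanced in the write-clock domain
        self.tail = 0  # next slot to read, advanced in the read-clock domain

    def not_empty(self):
        # Data is available whenever the two pointers differ.
        return self.head != self.tail

    def write_edge(self, word):
        """Model one rising edge of the write clock."""
        self.slots[self.head] = word
        self.head = (self.head + 1) % self.depth

    def read_edge(self):
        """Model one rising edge of the read clock; returns a word or None."""
        if not self.not_empty():
            return None
        word = self.slots[self.tail]
        self.tail = (self.tail + 1) % self.depth
        return word


def run_demo():
    """Write ten words, reading one word after each write."""
    fifo = AFifoModel(depth=4)
    received = []
    for word in range(10):
        fifo.write_edge(word)
        received.append(fifo.read_edge())
    return received
```

In this model the reader never falls behind, so the pointers never wrap onto each other; that corresponds to the condition that the source clock frequency is never greater than the destination clock frequency.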
This synchronization is done in HR register 274 receiving HW signal 273 so that the synchronized write register output 275 and read register output 285 can be compared by comparator 290 in the domain of the read clock (191). When HR 275 and TR 285 are different, data 181 (from aFIFO 180 of
Shown in
- 0000→1000→1100→1110→1111→0111→0011→0001→0000
This sequence is a unary code that is fixed in length and repeats cyclically, stepping forward on each rising edge of write clock 131. Note that HW 372 contains a code for which a transition from 1 to 0 or from 0 to 1 in the example sequence of four bits identifies a unique aFIFO location that is used to construct a four-bit address pointer. This rule applies except for the 1111 and 0000 cases, when the right-most bit is the pointer. In one embodiment, a Gray code, lookup table, and/or other sequence generator is used instead of the unary code described supra.
This particular fixed-length unary code has the property that only one bit changes at each step in the sequence, and it can be easily generalized to any number of bits d. The property of the code wherein only a single bit changes on each rising edge of the write clock facilitates the synchronization that takes place in HR 274.
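The single-bit-change property can be checked with a short software sketch. The example below assumes the sequence is generated by shifting the code one position to the right and feeding the inverse of the last bit into the first position, which reproduces the four-bit example sequence given above; the function names are hypothetical and the code is illustrative rather than a description of the circuit.

```python
def next_unary(code):
    """Advance the fixed-length unary code by one step.

    Assumed update rule: shift right by one position and place the inverse
    of the old last bit in the first position.
    """
    first = '0' if code[-1] == '1' else '1'
    return first + code[:-1]


def unary_cycle(d):
    """Return the 2*d code words of one full cycle, starting from all zeros."""
    code = '0' * d
    cycle = []
    for _ in range(2 * d):
        cycle.append(code)
        code = next_unary(code)
    return cycle


def hamming(a, b):
    """Count the bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def single_bit_steps(d):
    """True if every step of the cycle, including the wrap-around, flips one bit."""
    cycle = unary_cycle(d)
    return all(
        hamming(cycle[i], cycle[(i + 1) % len(cycle)]) == 1
        for i in range(len(cycle))
    )
```

For d=4 the function `unary_cycle` yields 0000, 1000, 1100, 1110, 1111, 0111, 0011, 0001, after which the sequence returns to 0000.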
Referring to
Referring to
$$X_i = U_i \oplus U_{i+1}, \quad i = 1, 2, \ldots, d-1$$
$$X_d = U_d \oplus \overline{U}_1$$
An example conversion from U→X for d=4 is
- 0000→0001, 1000→1000, 1100→0100, 1110→0010,
- 1111→0001, 0111→1000, 0011→0100, 0001→0010.
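The example conversion above can be reproduced with a few lines of software. The sketch below assumes that each of the first d−1 output bits is the exclusive-OR of two adjacent bits of U and that the last output bit is the exclusive-OR of the last bit of U with the inverse of its first bit; the function name `unary_to_one_hot` is hypothetical.

```python
def unary_to_one_hot(u):
    """Convert a d-bit unary code U (string of '0'/'1') to the code X.

    X_i = U_i xor U_(i+1) for i = 1 .. d-1, and X_d = U_d xor (not U_1),
    using 1-based indices as in the text.
    """
    d = len(u)
    bits = [int(c) for c in u]
    x = [bits[i] ^ bits[i + 1] for i in range(d - 1)]
    x.append(bits[d - 1] ^ (1 - bits[0]))
    return ''.join(str(b) for b in x)


# The eight conversions listed in the text for d = 4.
EXAMPLES = {
    '0000': '0001', '1000': '1000', '1100': '0100', '1110': '0010',
    '1111': '0001', '0111': '1000', '0011': '0100', '0001': '0010',
}
```

In each of the eight cases exactly one bit of X is set, so X can serve directly as a one-of-d select for the aFIFO registers.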
HW register 372 (of
Shown in
If care is not taken in laying out an integrated circuit, the temporal relationship among the w-bit data lines 141 input to the destination wrapper 160 may be overly skewed. Similarly, the temporal relationship between the write clock 131 and these data lines 141 may also be overly skewed. Too much skew in any of these relationships may lead to setup or hold violations at the inputs to the d registers of aFIFO 480. These violations may, in turn, lead to data errors. Design tools generally use synchronous timing constraints that utilize absolute values of time measured with respect to the root of the clock tree. These constraints are ineffective in controlling the skew in data and clock signals input to destination wrapper 160. However, relative timing constraints applied, in one embodiment, at the destination wrapper 160 between the data lines 141 and the write clock 131 can minimize this skew. Application of said relative constraints can yield reliable performance of the resulting integrated circuit. In one embodiment, satisfaction of these relative constraints is accomplished by iteratively rerouting problem paths until static timing analysis determines that skew is within acceptable limits.
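The notion of a relative constraint between the write clock and the data lines can be illustrated with a simple check. The sketch below is not a static timing analysis tool and does not model rerouting; it assumes hypothetical arrival times measured at the destination wrapper, a data word that stays valid for one period after it arrives, and hypothetical setup and hold values, all in the same time unit.

```python
def timing_violations(clock_arrival, data_arrivals, period, setup, hold):
    """List (bit index, kind) for data lines that violate setup or hold.

    Assumptions (illustrative only): data bit i becomes valid at
    data_arrivals[i] and changes again one `period` later; the capturing
    clock edge arrives at clock_arrival.
    """
    violations = []
    for bit, arrival in enumerate(data_arrivals):
        if arrival + setup > clock_arrival:
            # Data is not stable early enough before the clock edge.
            violations.append((bit, "setup"))
        elif arrival + period < clock_arrival + hold:
            # Data changes again too soon after the clock edge.
            violations.append((bit, "hold"))
    return violations


def data_skew(data_arrivals):
    """Spread between the earliest and latest data line."""
    return max(data_arrivals) - min(data_arrivals)
```

Because only differences between arrival times enter the check, adding the same delay to the clock and to every data line leaves the result unchanged, which is the sense in which the constraint is relative rather than absolute.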
2. DANI with Flow Control.
The details of one embodiment 630 of source control 530 of
One embodiment with flow control includes multiple instances of the source control 530, the source data register 540, the destination control 570 and the aFIFO 580 within the source and destination wrappers 520 and 560.
The destination control design of
This method of flow control of one embodiment can be understood from examination of the Petri net 800 shown in
The system conserves the number of tokens in the Petri net. As a result there can never be more than d tokens in the right hand place modeling the number of data words in the destination aFIFO 580 of destination wrapper 560 (
The Petri net initial condition of d tokens in the left-hand place 801 of
One embodiment, such as that modeled by the Petri Net 800 of
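Token conservation of the kind described in this section can be illustrated by simulating a small Petri net. The sketch below is an assumed elaboration and not a transcription of Petri net 800: it uses four hypothetical places (credits at the source, data words in flight, words in the aFIFO, and acknowledgements in flight) joined in a ring by four transitions, with d tokens initially in the credit place.

```python
import random


def simulate_flow_control(d, steps, seed=0):
    """Fire randomly chosen enabled transitions and track the aFIFO occupancy.

    Returns the final marking and the largest number of tokens ever seen in
    the place that models words stored in the aFIFO.
    """
    rng = random.Random(seed)
    marking = {
        "credits": d,           # tokens the source may still spend
        "data_in_flight": 0,    # words launched but not yet written
        "fifo_words": 0,        # words stored in the destination aFIFO
        "acks_in_flight": 0,    # acknowledgements returning to the source
    }
    # Each transition moves one token from its input place to its output place.
    transitions = [
        ("credits", "data_in_flight"),
        ("data_in_flight", "fifo_words"),
        ("fifo_words", "acks_in_flight"),
        ("acks_in_flight", "credits"),
    ]
    max_fifo = 0
    for _ in range(steps):
        enabled = [t for t in transitions if marking[t[0]] > 0]
        src, dst = rng.choice(enabled)
        marking[src] -= 1
        marking[dst] += 1
        max_fifo = max(max_fifo, marking["fifo_words"])
    return marking, max_fifo
```

Since every transition consumes one token and produces one token, the total stays at d regardless of firing order, so the place that models the aFIFO can never hold more than d tokens.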
3. Improving Mean Time Between Failures (MTBF).
The logic 290 in
where τ is the settling time-constant of the flip-flops in HR 274, TW is their metastability window, fW is the frequency of write clock (131) transitions and fR is the read clock (191) frequency.
In order to maximize the MTBF when the parameters and clock frequencies for the circuit are fixed, the available settling time tS is made as large as possible. This time is compromised by both tL and tSU. The logic delay tL through the HR≠TR block 290 is at best equivalent to two gates in an ASIC or a single LUT in an FPGA. The logic family used will fix the setup time tSU. As a result, one embodiment may not achieve an adequate MTBF with the design shown in
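To make the dependence on settling time concrete, the sketch below evaluates the widely used synchronizer model MTBF = exp(tS/τ) / (TW · fW · fR), which uses the same symbols defined above. Treating this as the applicable expression is an assumption, as is taking the available settling time to be one read-clock period less the logic delay tL and the setup time tSU; the function and parameter names are hypothetical.

```python
import math


def settling_time(read_period, t_l, t_su):
    """Assumed available settling time: one read period minus tL and tSU."""
    return read_period - t_l - t_su


def synchronizer_mtbf(t_s, tau, t_w, f_w, f_r):
    """Standard synchronizer MTBF model, in seconds.

    t_s : available settling time (s)
    tau : settling time-constant of the flip-flops (s)
    t_w : metastability window (s)
    f_w : frequency of write clock transitions (Hz)
    f_r : read clock frequency (Hz)
    """
    return math.exp(t_s / tau) / (t_w * f_w * f_r)
```

Under this model each additional τ of settling time multiplies the MTBF by e, which is why even a small reduction in tL or tSU, or an extra clock period of settling, can change the MTBF by many orders of magnitude.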
Two embodiments for additional synchronization settling-time are shown in
Embodiment 900 is a familiar two-stage synchronizer instantiated for each of the d bits in HR 274 (
In embodiment 920 of
It might seem that the indeterminacy resulting from marginal triggering of the flip-flop in embodiment 920 of
Which design is best will depend on circuit parameters. However, embodiment 920 of
Note, this analysis discussed supra applies to other embodiments, such as that of wrapper destination control 770, including logic 790, of
4. Signal Integrity Issues.
The write clock line 131 and data bus 141 (of
It is also desirable to have the source wrapper 520 launch the data 541 and the write clock 531 with a well-defined phase relationship to each other. This simplifies the application of relative timing constraints and can be done if all signals are similarly registered at the source wrapper 520. However, registering the data is difficult to do when the clock line must have twice as many transitions as the data lines.
Both of these issues can be addressed by reducing the frequency of write clock 531 to half that of source clock 511. One scheme for accomplishing this frequency division is by including a toggle flip-flop in the source control 530 of
In an alternative scheme, two toggle flip-flops are included at the source control 530 of
These alternative schemes for reducing the transmitted source-synchronous clock frequency have different advantages and disadvantages. The choice between them will depend on individual design considerations.
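The effect of a toggle flip-flop on the transmitted clock can be seen in a short behavioral sketch. The example assumes the flip-flop starts at 0 and toggles once on each rising edge of the source clock; the function names are hypothetical and no wire delay or second flip-flop is modeled.

```python
def toggle_divide(num_source_edges, initial=0):
    """Level of the divided clock after each rising edge of the source clock."""
    level = initial
    levels = []
    for _ in range(num_source_edges):
        level ^= 1  # a toggle flip-flop inverts its output on every clock edge
        levels.append(level)
    return levels


def count_transitions(levels, initial=0):
    """Number of level changes (rising or falling) in the divided clock."""
    previous = initial
    count = 0
    for level in levels:
        if level != previous:
            count += 1
        previous = level
    return count


def count_rising_edges(levels, initial=0):
    """Number of 0-to-1 changes in the divided clock."""
    previous = initial
    count = 0
    for level in levels:
        if previous == 0 and level == 1:
            count += 1
        previous = level
    return count
```

Over six source-clock edges the divided clock makes six transitions but only three rising edges, so it runs at half the source frequency while still providing one transition per data word. The destination must then respond to both edges.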
The write clock 531 and ACK 532 lines shown in
A very wide data bus 141 of
5. These Ideas can be Broadly Applied.
In view of the many possible embodiments to which the principles of our invention(s) may be applied, it will be appreciated that the embodiments and aspects thereof described herein with respect to the drawings/figures are only illustrative and should not be taken as limiting the scope of the invention(s). The invention(s) as described herein contemplates all such embodiments as may come within the scope of identified claims and equivalents thereof based on this disclosure.
Claims
1. An integrated circuit, comprising:
- a source wrapper configured to provide an asynchronous sending interface for a sending system on the integrated circuit, with the asynchronous sending interface configured to produce a write clock output signal and one or more data output signals, and configured to receive and react to flow control information;
- a destination wrapper configured to provide an asynchronous receiving interface for a receiving system on the integrated circuit and to produce said flow control information allowing a plurality of data words in flight between the source wrapper and the destination wrapper without an overflow loss in the destination wrapper, with the asynchronous receiving interface configured to receive a write clock input signal and one or more data input signals, wherein the destination wrapper includes an asynchronous first-in, first-out queue (aFIFO) providing an intermediate storage of information received on said data input signals from a first clock domain corresponding to the write clock input signal and provided to the receiving system in a different clock domain corresponding to a read clock received from the receiving system; and
- signal paths on the integrated circuit configured to communicatively couple the write clock output signal with the write clock input signal and said data output signals with said data input signals, and to provide a flow control signal path communicating said flow control information from the destination wrapper to the source wrapper, with said signal paths providing the write clock input signal and said data input signals with relative timing constraints applied between the write clock input signal and said data input signals.
2. The integrated circuit of claim 1, wherein the destination wrapper uses a unary code to specify locations within the aFIFO.
3. The integrated circuit of claim 1, wherein said flow control information includes token-based flow control information.
4. The integrated circuit of claim 1, wherein the aFIFO is configured to store a maximum of d data words; and wherein said flow control information allows for a maximum of d data words to be in flight between the source wrapper and the destination wrapper, wherein d is greater than one.
5. The integrated circuit of claim 1, wherein the sending system and receiving systems are operated on different clocks with one or more different operating clock rates.
6. The integrated circuit of claim 1, wherein said signal paths include no intervening pipeline stages.
7. The integrated circuit of claim 1, wherein the destination wrapper does not use a Gray code to determine locations within the aFIFO.
8. The integrated circuit of claim 1, wherein each of the sending and receiving systems is synchronous.
9. The integrated circuit of claim 1, wherein the source wrapper and destination wrapper are configured to communicate only using said signal paths.
10. The integrated circuit of claim 9, wherein the write clock output signal is a sending system clock gated with a sending system data available signal.
11. The integrated circuit of claim 1, wherein the source wrapper is co-located with the sending system, the destination wrapper is co-located with the receiving system, the destination wrapper is remote from the source wrapper.
12. The integrated circuit of claim 11, wherein the write clock output signal is a sending system clock gated with a sending system data available signal.
13. An integrated circuit, comprising:
- a source wrapper providing an asynchronous sending interface for a sending system on the integrated circuit, with the asynchronous sending interface producing a write clock output signal and a data output signal;
- a destination wrapper providing an asynchronous receiving interface for a receiving system on the integrated circuit, with the asynchronous receiving interface receiving a write clock input signal and data input signal wherein the destination wrapper includes an asynchronous first-in, first-out queue (aFIFO), and wherein the destination wrapper uses a unary code to specify locations within the aFIFO; and
- signal paths on the integrated circuit communicatively coupling the write clock output signal and the write clock input signal, and the data output signal and said data input signal, with the signal paths providing the write clock input signal and said data input signal with relative timing constraints applied between the write clock input signal and said data input signal.
14. A method, comprising:
- in response to receiving flow control information identifying that a destination wrapper can accept a plurality of data words, a source wrapper sending to the destination wrapper a plurality of data words such that at least two of the plurality of data words are overlapping in flight between the source and destination wrappers; wherein said sending a particular data word of the plurality of data words includes providing a write clock signal and a w-bits wide data signal, with w being an integer greater than zero;
- for each particular data word of the plurality of data words: receiving, by the destination wrapper, the write clock signal and the data signal with relative timing constraints maintained for said sent write clock and data signals; storing, by the destination wrapper, said particular data word communicated in said received data signal in an asynchronous first-in, first-out queue (aFIFO) according to a first clock domain corresponding to said received write clock signal; and receiving, by a receiving system on the integrated circuit, the particular data word from the aFIFO according to a second clock domain according to a read clock signal provided by the receiving system to the destination wrapper.
15. The method of claim 14, comprising receiving, by the source wrapper from a sending system on the integrated circuit, the data word according to the first clock domain according to a write clock signal provided by the sending system to the source wrapper.
16. The method of claim 14, wherein said flow control information is token-based flow control information.
17. The method of claim 14, wherein the receiving system is synchronous.
18. The method of claim 14, wherein the aFIFO is configured to store a maximum of d data words; and wherein said flow control information allows for a maximum of d data words to be in flight to the destination wrapper; wherein d is greater than one.
19. An integrated circuit, comprising:
- a receiving system; and
- a destination wrapper providing an asynchronous receiving interface for a receiving system on the integrated circuit, with the asynchronous receiving interface configured to receive a write clock input signal and data input signals, with relative timing constraints applied between the write clock input signal and said data input signals, and to generate flow control information to signal to a source wrapper that the source wrapper can send information to the destination wrapper in a manner that allows multiple data words in flight between the source wrapper and the destination wrapper;
- wherein the destination wrapper includes an asynchronous first-in, first-out queue (aFIFO) providing an intermediate storage of information received on said data input signals in a first clock domain corresponding to the write clock input signal and provided to the receiving system in a different clock domain corresponding to a read clock received from the receiving system.
20. The integrated circuit of claim 19, wherein said flow control information includes token-based flow control information for controlling sending of information to the destination wrapper.
21. The integrated circuit of claim 19, wherein the aFIFO is configured to store a maximum of d data words; and wherein said flow control information allows for a maximum of d data words to be in flight to the destination wrapper; wherein d is greater than one.
22. The integrated circuit of claim 19, wherein the receiving system is synchronous.
23. An integrated circuit, comprising:
- a receiving system; and
- a destination wrapper providing an asynchronous receiving interface for a receiving system on the integrated circuit, with the asynchronous receiving interface configured to receive a write clock input signal and data input signal, with relative timing constraints applied between the write clock input signal and said data input signal;
- wherein the destination wrapper includes an asynchronous first-in, first-out queue (aFIFO) providing an intermediate storage of information received on said data input signal from a first clock domain corresponding to the write clock input signal and provided to the receiving system in a different clock domain corresponding to a read clock received from the receiving system; and
- wherein the destination wrapper uses a unary code to specify locations within the aFIFO.
24. The integrated circuit of claim 23, wherein the destination wrapper is configured to produce said flow control information allowing a plurality of data words in flight between a source wrapper of a sending system and the destination wrapper without an overflow loss in the destination wrapper.
Type: Grant
Filed: Sep 12, 2013
Date of Patent: Sep 2, 2014
Assignee: Blendics, Inc. (St. Louis, MO)
Inventors: Jerome R. Cox, Jr. (Sunset Hills, MO), George Engel (Maryville, IL), James Moscola (Red Lion, PA), Thomas J. Chaney (Bridgeton, MO)
Primary Examiner: Dennis M Butler
Application Number: 14/025,677
International Classification: G06F 1/04 (20060101); G06F 1/10 (20060101); G06F 1/12 (20060101); G06F 5/06 (20060101); G06F 15/78 (20060101);