Active-by-active programmable device
An example integrated circuit (IC) system includes a package substrate having a programmable integrated circuit (IC) and a companion IC mounted thereon, the programmable IC including a programmable fabric and the companion IC including application circuitry. The IC system further includes a system-in-package (SiP) bridge including a first SiP IO circuit disposed in the programmable IC, a second SiP IO circuit disposed in the companion IC, and conductive interconnect on the package substrate electrically coupling the first SiP IO circuit and the second SiP IO circuit. The IC system further includes first aggregation and first dispersal circuits in the programmable IC coupled between the programmable fabric and the first SiP IO circuit. The IC system further includes second aggregation and second dispersal circuits in the companion IC coupled between the application circuitry and the second SiP IO circuit.
This application is an application for reissue of U.S. Pat. No. 10,002,100, which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
Examples of the present disclosure generally relate to electronic circuits and, in particular, to an active-by-active programmable device.
BACKGROUND
Modern programmable devices, such as field programmable gate arrays (FPGAs), are growing in size and becoming more heterogeneous. Their cost is also rapidly increasing due to both more expensive process technology and increasing overhead of programmability for a majority of applications that do not require all of the heterogeneous circuit blocks. Many of these large circuit blocks, such as general purpose input/output (IO) or multi-gigabit serial transceivers (MGTs), do not require the benefits of new process technology. Thus, the traditional monolithic architectures no longer meet the cost requirements of the market, leading to the development of system-in-package (SiP) devices. The majority of SiP solutions, however, rely on advanced packaging techniques, such as the use of expensive interposers or complex three-dimensional die stacking. As such, the added cost of these SiP solutions limits the benefits to high-end or niche applications with low production volume.
SUMMARY
Techniques for providing an active-by-active programmable device are described. In an example, an integrated circuit (IC) system includes a package substrate having a programmable integrated circuit (IC) die and a companion IC die mounted thereon, the programmable IC die including a programmable fabric and the companion IC die including application circuitry. The IC system further includes a system-in-package (SiP) bridge including a first SiP IO circuit disposed in the programmable IC die, a second SiP IO circuit disposed in the companion IC die, and conductive interconnect on the package substrate electrically coupling the first SiP IO circuit and the second SiP IO circuit. The IC system further includes first aggregation and first dispersal circuits in the programmable IC die coupled between the programmable fabric and the first SiP IO circuit. The IC system further includes second aggregation and second dispersal circuits in the companion IC die coupled between the application IO and the second SiP IO circuit.
In another example, a programmable integrated circuit (IC) includes a system-in-package (SiP) input/output (IO) circuit coupled to a companion IC through external conductive interconnect; a programmable fabric without at least a portion of application circuitry; and aggregation and dispersal circuits coupled between the programmable fabric and the SiP IO circuit.
In another example, a method of transmitting data from a programmable integrated circuit (IC) in an IC system includes coupling the data to a first system-in-package (SiP) IO circuit through a plurality of channels of an aggregation circuit in the programmable IC. The method further includes transmitting the data from the plurality of channels over a smaller number of physical channels between the programmable IC and a companion IC. The method further includes receiving the data from the plurality of physical channels at a second SiP IO circuit in the companion IC. The method further includes coupling the data from the second SiP IO circuit to application circuitry in the companion IC through a plurality of channels of a dispersal circuit in the companion IC. The method further includes transmitting the data from the application IO circuits.
These and other aspects may be understood with reference to the following detailed description.
So that the manner in which the above recited features can be understood in detail, a more particular description, briefly summarized above, may be had by reference to example implementations, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical example implementations and are therefore not to be considered limiting of its scope.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements of one example may be beneficially incorporated in other examples.
DETAILED DESCRIPTION
Various features are described hereinafter with reference to the figures. It should be noted that the figures may or may not be drawn to scale and that the elements of similar structures or functions are represented by like reference numerals throughout the figures. It should be noted that the figures are only intended to facilitate the description of the features. They are not intended as an exhaustive description of the claimed invention or as a limitation on the scope of the claimed invention. In addition, an illustrated example need not have all the aspects or advantages shown. An aspect or an advantage described in conjunction with a particular example is not necessarily limited to that example and can be practiced in any other examples even if not so illustrated, or if not so explicitly described.
An active-by-active programmable device is described. In an example, a system-in-package (SiP)-based programmable device employs a multi-chip module (MCM) package. The MCM package includes a programmable integrated circuit (IC), such as a field programmable gate array (FPGA), and one or more companion integrated circuit (IC) devices disposed on a package substrate. The programmable IC and the companion ICs are disposed side-by-side on the package substrate (e.g., active-by-active). The connection between the programmable IC and each companion IC device is implemented using a high-bandwidth SiP bridge. The SiP bridge can be implemented using a low number of wires, allowing use of the MCM package rather than an expensive interposer. Data to be sent from one device to another is aggregated into a collective bandwidth and delivered over the SiP bridge. Aggregate data received on the SiP bridge is delivered to the destination through a systematic disperse mechanism. In examples described herein, the SiP bridge is implemented using a protocol stack comprising at least physical and data link layers. Higher layers can also be employed, such as a transport layer. The physical layer can be any ultra-short reach (USR) serializer/deserializer (SerDes) technology that meets certain requirements detailed herein. As described further herein, the data link layer is configured to time-multiplex the aggregated data across the available physical channels in a manner that avoids congestion at the destination. Each device can include a system-level interconnect to facilitate aggregation and dispersal of data between application circuits and SiP bridge(s).
In examples, a companion IC die includes some or all of the application input/output (IO) circuits of the SiP device. Thus, the programmable IC can be constructed without such application IO circuits. Removal of application IO circuits from the programmable IC frees die area for use by other circuits. Also, the programmable IC can be manufactured using newer process technology, taking advantage of newer process technology features, while the companion IC having the application IO circuits can be manufactured using older, less expensive process technology. As a result, the overall solution will deliver the same functionality at a lower cost. The programmable IC and the companion IC device are connected by a SiP bridge. Each of the programmable IC and the companion IC device includes a SiP IO circuit defining an endpoint of the SiP bridge. Each of the programmable IC and the companion IC can include system-level interconnect to provide aggregation and dispersal of data between application circuits and the SiP IO circuits. One example of a system-level interconnect for use in the programmable IC is a system-level interconnect ring (SIR). As described further herein, the SIR allows for minimal augmentation of the programmable fabric and the design tools used to implement circuits for the programmable IC. These and other aspects can be understood with reference to the following figures.
In an example, SiP IO 140 implements multiplexed IO logic having transport logic 114, a data link logic 116, and physical logic 118. In the example shown, the multiplexed IO logic is implemented entirely within SiP IO 140. In other examples, a given layer of the multiplexed IO logic or a portion thereof can be implemented in the application circuits 105. In an example, the transport logic 114 is implemented in the application circuits 105 and the data link logic 116 and the physical logic 118 are implemented in the SiP IO 140. In an example, an arbitration portion of the data link logic 116 is implemented in the application circuits 105 and multiplexing logic for the data link logic 116 is implemented in the SiP IO 140.
The IC die 103 can include similar circuitry as the IC die 101. In the example shown in
The IC die 101 and the IC die 103 are coupled by a SiP bridge 144. The SiP bridge 144 includes the SiP IO 140, SiP IO 142, and signal paths 138. An external interface of the SiP IO 140 is coupled to an external interface of the SiP IO 142 by the signal paths 138. The physical logic 118/120 implement a physical layer of the SiP bridge 144 and support a plurality of physical channels. The data link logic 116/122 implement a data link layer of the SiP bridge 144 and support a plurality of channels on each side of the SiP bridge 144 (referred to as aggregation channels and dispersal channels). The transport logic 114/124 implement a transport layer of the SiP bridge 144. Various transport layers can be employed, including connection-less or connection-based transport layers. The transport layer can provide for packetization, de-packetization, error correction, packet ordering, and the like known in the art.
The application circuits 105 can include a number of outputs coupled to channels of the aggregation circuits 110. The aggregation circuits 110 selectively couple outputs of the application circuits 105 among internal inputs of the SiP IO 140 (referred to as source ports). For example, the SiP IO 140 can include M source ports coupled to M aggregation channels of the aggregation circuits 110, where M is a positive integer. The aggregation circuits 110 can selectively couple outputs of the application circuits 105 to the M source ports of the SiP IO 140 through the M aggregation channels. The SiP IO 140 can include K external outputs driving K physical channels implemented over the signal paths 138, where K is a positive integer. In an example, K is less than M and the SiP IO 140 multiplexes the M source ports among the K external outputs. The SiP IO 142 can include K external inputs receiving from the K physical channels. The SiP IO 142 de-multiplexes the K external inputs among N internal outputs, where N is a positive integer (referred to as destination ports). In an example, N is greater than K. The dispersal circuits 128 selectively couple the N destination ports of the SiP IO 142 among inputs of the application circuits 107 through N dispersal channels.
Likewise, the application circuits 107 can include a number of outputs coupled to aggregation channels of the aggregation circuits 126. The aggregation circuits 126 selectively couple outputs of the application circuits 107 among source ports of the SiP IO 142. For example, the SiP IO 142 can include N′ source ports coupled to N′ aggregation channels of the aggregation circuits 126, where N′ is a positive integer. The aggregation circuits 126 can selectively couple outputs of the application circuits 107 to the N′ source ports of the SiP IO 142 through the N′ aggregation channels. The SiP IO 142 can include K′ external outputs driving K′ physical channels implemented over the signal paths 138, where K′ is a positive integer. In an example, K′ is less than N′ and the SiP IO 142 multiplexes the N′ source ports among the K′ external outputs. The SiP IO 140 can include K′ external inputs receiving from the K′ physical channels. The SiP IO 140 de-multiplexes the K′ external inputs among M′ destination ports, where M′ is a positive integer. In an example, M′ is greater than K′. The dispersal circuits 112 selectively couple the M′ destination ports of the SiP IO 140 among M′ inputs of the application circuits 105 through M′ dispersal channels.
In some examples, K=K′ such that the SiP bridge 144 supports an equal number of physical channels in each direction between the IC die 101 and the IC die 103. To transmit data from the IC die 101, the aggregation circuits 110 aggregate output of the application circuits 105 into M aggregation channels and the SiP IO 140 multiplexes the M aggregation channels across K physical channels. To receive the data at the IC die 103, the SiP IO 142 de-multiplexes K physical channels into N channels and the dispersal circuits 128 disperse the N dispersal channels to inputs of the application circuits 107. Likewise, to transmit data from the IC die 103, the aggregation circuits 126 aggregate output of the application circuits 107 into N′ aggregation channels and the SiP IO 142 multiplexes the N′ aggregation channels across K′ physical channels. To receive the data at the IC die 101, the SiP IO 140 de-multiplexes K′ channels received from the K′ physical channels into M′ dispersal channels and the dispersal circuits 112 disperse the M′ dispersal channels to inputs of the application circuits 105. In some examples, M=M′ and N=N′ such that aggregation circuits 110 and the dispersal circuits 112 provide a total of 2*M channels to the application circuits 105 and the aggregation circuits 126 and the dispersal circuits 128 provide a total of 2*N channels to the application circuits 107. In some examples, M=M′=N=N′.
In an example, the application circuits 105 and the application circuits 107 exchange packetized data over the SiP bridge 144. The transport logic 114 forms the output of the application circuits 105 into packets each including w bits, where w is a positive integer. The transport logic 114 also de-packetizes data received from the dispersal circuits 112. The transport logic 124 can function similarly to the transport logic 114.
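As a minimal sketch of the packetization and de-packetization described above, the following Python fragment splits a byte string into w-bit packets and reassembles it. The function names, the restriction of w to a multiple of 8, the big-endian byte order, and the zero padding of the final packet are all assumptions made for illustration; the description does not specify a packet format.

```python
def packetize(data, w):
    """Split bytes into w-bit packets (integers). Hypothetical format:
    w is a multiple of 8 and the last packet is zero-padded."""
    if w <= 0 or w % 8 != 0:
        raise ValueError("this sketch assumes w is a positive multiple of 8")
    nbytes = w // 8
    # Pad so the data length is a whole number of packets.
    padded = data + b"\x00" * (-len(data) % nbytes)
    return [
        int.from_bytes(padded[i:i + nbytes], "big")
        for i in range(0, len(padded), nbytes)
    ]


def depacketize(packets, w, length):
    """Rebuild the original bytes; `length` removes the assumed padding."""
    nbytes = w // 8
    raw = b"".join(p.to_bytes(nbytes, "big") for p in packets)
    return raw[:length]


packets = packetize(b"hello", 16)
restored = depacketize(packets, 16, 5)
```

In this sketch the original length is passed out of band; a practical transport layer would more likely carry it in a packet header.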
The data link logic 116 organizes the output of the aggregation circuits 110 into a plurality of aggregation channels (e.g., M aggregation channels), where each aggregation channel is w-bits wide for providing a w-bit packet. That is, the aggregation circuits 110 include a multi-channel output. Likewise, the data link logic 116 divides the inputs of the dispersal circuits 112 into a plurality of dispersal channels (e.g., M′ dispersal channels), where each dispersal channel is w-bits wide for receiving a w-bit packet. That is, the dispersal circuits 112 include a multi-channel input. The data link logic 116 maintains a transmit queue for each aggregation channel. The data link logic 116 arbitrates among the transmit queues to select packets to be transmitted over the available physical channels by the physical logic 118. The data link logic 116 also de-multiplexes physical channels into the available dispersal channels of the dispersal circuits 112. The data link logic 122 can function similarly to the data link logic 116.
The physical logic 118 can serialize packets in the available output physical channels for transmission as signals across the signal paths 138. The physical logic 118 can also de-serialize signals received from the signal paths 138 into the available input physical channels. The physical logic 120 can function similarly to the physical logic 118.
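The serialization and de-serialization performed by the physical logic can be illustrated with a short Python sketch. The most-significant-bit-first ordering and the function names are assumptions; the actual line coding of a given USR SerDes is not described here and would typically be more involved.

```python
def serialize(packet, w):
    """Turn a w-bit packet (integer) into a list of bits, MSB first (assumed order)."""
    if packet < 0 or packet >= (1 << w):
        raise ValueError("packet does not fit in w bits")
    return [(packet >> i) & 1 for i in range(w - 1, -1, -1)]


def deserialize(bits):
    """Rebuild the packet from a list of bits received MSB first."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


bits = serialize(0b1011, 4)
packet = deserialize(bits)
```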
In an example, the application circuits 105 can be coupled to the application circuits 107 directly over one or more direct signal paths 146. That is, there can be some communication between the IC die 101 and the IC die 103 that does not traverse the SiP bridge 144. Further, the IC die 101 can include an external interface 147 that can be used to send and receive signals to other external circuits, as well as to receive power and ground. Likewise, the IC die 103 can include an external interface 148 that can be used to send and receive signals to other external circuits, as well as to receive power and ground.
In the example shown in
The contacts 204 comprise the external pins of the MCM 200. The MCM 200 can be mounted to a circuit board. Conductors on the circuit board can be electrically coupled to the IC die 101 and the IC die 103 through the contacts 204 of the package substrate 202. Within the MCM 200, the IC die 101 is electrically coupled to the IC die 103 through the conductive interconnect 208. To implement the IC system 100 described above, the conductive interconnect 208 is patterned to form the signal paths 138 between the IC die 101 and the IC die 103. In an example, the conductive interconnect 208 is also patterned to form the dedicated signal path(s) 146 (if present). The external interface of the SiP IO 140 is coupled to the signal paths 138 formed in the conductive interconnect 208 through some of the contacts 214. Likewise, the external interface of the SiP IO 142 is coupled to the signal paths 138 formed in the conductive interconnect 208 through some of the contacts 216.
In an example, at least one of the IC die 101 and the IC die 103 comprises a programmable IC, such as a field programmable gate array (FPGA) or the like.
In some FPGAs, each programmable tile can include at least one programmable interconnect element (“INT”) 311 having connections to input and output terminals 320 of a programmable logic element within the same tile, as shown by examples included at the top of
In an example implementation, a CLB 302 can include a configurable logic element (“CLE”) 312 that can be programmed to implement user logic plus a single programmable interconnect element (“INT”) 311. A BRAM 303 can include a BRAM logic element (“BRL”) 313 in addition to one or more programmable interconnect elements. Typically, the number of interconnect elements included in a tile depends on the height of the tile. In the pictured example, a BRAM tile has the same height as five CLBs, but other numbers (e.g., four) can also be used. A DSP tile 306 can include a DSP logic element (“DSPL”) 314 in addition to an appropriate number of programmable interconnect elements. An IOB 304 can include, for example, two instances of an input/output logic element (“IOL”) 315 in addition to one instance of the programmable interconnect element 311. As will be clear to those of skill in the art, the actual I/O pads connected, for example, to the I/O logic element 315 typically are not confined to the area of the input/output logic element 315.
In the pictured example, a horizontal area near the center of the die (shown in
Some FPGAs utilizing the architecture illustrated in
Note that
In an example, the IC die 101 includes a programmable fabric having the FPGA architecture 300. Thus, the IC die 101 further includes the SiP IO 140 and a system-level interconnect 350. The system-level interconnect 350 includes the aggregation circuits 110 and the dispersal circuits 112. In an example, the system-level interconnect 350 comprises a system-level interconnect ring (SIR). Various examples of an SIR are described below. In general, the system-level interconnect 350 provides an interface between the programmable fabric and the SiP IO 140. In an example, the system-level interconnect 350 can be configured in similar fashion to the programmable fabric (e.g., via loading of a configuration bitstream). In other examples, the system-level interconnect 350 can be dynamically programmed during operation of the IC die 101.
In an example, the SiP IO 140 provides an IO interface that circuits configured in the programmable fabric can use to communicate with the IC die 103 over the SiP bridge 144. Circuits can use the application IO of the FPGA architecture 300, such as the IOBs 304, MGTs 301, and any other IOs (e.g., memory IOs, custom IOs, etc), for additional IO to other external circuits. In the example of
In other examples described in more detail below, at least a portion of the application IO can be disposed external to an FPGA, such as in a companion IC of a MCM. For example, the IC die 101 having an FPGA architecture can be constructed with only the SiP IO 140. That is, the IOBs 304 and MGTs 301 are removed from the FPGA architecture 300. All of the application IO can be disposed in the IC die 103. The programmable fabric in the IC die 101 can access the application IO in the IC die 103 using the SiP bridge 144. In some examples, the IC die 101 having an FPGA architecture can include some dedicated IO in addition to the SiP IO, such as configuration IO, JTAG IO, and the like. This dedicated IO is used for programming and/or testing the FPGA and not used as application IO for circuits configured in the programmable fabric.
In the example of
In an example, the programmable fabric 404 does not include any application IO. Rather, application IO 107A in the IC die 103A includes the application IO, such as IOBs 304, MGTs 301, and any other IOs 402 (e.g., memory IOs, custom IOs, etc.). The application IO 107A is coupled to the aggregation circuits 126 and the dispersal circuits 128. Circuits configured in the programmable fabric 404 can exchange data with the application IO 107A through the system-level interconnect 350 and the SiP bridge 144. In some examples, the IC die 101A also includes dedicated IO 108, such as configuration IO, JTAG IO, and the like. Some or all of this dedicated IO 108 can also be coupled to the IC die 103A.
The RNs 504 are coupled to wire tracks 502. In the present example, the wire tracks 502 form a ring around the programmable fabric 404. Other interconnect structures can be employed as discussed below. Together, the RNs 504 and the wire tracks 502 form a system-level interconnect that can implement the system-level interconnect 350. In an example, the system-level interconnect supports two channel sets, one for transmitting data from the fabric and one for receiving data to the fabric. For example, the wire tracks 502 can be configured into 256 tracks, allowing for one receive channel set and one transmit channel set each having a width of 128 tracks. Of course, the wire tracks 502 can be configured into more or less than 256 tracks. Each of the RNs 504 is coupled to the wire tracks 502. Each RN 504 includes a router switch that provides a bidirectional interface between the programmable fabric 404 and the system-level interconnect. Circuits configured in the programmable fabric 404 can be coupled to specific wire tracks 502 by programming the RNs 504. The RNs 504 can be programmed using configuration data loaded into the FPGA, or dynamically during operation of the FPGA.
Each RSN 506 comprises a repeatable portion of an RN 504. For example, if the wire tracks 502 are configured into 256 tracks, then an RN 504 can include 8 RSNs 506 each controlling 32 wire tracks. Of course, an RN 504 can include more or fewer RSNs 506 and the RSNs 506 can control more or fewer than 32 wire tracks. Each RSN 506 can have the same layout regardless of whether it is part of an RN 504 along a horizontal edge of the programmable fabric 404 or along a vertical edge of the programmable fabric 404. In this manner, the design tools used to design circuits for implementation in the FPGA do not have to distinguish between RNs 504 along a horizontal edge and RNs 504 along a vertical edge. An example structure of an RSN 506 is described below.
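The arithmetic of the example above (256 wire tracks divided among 8 RSNs of 32 tracks each) can be checked with a short Python sketch. The function name and the assignment of contiguous track indices to each RSN are assumptions for illustration only; the description does not state how tracks are mapped to RSNs.

```python
def partition_tracks(total_tracks, rsns_per_rn):
    """Divide track indices 0..total_tracks-1 into equal contiguous groups,
    one group per RSN (contiguous grouping is an assumption)."""
    if rsns_per_rn <= 0 or total_tracks % rsns_per_rn != 0:
        raise ValueError("tracks must divide evenly among the RSNs")
    per_rsn = total_tracks // rsns_per_rn
    return [range(i * per_rsn, (i + 1) * per_rsn) for i in range(rsns_per_rn)]


groups = partition_tracks(256, 8)
# Each of the 8 groups covers 32 tracks: 0-31, 32-63, ..., 224-255.
```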
The SiP stack 601 includes data link circuits 602 and physical circuits 604. The SiP stack 603 includes data link circuits 608 implementing data link logic and physical circuits 606 implementing physical logic. The physical circuits 604 and 606 form the physical layer. The data link circuits 602 and 608 form the data link layer.
In the example, the SiP stack 601 includes M*w source ports and M*w destination ports. The source ports and destination ports of the data link circuits 602 are coupled to the aggregation circuits 110 and dispersal circuits 112, respectively (e.g., the system-level interconnect 350). The data link circuits 602 include K*w external outputs and K*w external inputs. The external outputs and external inputs of the data link circuits 602 are coupled to the physical circuits 604. The physical circuits 604 are coupled to the physical circuits 606 by the signal paths 138. Depending on the physical layer, the signal paths 138 can include approximately 2*K signal paths. The data link circuits 608 include K*w external inputs and K*w external outputs. The external inputs and outputs of the data link circuits 608 are coupled to the physical circuits 606. The data link circuits 608 include N*w destination ports and N*w source ports. The source ports and destination ports of the data link circuits 608 are coupled to the aggregation circuits 126 and the dispersal circuits 128, respectively.
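To make the port counts above concrete, the following Python sketch computes the bit widths for hypothetical values of M, N, K, and w. The specific values, the function name, and the dictionary keys are illustrative assumptions. The "min_slots" entry additionally assumes that each physical channel carries one w-bit packet per time slot, which the description does not state.

```python
import math


def bridge_dimensions(M, N, K, w):
    """Return illustrative widths for one direction of the SiP bridge."""
    return {
        "source_port_bits": M * w,        # M*w source ports
        "physical_channel_bits": K * w,   # K*w external outputs
        "destination_port_bits": N * w,   # N*w destination ports
        "approx_signal_paths": 2 * K,     # approximately 2*K signal paths
        # Assumed: slots needed to send one packet from each of the M channels.
        "min_slots": math.ceil(M / K),
    }


dims = bridge_dimensions(M=8, N=8, K=2, w=32)
```

With these hypothetical values, 256 source port bits are carried over 64 physical channel bits, so at least four time slots are needed to move one packet from every aggregation channel.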
In operation, the data link circuits 602 multiplex M aggregation channels of width w into K physical channels of width w. The physical circuits 604 serialize the packets on the K physical channels onto the signal paths 138. The physical circuits 606 de-serialize signals from the signal paths 138 into K physical channels of width w. The data link circuits 608 de-multiplex the K physical channels into N dispersal channels of width w. Operation when transmitting from the data link circuits 608 to the data link circuits 602 (i.e., the reverse direction) is identical.
In operation, the arbitration logic 406 controls the multiplexers 704 so that there are no destination conflicts of packets transmitted across the K physical channels. Each packet being transmitted includes a destination port. The arbitration logic 406 executes an arbitration algorithm to ensure that no two packets being transmitted in parallel across the K physical channels have the same destination port. Example arbitration algorithms are described below.
At step 804, the arbitration logic 406 selects a multiplexer for scheduling. In the example of
At step 806, the arbitration logic 406 identifies a destination port for each transmit queue ready to transmit. In the example of
At step 808, the arbitration logic 406 schedules one or more transmit queues targeting unused destination port(s) for transmission. In an example, scheduling can be done in parallel using variations of a maximum matching algorithm. If a transmit queue includes packet(s) for targeting a used destination port, the arbitration logic 406 holds the transmit queue. At step 810, the arbitration logic 406 marks the identified destination ports as being used. At step 812, the arbitration logic determines whether there are more multiplexers to schedule. If so, the method 800 returns to step 804 and selects the next multiplexer. Otherwise, the method 800 returns to step 802 and marks all destination ports as unused. As such, the arbitration logic 406 executes steps 804-812 for each multiplexer to be scheduled (e.g., each of the multiplexers 704). The arbitration logic 406 executes steps 802 through 812 for multiple scheduling rounds. In this manner, the arbitration logic 406 generates and implements schedules for controlling the multiplexers 704 so that the multiplexers 704 multiplex output of the transmit queues 702 such that packets transmitted in parallel over the physical channels have different destination ports and there is no congestion or conflict at the multiplexers 706.
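One scheduling round of the method 800 can be sketched in Python as follows. This is a simplified greedy illustration, not the maximum matching variant mentioned above, and it rests on several assumptions: each packet is modeled as a (destination port, payload) tuple, every multiplexer can select from any transmit queue, each multiplexer sends at most one packet per round, and each queue is served at most once per round. The function and variable names are hypothetical.

```python
from collections import deque


def schedule_round(queues, num_mux):
    """Pick at most one packet per multiplexer so that no two packets
    sent in the same round share a destination port."""
    used_dests = set()        # step 802: all destination ports unused
    served_queues = set()     # assumed: one packet per queue per round
    schedule = []
    for mux in range(num_mux):                  # steps 804 and 812
        for qi, queue in enumerate(queues):     # step 806
            if qi in served_queues or not queue:
                continue
            dest, payload = queue[0]
            if dest in used_dests:
                continue                        # hold this transmit queue
            queue.popleft()                     # step 808: schedule it
            schedule.append((mux, qi, dest, payload))
            used_dests.add(dest)                # step 810: mark port used
            served_queues.add(qi)
            break
    return schedule


queues = [deque([(0, "a")]), deque([(0, "b")]), deque([(1, "c")])]
first = schedule_round(queues, num_mux=2)
second = schedule_round(queues, num_mux=2)
```

In this usage, packets "a" and "b" both target destination port 0, so "b" is held in the first round and sent in the second, while "c" travels in parallel with "a".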
The method 900 begins at step 902, where the arbitration logic 406 marks all destination ports as being unused. In the example of
At step 904, the arbitration logic 406 updates deficit counts for each transmit queue based on assigned weights. That is, the arbitration logic 406 can assign weight to each transmit queue. Some transmit queues can have more weight (higher priority) than other transmit queues. The deficit counts are used to control how many packets are selected from a given transmit queue at a given time during the scheduling.
At step 906, the arbitration logic 406 selects a multiplexer for scheduling. In the example of
At step 908, the arbitration logic 406 selects a transmit queue (e.g., one of the transmit queues 702). At step 910, the arbitration logic 406 determines whether the destination port targeted by packet(s) in the selected transmit queue is unused. If not, the method 900 returns to step 908 and the arbitration logic 406 selects the next transmit queue. Otherwise, the method 900 proceeds to step 912.
At step 912, the arbitration logic 406 marks the identified destination port as being in use. At step 914, the arbitration logic 406 schedules the selected transmit queue for transmission until empty or until the corresponding deficit count satisfies a threshold. A higher deficit count allows more packets to be selected from a given transmit queue than a lower deficit count.
At step 916, the arbitration logic 406 determines whether the selected transmit queue is empty. If so, the method 900 proceeds to step 918, where the arbitration logic 406 resets the deficit count for the selected transmit queue to an initial value. If the selected transmit queue is not empty, the method 900 proceeds instead to step 920.
At step 920, the arbitration logic 406 determines whether there are more transmit queues to be processed in this iteration. If all the transmit queues have been processed, the method 900 proceeds to step 922. Otherwise, the method 900 returns to step 908 and selects the next transmit queue. At step 922, the arbitration logic 406 determines whether there are more multiplexers to be scheduled. If so, the method 900 returns to step 906 and selects the next multiplexer. Otherwise, the method 900 returns to step 902 and marks all destination ports as unused. As such, the arbitration logic 406 executes steps 906-922 for each multiplexer to be scheduled (e.g., each of the multiplexers 704). The arbitration logic 406 executes steps 908-920 for each transmit queue given a selected multiplexer. The arbitration logic 406 executes steps 902-922 for multiple scheduling rounds. In this manner, the arbitration logic 406 generates and implements schedules for controlling the multiplexers 704 so that the multiplexers 704 multiplex output of the transmit queues 702 such that packets transmitted in parallel over the physical channels have different destination ports and there is no congestion or conflict at the multiplexers 706. The arbitration logic 406 also accounts for queue priority and fairness using a weighting scheme.
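The weighted scheme of the method 900 can likewise be sketched in Python. The sketch is a simplified reading under stated assumptions: each packet costs one unit of deficit, the threshold at step 914 is a deficit count of zero, the initial deficit value is zero, a queue is drained only while its head packet targets the destination port marked at step 912, and each multiplexer serves at most one transmit queue per round. These choices, and all names used, are illustrative and are not taken from the description.

```python
from collections import deque


def weighted_round(queues, weights, deficits, num_mux, initial_deficit=0):
    """Run one scheduling round; `deficits` is updated in place."""
    used_dests = set()                            # step 902
    for qi, weight in enumerate(weights):         # step 904
        deficits[qi] += weight
    served = set()
    schedule = []
    for mux in range(num_mux):                    # steps 906 and 922
        for qi, queue in enumerate(queues):       # steps 908 and 920
            if qi in served or not queue or deficits[qi] <= 0:
                continue
            dest = queue[0][0]
            if dest in used_dests:                # step 910
                continue
            used_dests.add(dest)                  # step 912
            served.add(qi)
            # Step 914: drain until empty or the deficit is exhausted.
            while queue and deficits[qi] > 0 and queue[0][0] == dest:
                schedule.append((mux, qi) + queue.popleft())
                deficits[qi] -= 1
            if not queue:                         # steps 916 and 918
                deficits[qi] = initial_deficit
            break
    return schedule


queues = [deque([(0, "a1"), (0, "a2"), (0, "a3")]),
          deque([(1, "b1"), (1, "b2")])]
deficits = [0, 0]
round1 = weighted_round(queues, [2, 1], deficits, num_mux=2)
round2 = weighted_round(queues, [2, 1], deficits, num_mux=2)
```

With weights of 2 and 1, the first queue sends two packets in the first round while the second queue sends one, which illustrates how a larger weight yields a larger share of the physical channels.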
At step 1006, the SiP IO 140A transmits the data from the aggregation channels over a smaller number of physical channels of the SiP bridge 144. For example, at step 1008, the data link logic 116 queues data from the aggregation channels into transmit queues. At step 1010, the data link logic 116 multiplexes the transmit queues among the physical channels, while the arbitration logic 406 manages destination conflicts. At step 1012, the arbitration logic 406 can also assign weights to the transmit queues and select packets for transmission based on the weights.
At step 1014, the SiP IO 142 in the IC die 103 receives the data from the SiP bridge 144. At step 1016, the dispersal circuits 128 couple the data from the SiP IO 142 to the application IO 107A through the dispersal channels. At step 1018, the application IO 107A consumes the data and/or transmits the data to external circuit(s).
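The data path of steps 1006-1016 can be sketched as follows, under stated assumptions. The function names multiplex and disperse are hypothetical, a packet is modeled as a (destination port, payload) pair, and the transmit queues are served in simple round-robin order. Destination-conflict arbitration and weighting are deliberately omitted, so the sketch shows only how a larger number of aggregation channels can share a smaller number of physical channels and how the receiving side disperses packets by destination port.

```python
from collections import deque


def multiplex(channels, num_physical):
    """Time-multiplex packets from many channels onto fewer physical channels.

    Returns a list of time slots; each slot carries at most `num_physical`
    packets, one per physical channel.
    """
    queues = [deque(ch) for ch in channels]   # one transmit queue per channel
    slots = []
    start = 0
    while any(queues):                        # an empty deque is falsy
        slot = []
        last = start
        for i in range(len(queues)):          # round-robin over the queues
            idx = (start + i) % len(queues)
            if queues[idx] and len(slot) < num_physical:
                slot.append(queues[idx].popleft())
                last = idx
        start = (last + 1) % len(queues)      # resume after the last queue served
        slots.append(slot)
    return slots


def disperse(slots, num_ports):
    """De-multiplex received packets onto dispersal channels by destination port."""
    out = [[] for _ in range(num_ports)]
    for slot in slots:
        for port, payload in slot:
            out[port].append(payload)
    return out


channels = [
    [(0, "a0"), (1, "a1")],
    [(1, "b0")],
    [(2, "c0")],
    [(0, "d0")],
]
slots = multiplex(channels, num_physical=2)
received = disperse(slots, num_ports=3)
```

Here four aggregation channels share two physical channels, so the five packets take three time slots to cross the bridge before being dispersed to three destination ports.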
At step 1106, the SiP IO 142 transmits the data from the aggregation channels over a smaller number of physical channels of the SiP bridge 144. For example, at step 1108, the data link logic 122 queues data from the aggregation channels into transmit queues. At step 1110, the data link logic 122 multiplexes the transmit queues among the physical channels, while managing destination conflicts. At step 1112, the data link logic 122 can also assign weights to the transmit queues and select packets for transmission based on the weights. That is, the data link logic 122 can include arbitration logic similar to the arbitration logic 406.
At step 1114, the SiP IO 140A in the IC die 101 receives the data from the SiP bridge 144. At step 1116, the system-level interconnect 350 couples the data from the SiP IO 140A to the programmable fabric 404 through the dispersal channels. At step 1118, the programmable fabric 404 consumes the data.
As noted above, the IC systems described herein can use any type of USR SerDes technology that meets certain requirements. There are three parameters of the physical layer to consider in order of priority: (1) bandwidth per pin; (2) power per bit; and (3) area. A figure of merit (FoM) can be defined for any physical logic that includes the first two factors: FoM = (bandwidth per pin)/(power per bit). In an example, the physical logic described herein can include a FoM greater than or equal to 20 (Gb/s)/(pJ/bit).
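As a worked illustration, the figure of merit can be computed as the ratio of bandwidth per pin (in Gb/s) to power per bit (in pJ/bit) and compared against the example value of 20. The function names and the two sample links below are hypothetical and are not measurements from this disclosure.

```python
def figure_of_merit(bandwidth_gbps_per_pin, energy_pj_per_bit):
    """Return the FoM in (Gb/s)/(pJ/bit): bandwidth per pin over power per bit."""
    if energy_pj_per_bit <= 0:
        raise ValueError("power per bit must be positive")
    return bandwidth_gbps_per_pin / energy_pj_per_bit


def meets_example_fom(fom, threshold=20.0):
    """Check a FoM against the example threshold of 20 (Gb/s)/(pJ/bit)."""
    return fom >= threshold


# Hypothetical link A: 20 Gb/s per pin at 0.5 pJ/bit.
fom_a = figure_of_merit(20.0, 0.5)
# Hypothetical link B: 10 Gb/s per pin at 1.0 pJ/bit.
fom_b = figure_of_merit(10.0, 1.0)
```

With these assumed numbers, link A has a FoM of 40 and satisfies the example threshold, while link B has a FoM of 10 and does not.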
The system-level interconnect 350 includes a horizontal edge of the programmable fabric 404 (horizontal fabric edge 1308) and the wire tracks 502. The horizontal fabric edge 1308 includes interconnect elements 311A and RSNs 506. Each RSN 506 occupies a region equivalent to a pair of CLEs 312L and 312M. Each RSN 506 includes switch circuitry coupled to a portion of the wire tracks 502. In particular, each RSN 506 includes clockwise (CW) links 1304 to adjacent circuitry through the wire tracks 502. Each RSN 506 also includes counter-clockwise (CCW) links 1302 to adjacent circuitry through the wire tracks 502. The adjacent circuitry can be an RSN in another RN or the SiP IO 140A depending on the position of the RSN 506. The interconnect elements 311A are configured to couple the RSNs 506 to the programmable interconnect of the programmable fabric 404.
In an example, the wire tracks 502 include 256 tracks as described in the examples above. An RSN 506 is coupled to a portion of the wire tracks 502 in both the CW and CCW directions. For example, an RSN 506 can be coupled to 32 of the wire tracks 502 in each of the CW and CCW directions (e.g., 64 total links). A plurality of the RSNs 506 combine to form an RN 504, which is coupled to all of the wire tracks 502. For example, if the RSNs 506 each control 32 wire tracks, then an RN 504 includes 8 RSNs 506.
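The arithmetic in this example can be checked with a short sketch. The function names are hypothetical, and the sketch assumes the wire tracks divide evenly among the RSNs of one RN.

```python
def rsns_per_rn(total_tracks, tracks_per_rsn):
    """Number of RSNs needed for one RN to cover all wire tracks."""
    if total_tracks % tracks_per_rsn != 0:
        raise ValueError("tracks must divide evenly among the RSNs")
    return total_tracks // tracks_per_rsn


def links_per_rsn(tracks_per_rsn):
    """Total links of one RSN: one set in the CW direction and one in the CCW direction."""
    return 2 * tracks_per_rsn


rsn_count = rsns_per_rn(256, 32)   # 256 tracks, 32 tracks per RSN
link_count = links_per_rsn(32)     # 32 CW links plus 32 CCW links
```

For 256 wire tracks and 32 tracks per RSN, this gives 8 RSNs per RN and 64 links per RSN, matching the example.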
One or more sets of RSNs 506 may be integrated into the programmable fabric 404 as shown in
The system-level interconnect 350 includes a vertical edge of the programmable fabric 404 (vertical fabric edge 1408) and the wire tracks 502. The vertical fabric edge 1408 includes interconnect elements 311A and RSNs 506. Each RSN 506 occupies a region equivalent to a pair of CLEs 312L and 312M. Each RSN 506 includes switch circuitry coupled to a portion of the wire tracks 502. In particular, each RSN 506 includes CW links 1404 to adjacent circuitry through the wire tracks 502. Each RSN 506 also includes CCW links 1402 to adjacent circuitry through the wire tracks 502. The adjacent circuitry can be an RSN in another RN or the SiP IO 140A depending on the position of the RSN 506. The interconnect elements 311A are configured to couple the RSNs 506 to the programmable interconnect of the programmable fabric 404.
In an example, the wire tracks 502 may include 256 tracks as described in the examples above. An RSN 506 is coupled to a portion of the wire tracks 502 in both the CW and CCW directions. For example, an RSN 506 can be coupled to 32 of the wire tracks 502 in each of the CW and CCW directions (e.g., 64 total links). A plurality of the RSNs 506 combine to form an RN 504, which is coupled to all of the wire tracks 502. For example, if the RSNs 506 each control 32 wire tracks, then an RN 504 includes 8 RSNs 506.
One or more sets of RSNs 506 may be integrated into the programmable fabric 404 as shown in
The buffers 1512 include a 16-bit input coupled to the interconnect element 311A. An output of the buffers 1512 is coupled to an input of the multiplexer 1520 and an input of the multiplexer 1516. Another input of the multiplexer 1520 is coupled to the output of the buffers 1504. Another input of the multiplexer 1516 is coupled to the output of the buffers 1510. An output of the multiplexer 1520 is coupled to an input of the flip-flops 1508. An output of the multiplexer 1516 is coupled to an input of the flip-flops 1506. The flip-flops 1508 include a 16-bit output coupled to the left-side RSN. The flip-flops 1506 include a 16-bit output coupled to the right-side RSN or the SiP IO 140A. Control inputs of the multiplexers 1516, 1518, and 1520 are coupled to outputs of the arbiter 1502. Inputs of the arbiter 1502 are coupled to the outputs of the buffers 1504, 1510, and 1512.
In operation, the RSN 506 buffers input from the programmable fabric 404 through the interconnect element 311A using the buffers 1512. The arbiter 1502 routes the buffered input either to the flip-flops 1508 or the flip-flops 1506. The flip-flops 1506 are coupled to the wire tracks 502 through CW links. The flip-flops 1508 are coupled to the wire tracks 502 through CCW links. The flip-flops 1506 and 1508 register the data for transmission to adjacent circuitry (e.g., either an adjacent RSN or the SiP IO 140A). The RSN 506 also buffers input from adjacent circuitry (e.g., an adjacent RSN or the SiP IO 140A) using the buffers 1504 and 1510. The buffers 1504 are coupled to the wire tracks 502 through CCW links. The buffers 1510 are coupled to the wire tracks 502 through CW links. The arbiter 1502 routes the buffered input from either the buffers 1504 or the buffers 1510 to the flip-flops 1514. The flip-flops 1514 provide registered output to the programmable fabric 404 through the interconnect element 311A. Thus, the RSN 506 implements a 16-bit switch. In other examples, the RSN 506 can implement a switch having a width less than or greater than 16 bits. The layout of the RSN 506 shown in
As shown in
As shown in
As shown in
As shown in
As shown in
At step 1706, the SiP IO 140A transmits the data from the bus channels over a smaller number of physical channels of the SiP bridge 144. For example, at step 1708, the data link logic 116 queues data from the bus channels into transmit queues. At step 1710, the data link logic 116 multiplexes the transmit queues among the physical channels, while the arbitration logic 406 manages destination conflicts. At step 1712, the arbitration logic 406 can also assign weights to the transmit queues and select packets for transmission based on the weights.
At step 1714, the SiP IO 142 in the IC die 103 receives the data from the SiP bridge 144. At step 1716, the dispersal circuits 128 couple the data from the SiP IO 142 to the application IO 107A through the dispersal channels. At step 1718, the application IO 107A transmits the data to external circuit(s).
While the foregoing is directed to specific examples, other and further examples may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
Claims
1. An integrated circuit (IC) system, comprising:
- a package substrate having a programmable integrated circuit (IC) die and a companion IC die mounted thereon, the programmable IC die including a programmable fabric and the companion IC die including application circuitry;
- a system-in-package (SiP) bridge including a first SiP IO circuit disposed in the programmable IC die, a second SiP IO circuit disposed in the companion IC die, and conductive interconnect on the package substrate electrically coupling the first SiP IO circuit and the second SiP IO circuit;
- first aggregation and first dispersal circuits in the programmable IC die coupled between the programmable fabric and the first SiP IO circuit; and
- second aggregation and second dispersal circuits in the companion IC die coupled between the application IO circuitry and the second SiP IO circuit;
- wherein the first and second SiP IO circuits are configured to multiplex multi-channel output of the first and second aggregation circuits, respectively, onto a first plurality of physical channels implemented over the conductive interconnect; and de-multiplex input from a second plurality of physical channels implemented over the conductive interconnect onto multi-channel input of the first and second dispersal circuits, respectively.
2. The IC system of claim 1, wherein the first aggregation and the first dispersal circuits comprise a system-level interconnect coupled between a programmable interconnect of the programmable fabric and the first SiP IO circuit.
3. The IC system of claim 2, wherein the system-level interconnect comprises a network-on-chip (NoC).
4. The IC system of claim 1, wherein the programmable IC die comprises a direct connection to the companion IC die separate from the SiP bridge.
5. The IC system of claim 1, wherein the programmable IC die includes arbitration logic, and wherein the first SiP IO circuit comprises a data link circuit and a transceiver circuit, where:
- an internal interface of the data link circuit is coupled to the first aggregation and the first dispersal circuits;
- an external interface of the data link circuit is coupled to an internal interface of the transceiver circuit;
- an external interface of the transceiver circuit is coupled to the conductive interconnect; and
- a control interface of the data link circuit is coupled to the arbitration logic.
6. The IC system of claim 5, wherein the arbitration logic is implemented within the programmable fabric of the programmable IC die.
7. The IC system of claim 1, wherein the programmable IC die includes transport logic configured to packetize data transmitted to the first aggregation circuit and de-packetize data received from the first dispersal circuit.
8. The IC system of claim 7, wherein the transport logic is implemented within the programmable fabric of the programmable IC die.
9. A programmable integrated circuit (IC), comprising:
- a system-in-package (SiP) input/output (IO) circuit configured to be coupled to a companion IC through external conductive interconnect;
- a programmable fabric without at least a portion of application circuitry; and
- aggregation and dispersal circuits coupled between the programmable fabric and the SiP IO circuit;
- wherein the aggregation and the dispersal circuits comprise a system-level interconnect coupled between a programmable interconnect of the programmable fabric and the SiP IO circuit; and
- wherein the system-level interconnect comprises a network-on-chip (NoC).
10. The programmable IC of claim 9, wherein the programmable fabric is directly connected to the companion IC separate from the SiP IO circuit.
11. The programmable IC of claim 9, wherein the programmable fabric is configured to implement arbitration logic, and wherein the SiP IO circuit comprises a data link circuit and a transceiver circuit, where:
- an internal interface of the data link circuit is coupled to the aggregation and the dispersal circuits;
- an external interface of the data link circuit is coupled to an internal interface of the transceiver circuit;
- an external interface of the transceiver circuit is coupled to the external conductive interconnect; and
- a control interface of the data link circuit is coupled to the arbitration logic.
12. The programmable IC of claim 9, wherein the programmable fabric is configured to implement transport logic that packetizes data transmitted to the aggregation circuit and de-packetizes data received from the dispersal circuit.
13. A method of transmitting data from a programmable integrated circuit (IC) in an IC system, the method comprising:
- coupling the data to a first system-in-package (SiP) IO circuit through a plurality of channels of an aggregation circuit in the programmable IC;
- transmitting the data from the plurality of channels by multiplexing the data over a smaller number of physical channels implemented over a conductive interconnect between the programmable IC and a companion IC;
- receiving the data from the plurality of physical channels at a second SiP IO circuit in the companion IC; and
- coupling the data from the second SiP IO circuit to application circuitry in the companion IC by demultiplexing the data through a plurality of channels of a dispersal circuit in the companion IC.
14. The method of claim 13, wherein the data is divided into packets.
15. The method of claim 14, wherein the second SiP IO circuit includes a plurality of internal output ports coupled to the respective plurality of channels of the dispersal circuit, and wherein the packets each have a destination port selected from one of the plurality of internal output ports.
16. The method of claim 15, wherein the step of transmitting comprises:
- queuing the data from the plurality of channels of the aggregation circuit in a respective plurality of transmit queues; and
- multiplexing output of the respective plurality of transmit queues among the plurality of physical channels such that packets transmitted in parallel over the plurality of physical channels have different destination ports.
17. The method of claim 16, wherein the step of multiplexing further comprises:
- assigning weights to each of the respective plurality of transmit queues; and
- selecting packets from the respective plurality of transmit queues for transmission over the plurality of physical channels based on the weights.
18. An integrated circuit (IC) system, comprising:
- a package substrate having a programmable integrated circuit (IC) die and a companion IC die mounted thereon, the programmable IC die including a programmable fabric and the companion IC die including application circuitry;
- a system-in-package (SiP) bridge including a first SiP IO circuit disposed in the programmable IC die, a second SiP IO circuit disposed in the companion IC die, and conductive interconnect on the package substrate electrically coupling the first SiP IO circuit and the second SiP IO circuit;
- first aggregation and first dispersal circuits in the programmable IC die coupled between the programmable fabric and the first SiP IO circuit; and
- second aggregation and second dispersal circuits in the companion IC die coupled between the application IO circuitry and the second SiP IO circuit;
- wherein the programmable IC die comprises a direct connection to the companion IC die separate from the SiP bridge.
19. An integrated circuit (IC) system, comprising:
- a package substrate having a programmable integrated circuit (IC) die and a companion IC die mounted thereon, the programmable IC die including a programmable fabric and the companion IC die including application circuitry;
- a system-in-package (SiP) bridge including a first SiP IO circuit disposed in the programmable IC die, a second SiP IO circuit disposed in the companion IC die, and conductive interconnect on the package substrate electrically coupling the first SiP IO circuit and the second SiP IO circuit;
- first aggregation and first dispersal circuits in the programmable IC die coupled between the programmable fabric and the first SiP IO circuit; and
- second aggregation and second dispersal circuits in the companion IC die coupled between the application IO circuitry and the second SiP IO circuit;
- wherein the programmable IC die includes arbitration logic, and wherein the first SiP IO circuit comprises a data link circuit and a transceiver circuit, where:
- an internal interface of the data link circuit is coupled to the first aggregation and the first dispersal circuits;
- an external interface of the data link circuit is coupled to an internal interface of the transceiver circuit;
- an external interface of the transceiver circuit is coupled to the conductive interconnect; and
- a control interface of the data link circuit is coupled to the arbitration logic.
20. An integrated circuit (IC) system, comprising:
- a package substrate having a programmable integrated circuit (IC) die and a companion IC die mounted thereon, the programmable IC die including a programmable fabric and the companion IC die including application input/output (IO) circuitry, wherein the programmable IC die does not include application IO circuitry;
- a system-in-package (SiP) bridge including a first SiP IO circuit disposed in the programmable IC die, a second SiP IO circuit disposed in the companion IC die, and conductive interconnect on the package substrate electrically coupling the first SiP IO circuit and the second SiP IO circuit;
- first aggregation and first dispersal circuits in the programmable IC die coupled between the programmable fabric and the first SiP IO circuit; and
- second aggregation and second dispersal circuits in the companion IC die coupled between the application IO circuitry and the second SiP IO circuit;
- wherein the first and second SiP IO circuits are configured to multiplex multi-channel output of the first and second aggregation circuits, respectively, onto a plurality of physical channels implemented over the conductive interconnect; and de-multiplex input from the plurality of physical channels implemented over the conductive interconnect onto multi-channel input of the first and second dispersal circuits, respectively.
21. The IC system of claim 20, wherein circuits in the programmable fabric are configured to use the first and second SiP IO circuits to communicate with the application IO circuitry in the companion IC die.
22. The IC system of claim 21, wherein the programmable IC die further comprises dedicated IO circuitry configured for at least one of programming or testing the programmable IC die, wherein the circuits in the programmable fabric that are configured to use the application IO circuitry in the companion IC die do not use the dedicated IO circuitry.
Type: Grant
Filed: Jun 18, 2020
Date of Patent: Aug 9, 2022
Assignee: XILINX, INC. (San Jose, CA)
Inventors: Alireza S. Kaviani (San Jose, CA), Pongstorn Maidee (San Jose, CA), Ivo Bolsens (San Jose, CA)
Primary Examiner: William H. Wood
Application Number: 16/891,972
International Classification: G06F 13/36 (20060101); G06F 13/40 (20060101); G06F 13/362 (20060101); G06F 13/00 (20060101);