BUS SYSTEMS AND METHODS FOR CONTROLLING DATA FLOW IN A FIELD OF PROCESSING ELEMENTS

Info

Publication number: 20120151113
Type: Application
Filed: Dec 13, 2011
Publication Date: Jun 14, 2012
Inventors: Martin VORBACH (Munich), Volker Baumgarte (Munich), Gerd Ehlers (Grassbrunn)
Application Number: 13/324,048

Abstract

A bus system for a configurable architecture and methods therefor are provided in which optimization of the configuration efficiency and reconfiguration efficiency are taken into account separately. A system and method may include controlling data transmission by: transmitting, by a first hardware element and to a second hardware element, a data packet conditional upon and/or responsive to the second hardware element's assignment of a signal to a connecting bus via which the data packet is transmitted, where the signal indicates that no incoming data packet can be lost. A system and method may include controlling data transmission by: transmitting, by a first hardware element and to a second hardware element, a first data packet and subsequently a second data packet; and receiving, by the first hardware element and from the second hardware element, an acknowledgement of the first data packet subsequent to the transmittal of the second data packet.

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a divisional of and claims priority to U.S. patent application Ser. No. 10/504,684, filed on Jul. 14, 2006, which is the National Stage of International Patent Application Serial No. PCT/DE2003/000489, filed on Feb. 18, 2003, the entire contents of each of which are expressly incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to methods and embodiments of bus systems for configurable architectures. More specifically, embodiments of the present invention relate to configuration optimization and reconfiguration efficiency.

BACKGROUND INFORMATION

A reconfigurable architecture is understood to refer to modules (VPUs) having a configurable function and/or interconnection, in particular integrated modules having a plurality of arithmetic and/or logic and/or analog and/or storing and/or interconnecting modules (hereinafter referred to as PAEs) arranged in one or more dimensions and/or communicative peripheral modules (IO) interconnected directly or via one or more bus systems. PAEs may be of any embodiment or mixture and arranged in any hierarchy. This arrangement is referred to below as a PAE array or PA.

Generic modules of this type include systolic arrays, neural networks, multiprocessor systems, processors having multiple arithmetic units and/or logic cells, interconnecting and network modules such as crossbar switches, as well as known modules of the types FPGA, DPGA, XPUTER, etc. In this context, reference is made in particular to the following patents and applications by the present applicant: P 44 16 881.0-53, DE 197 81 412.3, DE 197 81 483.2, DE 196 54 846.2-53, DE 196 54 593.5-53, DE 197 04 044.6-53, DE 198 80 129.7, DE 198 61 088.2-53, DE 199 80 312.9, PCT/DE 00/08169, DE 100 36 627.9-33, DE 100 28 397.7, DE 101 10 530.4, DE 101 11 014.6, PCT/EP 00/10516, EP 01 102 674.7, PCT/DE97/02949, PCT/DE97/02998, PCT/DE97/02999, PCT/DE98/00334, PCT/DE99/00504, PCT/DE99/00505, PCT/EP02/10065, PCT/DE00/01869, PCT/DE02/03278, PCT/EP02/02403, PCT/DE03/00152, DE 102 06 857.7, DE 102 40 000.8, PCT/EP02/02402, DE 02 027 277.9, and EP 01 129 923.7, which are hereby incorporated by reference herein in their entireties.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows two processing elements and a bus, together with their IDs.

FIG. 2 shows a bus segment with double switches for controlling data flow between bus segments, according to an example embodiment of the present invention.

FIGS. 3a-3c show a bus system in various states of configuration, and the use of switches for connecting an input of a processing element, according to an example embodiment of the present invention.

FIGS. 4a-4c show a bus system in various states of configuration, and the use of switches and RdyHold stages for connecting an output of a processing element, according to an example embodiment of the present invention.

FIGS. 5a-5c show examples of processing elements with differently configured interconnections, and signal propagation in the case of branching or loops.

FIG. 6a shows a conventional bus design.

FIG. 6b shows a bus design according to an example embodiment of the present invention.

FIG. 7 shows different types of connections between busses using either one switch or two switches and using configuration bits to determine the states of the switches, according to example embodiments of the present invention.

FIGS. 8a and 8b illustrate how to respond to a SyncReconfig that arrives before a configuration has been completely configured, according to example embodiments of the present invention.

FIG. 9 illustrates an architecture implementing a RDY/ACK protocol according to an example embodiment of the present invention.

FIG. 10 illustrates a modified architecture implementing a RDY/ACK protocol according to an example embodiment of the present invention.

FIG. 11 illustrates an architecture including a double receiver input register, implementing a transmitter/receiver protocol according to an example embodiment of the present invention.

FIG. 12 illustrates a modified architecture implementing a protocol between a transmitter and receiver, where all modules have registers at the output, according to an example embodiment of the present invention.

FIG. 13 illustrates an architecture implementing a RDY-ABLE protocol according to an example embodiment of the present invention.

FIG. 14 shows a bus signal between a transmitter and a receiver using credit system timing.

FIG. 15 shows a bus signal between a transmitter and a receiver using a RDY protocol.

FIG. 16 shows a bus signal where a pulsed RDY-ABLE protocol is used.

FIG. 17 shows hardware for receiving and sending data using a RDY/ABLE protocol according to an example embodiment of the present invention.

FIG. 18 illustrates an example interface arrangement of AMBA for a control manager (CM) interface of a unit having an XPP core according to an example embodiment of the present invention.

FIG. 19 shows an internal structure of a receiver part in an external interface for a 16-bit output port of the CM according to an example embodiment of the present invention.

FIG. 20 shows an internal structure of a transmitter part of an external module that establishes an interface connection with the 16-bit input port of the configuration manager according to an example embodiment of the present invention.

DETAILED DESCRIPTION

The architecture indicated above is used as an example for illustration and is referred to below as a VPU. This architecture has any number of arithmetic or logic cells (including memories) and/or memory cells and/or interconnection cells and/or communicative/peripheral (IO) cells (PAEs), which may be arranged to form a one-dimensional or multidimensional matrix (PA), which may have different cells of any configuration. Bus systems are also understood to be cells. The matrix as a whole or parts thereof are assigned a configuration unit (CT), which influences the interconnection and function of the PA. Improvements are still possible with such architectures, e.g., with regard to the procedure and/or speed of reconfiguration.

1. Structure of Bus Systems

Conventional implementation of configuration requires synchronization between the objects. Objects are understood to refer to all data processing modules (PAEs) and, inasmuch as necessary, also the data transferring modules such as bus systems. This synchronization is implemented centrally, e.g., via a FILMO (see PCT/DE97/02998, PCT/DE97/02999, PCT/DE99/00504, PCT/DE99/00505, and PCT/EP01/06703). Therefore, at least as many cycles elapse between the end of an old configuration (reconfig trigger; see PCT/DE98/00334) and the beginning of a new configuration (object again enters the “configured” state) as correspond to the length of the pipelined CM bus, forward and return (see PCT/EP01/06703).

Two methods for accelerating this procedure, according to example embodiments of the present invention, provide, respectively, that:

- a) the required sequence is ensured by additional logic in the objects, e.g., management of IDs; and
- b) the objects are modified so that it is no longer necessary to take the sequence into account and instead the proper interconnection is ensured by the architecture of the objects.

For the following considerations, the modules present in a typical reconfigurable architecture are divided into two groups: buses and objects. The bus group includes the connecting line between two segments; a bus is represented by the segment switch at one end. The object group includes all objects that have a connection to a bus and/or communicate with their environment, i.e., any PAE (e.g., memory, ALUs), IO, etc.

Typically there are dependencies mainly among directly adjacent objects, specifically bus to bus, object to bus, and object to object. With respect to bus to bus, a bus is represented by the segment switch at its end. With respect to object to bus, the object may be any of FREG, BREG, ALU and RAM; everything that has a connection likewise counts as an object in this sense. With respect to object to object, such objects are not usually directly adjacent, since there is normally a bus in between, so there is no dependence. In the case of a direct connection, the connection behaves according to “bus to bus” and/or “object to bus”, depending on the embodiment.

1.1 Bus-to-Bus Dependence

In the related art, longer buses are configured from back to front, for example. An example of the bus design described below is illustrated in FIG. 6a. The last bus segment (0606a) is configured with an open bus switch (0607) while all others are configured with a closed bus switch. The sequence must be preserved to prevent data from running from a bus that is closer to the front to a bus that is closer to the rear which still belongs to another configuration.

1.2 Bus-to-Object Dependence

According to the related art, an object (e.g., 0601, 0602, 0603) may not be configured until it is ascertained that the buses (0606a, 0606b) used by the object have already been configured. This dependence also exists to ensure that no data is running into a foreign configuration (PAE output) and/or is taken from a foreign configuration (PAE input).

In summary, it may be concluded that there is always a dependence when an object establishes, has established, and/or wishes to establish a connection to another object. This takes place by way of the connection mask (0608), which controls the connection of the object inputs and/or outputs onto the buses (e.g., via multiplexers, transmission gates and the like; see also PCT/EP02/02403, FIGS. 5 and 7c), and/or by way of closed bus switches (0607), which permit the transfer of information via a bus (e.g., from one segment (0606a[1]) to another segment (0606a[2])). In other words, this connection mask indicates which horizontal bus structure is connected to which vertical bus structure and where this occurs; for completeness, it should be noted that a “lane change” to a horizontal bus structure, for example, is also possible. The connection must not be established until it is ascertained that the object to which the connection is to be established already belongs to the same configuration, i.e., has already been configured accordingly.

2. Control Over ID Management

A first approach, according to an example embodiment of the present invention, is to store the ID or array ID currently being used by the object in each object (see PCT/DE99/00504 and PCT/DE99/00505). Therefore, information regarding which task and/or configuration the particular object is being assigned to at the moment is stored. As soon as a connection between two objects is configured (e.g., between a PAE output and a bus), a check is performed in advance to determine whether both objects have the same ID/array ID. If this is not the case, the connection must not be established. Thus, a connection is activated and/or allowed, depending on a comparison of identifying information.
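As a rough illustration of this ID comparison, the following Python sketch gives each object a stored configuration ID and allows a connection only when both IDs match. The names `ConfigObject` and `may_connect` and the ID values are hypothetical; in hardware, this corresponds to an ID register per object and a comparator per possible connection, as in FIG. 1.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class ConfigObject:
    """Hypothetical PAE port or bus that stores the ID of its current configuration."""
    name: str
    config_id: int | None = None  # None: not (yet) assigned to any configuration


def may_connect(a: ConfigObject, b: ConfigObject) -> bool:
    # One comparator per possible connection: the switch may close only if both
    # objects are configured and belong to the same configuration (same ID/array ID).
    return a.config_id is not None and a.config_id == b.config_id


pae_out = ConfigObject("PAE 0101 output", config_id=7)
bus = ConfigObject("bus 0103", config_id=3)   # still owned by another configuration
print(may_connect(pae_out, bus))  # False: connection must not be established
bus.config_id = 7                 # bus now configured for the same task
print(may_connect(pae_out, bus))  # True
```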

Although this method is conceptually simple, it requires considerable hardware, because each possible connection needs registers for storing the IDs/array IDs and comparators for comparing the IDs/array IDs of the two objects to be connected.

FIG. 1 shows the two PAEs (0101, 0102) together with their IDs and a bus (0103) with its ID. Each PAE/bus connection is checked via the comparators (0104, 0105). The figure is used only to illustrate the basic principle without being restrictive. If all resources (inputs/outputs of the PAEs, buses) are taken into account, there is a considerable increase in complexity and the associated hardware expenditure. A method according to the present invention which is implemented much more favorably from a technical standpoint and is therefore preferred is discussed in the following sections.

3. Control Over the Interconnection Structure

FIG. 2 shows a bus segment needed by configurations A and B; as shown here, it is still occupied by configuration A. Configuration B may already occupy the two neighboring bus segments independently thereof. According to the present invention, the new double bus switches (0201 and 0202, corresponding to 0607 and, according to FIG. 6b, 0609) rule out the possibility that data from configuration B will interfere with the data flow of configuration A. Likewise, no data runs from configuration A to B. In the case of configuration B, it is assumed that configuration A has been correctly implemented and that the bus switch at the output is open.

As soon as configuration A is concluded, the bus thus released is occupied by configuration B and configuration B begins to work.

In other words, one basic principle of the method is that each element involved in a data transmission connects itself automatically to the corresponding data source and/or data transmitter, i.e., the element itself controls which data transmitter/receiver it is connected to according to the configuration.
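The double-switch principle of FIG. 2 can be sketched in Python as follows, assuming each segment sets only its own switch toward each neighbor and data crosses a boundary only when both switches are closed. The class `BusSegment`, the function `linked`, and the segment names are hypothetical illustrations, not part of the described hardware.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class BusSegment:
    """Hypothetical bus segment that owns one switch toward each neighboring segment."""
    name: str
    config: str | None = None
    closed_towards: set[str] = field(default_factory=set)  # neighbors this segment allows

    def configure(self, config: str, connect_to: set[str]) -> None:
        # A segment sets only its own switches, according to its own configuration.
        self.config = config
        self.closed_towards = set(connect_to)


def linked(a: BusSegment, b: BusSegment) -> bool:
    # Double switch: data crosses the boundary only if BOTH segments have closed their side.
    return b.name in a.closed_towards and a.name in b.closed_towards


left, middle, right = BusSegment("left"), BusSegment("middle"), BusSegment("right")
middle.configure("A", connect_to=set())          # A still runs here; its end switches are open
left.configure("B", connect_to={"middle"})       # B preconfigures the neighbors independently
right.configure("B", connect_to={"middle"})
print(linked(left, middle), linked(middle, right))  # False False: no data between A and B

middle.configure("B", connect_to={"left", "right"})  # A concludes; B occupies the released segment
print(linked(left, middle), linked(middle, right))  # True True
```

Because configuration B may be loaded into the neighboring segments at any time, no central ordering of the configuration words is needed; the middle segment joins configuration B only when it is itself reconfigured.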

Bus to PAE Input

FIG. 3 shows a PAE input (0301) which is to be connected to the two lower buses of the three buses shown here. The vertical switches correspond to a simple connection switch of the connection mask (0608) for connection to the bus and are managed by the PAE (0302), and in addition, the horizontal switches (0303, corresponding to 0610) are also configured via the bus to ensure a correct connection.

The middle bus in FIG. 3a is still occupied by another configuration. Nevertheless, the object may be configured completely using the PAE input. Data from the middle bus cannot run unintentionally into the object because this is prevented by the configuration of the bus (switch 0303).

In FIG. 3b, the old configuration has been terminated and replaced by the new configuration. Now both buses are available. To determine which buses are in fact connected, only the vertical switches (0302) are used.

Finally, the upper bus in FIG. 3c is occupied by a third configuration, which would also like to use the PAE input shown. Therefore, the bus is configured so that data may be withdrawn at this point. However, this has no effect on the object because the PAE configuration does not provide any connection at this point. The connection is thus not established until the configuration of the PAE input changes.
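A minimal Python sketch of FIGS. 3a through 3c, assuming the input is connected to a bus only when both the PAE-side (vertical) switch and the bus-side (horizontal) switch are closed. The bus indices, `PAE_MASK`, and the function names are hypothetical.

```python
from __future__ import annotations


def input_connected(pae_selects: bool, bus_allows: bool) -> bool:
    # Vertical switch (0302): set by the PAE input's own configuration (connection mask 0608).
    # Horizontal switch (0303/0610): set by the bus's configuration.
    # The input is really connected only when both switches are closed.
    return pae_selects and bus_allows


# PAE input 0301 is configured to read from the two lower buses (0 = upper, 1 = middle, 2 = lower).
PAE_MASK = {0: False, 1: True, 2: True}


def connected_buses(bus_allows: dict[int, bool]) -> list[int]:
    return [b for b in sorted(PAE_MASK) if input_connected(PAE_MASK[b], bus_allows[b])]


# FIG. 3a: middle bus still belongs to another configuration, so its switch stays open.
print(connected_buses({0: False, 1: False, 2: True}))  # [2]
# FIG. 3b: old configuration replaced; both lower buses allow the connection.
print(connected_buses({0: False, 1: True, 2: True}))   # [1, 2]
# FIG. 3c: a third configuration opens the upper bus toward the input, but the PAE does not select it.
print(connected_buses({0: True, 1: True, 2: True}))    # [1, 2]
```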

Bus-PAE Output

This is a connection in which the use of two separate switches is particularly preferred. It may be preferable in the (two) other cases to implement the functionality with one switch which is controlled by two config bits which are interlinked by Boolean logic, preferably by an AND link, to determine the switch state. FIG. 4 shows a PAE output which is to be connected to the two lower buses of the three buses illustrated here. The object is configured independently of the availability of the buses, the switches on the left in the figure corresponding to the connection mask.

The middle bus (0401) in FIG. 4a is still occupied by another configuration. Now a data packet may be sent from the output register to the connection, where it is stored in the connected RdyHold stages (see PCT/EP02/02403). The packet cannot be transmitted through the open switch of the middle bus and thus cannot be acknowledged, i.e., the transmitter does not receive an acknowledgment of receipt. Thus, under the usual protocols, the object cannot transmit any further data packets.

Now in FIG. 4b the middle bus has been reconfigured, i.e., its switch is closed, so that data may again be transmitted here. Any packet already stored is now placed on the bus; otherwise everything functions as before.

In FIG. 4c the top bus (0402) is requested by a third configuration. The switch on the bus side behind the RM remains open accordingly, because data transfer is to be prevented on the bus side. Here again, everything otherwise behaves like before.
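The stalling behavior of FIGS. 4a and 4b can be sketched in Python, assuming one RdyHold stage per connected bus and that the output counts as acknowledged only after every stage has passed its packet onto its bus. `RdyHold`, `PaeOutput`, and the packet names are hypothetical; the real RdyHold stages are described in PCT/EP02/02403.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RdyHold:
    """Hypothetical holding stage between a PAE output and one bus (cf. FIG. 4)."""
    bus_switch_closed: bool = False          # bus-side switch, set by the bus's configuration
    held: Any = None                         # packet waiting to enter the bus
    delivered: list = field(default_factory=list)

    def step(self) -> None:
        # The held packet enters the bus only once the bus-side switch is closed.
        if self.held is not None and self.bus_switch_closed:
            self.delivered.append(self.held)
            self.held = None


@dataclass
class PaeOutput:
    stages: list[RdyHold]

    def acknowledged(self) -> bool:
        # Acknowledged only when every connected stage has passed its packet on.
        return all(s.held is None for s in self.stages)

    def send(self, packet: Any) -> bool:
        if not self.acknowledged():
            return False                     # no acknowledgment: the output stalls
        for s in self.stages:
            s.held = packet
        return True

    def step(self) -> None:
        for s in self.stages:
            s.step()


middle = RdyHold(bus_switch_closed=False)    # FIG. 4a: still owned by another configuration
lower = RdyHold(bus_switch_closed=True)
out = PaeOutput([middle, lower])

print(out.send("p0"))   # True: packet latched into both RdyHold stages
out.step()              # lower bus takes p0; middle bus switch is open, p0 stays held
print(out.send("p1"))   # False: p0 not acknowledged, so no further packet
middle.bus_switch_closed = True              # FIG. 4b: middle bus reconfigured
out.step()              # held p0 now goes onto the middle bus
print(out.send("p1"))   # True
```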

Result

The reconfiguration performance may be increased substantially with relatively simple hardware. In particular, it becomes even more feasible to preload multiple complete configurations into the objects, because the objects may then be configured individually, per object, according to the prevailing data processing status of each, without any problems being expected.

After arrival of the reconfiguration signal, each object needs locally, until it is configured again, only as many cycles as it has configuration words, assuming one configuration word is transmitted per cycle. The reconfiguration time may be reduced further, approximately to zero cycles, by using a second register set in which configurations are predeposited.
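For a rough comparison of the cycle counts, the following Python sketch contrasts the lower bound of centrally synchronized reconfiguration with the local cost. It assumes the forward and return paths of the pipelined CM bus each have the same number of stages and models the preloaded second register set as costing zero cycles; the function names and the numbers are hypothetical.

```python
def central_reconfig_cycles(cm_bus_stages: int) -> int:
    # Assumption: forward and return paths of the pipelined CM bus each have
    # cm_bus_stages stages, so central synchronization costs at least the round trip.
    return 2 * cm_bus_stages


def local_reconfig_cycles(config_words: int, preloaded: bool = False) -> int:
    # One configuration word per cycle; with the configuration predeposited in a
    # second register set the switch-over is approximately free (modeled as 0).
    return 0 if preloaded else config_words


# Hypothetical numbers: a 12-stage CM bus and an object needing 4 configuration words.
print(central_reconfig_cycles(12))               # 24
print(local_reconfig_cycles(4))                  # 4
print(local_reconfig_cycles(4, preloaded=True))  # 0
```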

In an optimized implementation that is preferred according to the present invention, the additional hardware complexity for buses and PAE inputs may be limited to one additional configuration bit and one AND gate per bus switch and per (number of buses × number of PAE inputs). This is depicted in FIG. 7.

FIG. 7a shows a left-hand bus (0606a[1]) connected to a right-hand bus (0606a[2]) via the bus switch. A configuration bit is assigned to each bus switch, indicating whether the switch is configured as being open or closed (c[1] for the left-hand bus and c[2] for the right-hand bus). In FIG. 7b the same function is implemented by a single switch instead of two switches. The two configuration bits c[1] and c[2] are logically linked together by an AND gate (&) so that the single switch is closed only when both configuration bits in this example are logic b′1. Alternatively, an implementation via an OR gate is appropriate when a logic b′0 is to indicate a closed switch.
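The single-switch variant of FIG. 7b reduces to a two-input gate. This Python sketch prints the truth table for the active-high AND encoding and checks that the active-low OR encoding gives the same switch state (De Morgan). The function names are hypothetical.

```python
from itertools import product


def switch_closed_and(c1: int, c2: int) -> int:
    # Active-high encoding: b'1 = "closed"; the single switch closes only if both bits agree.
    return c1 & c2


def switch_open_or(n1: int, n2: int) -> int:
    # Active-low encoding: b'0 = "closed"; the OR output is 0 (closed) only if both bits are 0.
    return n1 | n2


for c1, c2 in product((0, 1), repeat=2):
    closed = bool(switch_closed_and(c1, c2))
    # De Morgan: inverting both inputs and using OR yields the same switch state.
    assert (switch_open_or(1 - c1, 1 - c2) == 0) == closed
    print(f"c[1]={c1} c[2]={c2} -> {'closed' if closed else 'open'}")
```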

The PAE outputs may optionally require slightly more complexity, depending on the implementation, if an additional switch is considered to be necessary for each. In this connection, it should be pointed out that although it is possible to provide the connection to and/or between all objects according to the present invention, this is by no means obligatory. Instead, it is possible to implement embodiments of the present invention only in some objects.

FIG. 6b shows as an example a design of an object and a bus according to the present invention. The basic design corresponds to the related art according to FIG. 6a and/or according to PCT/EP02/02403, FIGS. 5 and 7c. Therefore, only the elements in FIG. 6b that are novel in comparison with the related art will be described here. The switches on bus ends 0609 are inserted according to an example embodiment of the present invention, so the buses are completely separable by switches 0607 and 0609. Switches (0610) at the inputs and outputs of the objects (PAEs), regulating the correct connections to the buses, are also novel.

A basic principle now is that each object and/or each bus independently regulates, i.e., determines, which connections are to be established and/or remain in effect at the moment. It should be pointed out here that this determination is performed by the individual object and/or bus depending on the configuration, i.e., it is by no means arbitrary. Management of the connections is thus more or less delegated to the objects involved. Each bus may regulate which other buses it will be connected to via switches 0607 and 0609 according to the configuration. No bus may now be connected to another (e.g., via 0607) without the other bus allowing this through a corresponding switch setting of its bus switches (e.g., 0609).

It should be pointed out explicitly that switch 0607 according to the related art could also be situated at the output of a bus, in which case switch 0609 is added at the input of the bus accordingly.

Switches 0610 are preferably also double switches, one switch being controlled by the PAE object and the other switch being controlled by the particular bus system 0606a and/or 0606b. It should be pointed out in particular that one switch is merely indicated with dashed lines. This is the switch controlled by bus 0606a and/or 0606b and it may be implemented “virtually” by the setting of the connection mask (0608).

5. Reconfiguration Control

Control of the reconfiguration is triggered in the VPU technology by signals (Reconfig) which are usually propagated with the data packets and/or trigger packets over the bus systems and indicate that a certain resource may or should be reconfigured and, if necessary, the new configuration is selected at the same time (see PCT/DE98/00334 and PCT/DE00/01869).

If a reconfigurable module is to be only partially reconfigured, then Reconfig must be interrupted at certain locations according to the algorithm. This interruption, which prevents forwarding of Reconfig, is referred to as ReconfigBlock.

ReconfigBlocks are usually introduced at the boundary of one configuration with the next to separate them from one another.

Different strategies for sending Reconfig signals are selected as requested by the algorithm.

Now three example embodiments of the present invention will be described. These embodiments may be used individually and/or combined and they have different behaviors. It is regarded as inventive in comparison with the related art that it is possible to select between such embodiments in pairs.

a) ForcedReconfig: The simplest strategy is to send the Reconfig signal via all interfaces of an object, i.e., it propagates along the data paths and/or trigger paths belonging to a certain configuration while other configurations remain unaffected. This ensures that all interconnected objects in the PA receive the signal. To restrict its reach, the signal must be blocked at suitable locations. This signal, referred to below as ForcedReconfig, ensures that a configuration is removed completely. It should be used only after all data in the particular objects have been processed and removed, because there is no synchronization with data processing. Although all objects belonging to a certain configuration within an array are thus forced to allow reconfiguration, other configurations running simultaneously on other objects of the same array remain unaffected.

b) SyncReconfig: A Reconfig is sent together with the corresponding data and/or triggers. It is sent only together with active data packets and/or trigger packets. The signal is preferably relayed together with the last data packet and/or trigger packet to be processed and indicates the end of the data processing after this data/trigger packet. In an example embodiment, if a PAE requires multiple cycles for processing, the forwarding of SyncReconfig is delayed until the trigger packet and/or data packet has in fact been sent. This signal is thus synchronized with the last data processing. As described below, this synchronized reconfiguration according to the present invention may be blocked at certain locations.

c) ArrayReset: ArrayReset may be used as an extension of ForcedReconfig; it cannot be blocked and results in reconfiguration of the complete array. This method is particularly appropriate when, for example, an application is terminated or an illegal opcode (see PCT/DE03/00152) and/or timeout of a configuration has occurred and proper termination of the configuration cannot be ensured with other strategies. It is particularly important for a power-on reset or the like.
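The difference in reach between ForcedReconfig and ArrayReset can be sketched as a graph flood in Python. The sketch assumes that ForcedReconfig follows only links between objects of the same configuration and stops at interfaces carrying a ReconfigBlock, while ArrayReset reaches every object. SyncReconfig is not modeled here because it depends on the data flow. The object names, `links`, and `reconfigured_objects` are hypothetical.

```python
from __future__ import annotations
from collections import deque
from enum import Enum, auto


class Reconfig(Enum):
    FORCED = auto()       # floods the configuration's data/trigger paths; can be blocked
    ARRAY_RESET = auto()  # cannot be blocked; reaches the complete array


def reconfigured_objects(start: str, links: dict[str, list[str]], config_of: dict[str, str],
                         blocked: set[tuple[str, str]], kind: Reconfig) -> set[str]:
    if kind is Reconfig.ARRAY_RESET:
        return set(config_of)                  # every object in the array
    seen, queue = {start}, deque([start])
    while queue:
        obj = queue.popleft()
        for nxt in links.get(obj, []):
            # Stay inside the sender's configuration and stop at a ReconfigBlock on (obj, nxt).
            if nxt in seen or config_of[nxt] != config_of[obj] or (obj, nxt) in blocked:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return seen


# Hypothetical array: p1..p4 belong to configuration X, q1 to configuration Y.
links = {"p1": ["p2"], "p2": ["p1", "p3", "q1"], "p3": ["p2", "p4"], "p4": ["p3"], "q1": ["p2"]}
config_of = {"p1": "X", "p2": "X", "p3": "X", "p4": "X", "q1": "Y"}
blocked = {("p3", "p4")}                       # ReconfigBlock between p3 and p4

print(sorted(reconfigured_objects("p1", links, config_of, blocked, Reconfig.FORCED)))
# ['p1', 'p2', 'p3']
print(sorted(reconfigured_objects("p1", links, config_of, blocked, Reconfig.ARRAY_RESET)))
# ['p1', 'p2', 'p3', 'p4', 'q1']
```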

5.1 SyncReconfig

When SyncReconfig is propagated, it always accompanies valid active data or triggers.

Problems occur when, in the case of branching, the signal is propagated only in the active branch (FIG. 5a) or when branching or combining is blocked due to lack of data and/or triggers (FIG. 5b).

To solve this problem, the semantics of SyncReconfig is defined as follows. The signal indicates that after receiving and completely processing the data/triggers, all the data/trigger sources (sources) and buses leading to the input of an object which has received the SyncReconfig signal are reconfigured. A ReconfigEcho signal may be introduced for this purpose. After the arrival of SyncReconfig at a destination object, a ReconfigEcho is generated by it, preferably only and as soon as the destination object has completely processed the data arriving with the SyncReconfig signal. This generated ReconfigEcho is then sent to all sources connected to the object, i.e., its inputs, and results in reconfiguration, i.e., reconfigurability of the sources and/or the bus systems transmitting data and/or triggers.

If an object receives a ReconfigEcho, this signal is transmitted further upstream, i.e., it is transmitted via the buses to its sources via all the inputs having bus switches still closed. After being generated, ReconfigEcho is thus sent to the data and/or trigger sources that feed into an object, and the signals are forwarded from there.

Inputs/outputs that have already received a SyncReconfig preferably become passive due to its arrival, i.e., they no longer execute any data/trigger transfers. Depending on the embodiment, a SyncReconfig may only induce passivation of the input at which the signal has arrived or passivation of all inputs of the PAE.

A ReconfigEcho usually arrives at the outputs of PAEs. This causes the ReconfigEcho to be relayed via the inputs of the PAE if they have not already been passivated by a received SyncReconfig.

In some cases, e.g., in FIGS. 5a through 5c, ReconfigEcho may also occur at the inputs. This may result in passivation of the input at which the signal arrived, depending on the embodiment, or in a preferred embodiment it may trigger passivation of all inputs of the PAEs.
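The passivation rules above can be sketched in Python, with a flag selecting between the embodiment that passivates only the input hit and the one that passivates all inputs of the PAE. The class `Pae`, its methods, and the port names are hypothetical; only the relay case (ReconfigEcho arriving at an output) is modeled for forwarding.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Pae:
    """Hypothetical PAE with named inputs; passivate_all selects the embodiment."""
    inputs: list[str]
    passivate_all: bool = False        # False: only the input hit; True: all inputs of the PAE
    passive: set[str] = field(default_factory=set)

    def _passivate(self, port: str) -> None:
        self.passive |= set(self.inputs) if self.passivate_all else {port}

    def on_sync_reconfig(self, port: str) -> None:
        # SyncReconfig at an input: that input (or every input) stops data/trigger transfers.
        self._passivate(port)

    def on_echo_at_input(self, port: str) -> None:
        # ReconfigEcho arriving at an input (cf. FIGS. 5a-5c) passivates as well.
        self._passivate(port)

    def relay_echo_from_output(self) -> list[str]:
        # ReconfigEcho arriving at the output is relayed upstream only via inputs still active.
        return [i for i in self.inputs if i not in self.passive]


p = Pae(["in0", "in1"])
p.on_sync_reconfig("in0")
print(p.relay_echo_from_output())      # ['in1']

q = Pae(["in0", "in1"], passivate_all=True)
q.on_echo_at_input("in0")
print(q.relay_echo_from_output())      # []
```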

5.2 Trigger having Reconfig Semantics

In some cases (e.g., FIG. 5b), an implicit propagation of the Reconfig signals (in particular SyncReconfig, ReconfigEcho) is impossible.

For the required explicit transmission of any Reconfig signals, the trigger system according to PCT/DE98/00334 may be used, to which end the trigger semantics is extended accordingly. Triggers may thus transmit any status signals and control signals (e.g., carry, zero, overflow, STEP, STOP, GO; see PCT/DE98/00334, PCT/DE00/01869, and PCT/EP02/02403), as well as the implicit Reconfig signals. In addition, a trigger may assume the SyncReconfig, ReconfigEcho, or ForcedReconfig semantics.

5.3 Blocking

At each interface which sends a SyncReconfig, it is possible to set whether sending or relaying is to take place. Suppressing propagation results in stopping a reconfiguration wave that would otherwise propagate over the array and/or the configuration affected by it. However, regardless of the blockades to be set up for certain locations during configuration in a self-modifying or data-dependent manner and/or under or for certain conditions, data and/or trigger signals may continue to run over a blocked position, in order to be processed further as before, as provided with the protected configuration and/or a protected configuration part.

If necessary, it would also be possible to locally suppress the response to the reconfiguration request, i.e., to ignore the reconfiguration request locally but nevertheless send a signal indicative of the arrival of a locally ignored reconfiguration request signal to downstream objects, whether blocked or unblocked.

As a rule, however, when individual objects of a configuration are to be blocked, it is preferable to send the reconfiguration request signal over separate buses, bus segments or lines to downstream objects past a blocking object. The normally preferred case in which the reconfiguration request signal must penetrate into the object is then easier to maintain, i.e., not only peripherally relayed in forward or reverse registers, if provided, and thus sent past the actual cell. It is then preferable that, in the case of blocking of a reconfiguration request signal (or a certain reconfiguration request signal of a plurality of differentiable reconfiguration request signals), this blocked reconfiguration request signal “dies” in the particular object, i.e., is not to be forwarded.

If the acceptance of SyncReconfig at the receiving interface is blocked, then the receiving object switches the interface receiving SyncReconfig to passive (i.e., the interface no longer sends and/or receives any data); otherwise, the object does not respond to the signal but it may send back the ReconfigEcho to permit the release of the transmitting bus system.
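One possible reading of the blocking options, sketched in Python: a blocked receiving interface goes passive and the object otherwise ignores SyncReconfig but may still echo, while in the unblocked case relaying depends on whether the sending interface is configured to forward. The `Actions` record, `on_sync_reconfig`, and the `echo_if_blocked` parameter are hypothetical; passivating the input in the normal case follows the preferred behavior described in section 5.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actions:
    passivate_input: bool    # receiving interface stops sending/receiving data
    relay_downstream: bool   # SyncReconfig forwarded to the next object
    echo_upstream: bool      # ReconfigEcho returned so the transmitting bus can be released


def on_sync_reconfig(accept_blocked: bool, relay_enabled: bool,
                     echo_if_blocked: bool = True) -> Actions:
    if accept_blocked:
        # Acceptance blocked: the interface goes passive and the object otherwise ignores the
        # signal, but it may still echo so the transmitting bus system is released.
        return Actions(passivate_input=True, relay_downstream=False,
                       echo_upstream=echo_if_blocked)
    # Normal case: after processing the accompanying data, echo to the sources and relay
    # SyncReconfig unless the sending interface is configured not to (ReconfigBlock).
    return Actions(passivate_input=True, relay_downstream=relay_enabled, echo_upstream=True)


print(on_sync_reconfig(accept_blocked=True, relay_enabled=True))
print(on_sync_reconfig(accept_blocked=False, relay_enabled=False))
```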

In addition, it is possible to block ReconfigEcho either independently of and/or jointly with a ReconfigBlock.

5.4 Effect of SyncReconfig and ForcedReconfig on Bus Systems

To ensure that, after transmission of a SyncReconfig over a bus, no subsequent data and/or triggers, which originate from a following configuration, for example, and would thus be processed incorrectly, are transmitted, SyncReconfig preferably blocks the sending of the handshake signals RDY/ACK (see PCT/DE97/02949), which indicate the presence of valid data on the bus and control the data transmission, over the bus. The bus connections per se, i.e., the data and/or trigger network, are not interrupted to permit resending of ReconfigEcho over the bus system. The bus is dismantled and reconfigured only with the transmission of ReconfigEcho.

In other more general terms, according to an example embodiment of the present invention, the occurrence of SyncReconfig first prevents data and/or triggers from being relayed over a bus—except for ReconfigEcho—e.g., by blocking the handshake protocols and ReconfigEcho subsequently induces the release and reconfiguration of the bus.
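The effect of the two signals on a bus can be sketched as a small state machine in Python, assuming three states: active, draining (handshakes blocked after SyncReconfig, wiring kept for ReconfigEcho), and released (after ReconfigEcho). `BusState`, `Bus`, and the packet names are hypothetical. The sketch also lets ReconfigEcho release a bus that never carried SyncReconfig, as happens on the inactive branch in FIG. 5a.

```python
from enum import Enum, auto


class BusState(Enum):
    ACTIVE = auto()     # RDY/ACK handshakes pass; data and triggers flow
    DRAINING = auto()   # SyncReconfig has passed: handshakes blocked, wiring kept for ReconfigEcho
    RELEASED = auto()   # ReconfigEcho has passed: bus dismantled and free for reconfiguration


class Bus:
    """Hypothetical model of one bus and its response to SyncReconfig and ReconfigEcho."""

    def __init__(self) -> None:
        self.state = BusState.ACTIVE

    def transfer(self, packet: str, sync_reconfig: bool = False) -> bool:
        if self.state is not BusState.ACTIVE:
            return False                      # RDY is not forwarded: nothing crosses the bus
        if sync_reconfig:
            self.state = BusState.DRAINING    # the last packet passes together with SyncReconfig
        return True

    def echo(self) -> bool:
        # ReconfigEcho is the only signal still carried; it releases the bus.
        if self.state is BusState.RELEASED:
            return False
        self.state = BusState.RELEASED
        return True


bus = Bus()
print(bus.transfer("d0"))                         # True
print(bus.transfer("last", sync_reconfig=True))   # True
print(bus.transfer("next-config-data"))           # False: blocked by the handshake
print(bus.echo(), bus.state.name)                 # True RELEASED
```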

Other methods having an equivalent effect may be used. For example, data and trigger connections may be interrupted even in a run-through of SyncReconfig, whereas the ReconfigEcho connection is dismantled only on occurrence of ReconfigEcho.

This ensures that data and triggers of different configurations which do not belong together will not be exchanged incorrectly via the configurations.

FIG. 5 shows an example of PAEs (0501) having differently configured interconnections. The following transmissions are defined: data and/or trigger buses (0502), SyncReconfig (0503), and ReconfigEcho (0504). In addition, ReconfigBlock (0505) is also shown. 0506 indicates that SyncReconfig is not relayed.

FIG. 5a illustrates a branching such as may occur, for example, due to an IF-THEN-ELSE construct in a program. After a PAE, the data is branched into two paths (0510, 0511), only one of which is active at any time. In the case depicted here, a last data packet is transmitted together with SyncReconfig; branch 0510 is not active and therefore relays neither the data nor SyncReconfig. Branch 0511 is active and relays the data and SyncReconfig. According to an example embodiment of the present invention, the transmitting bus system is switched to inactive immediately after the transmission and is then able to transmit back only ReconfigEcho. PAE 0501b receives SyncReconfig and forwards it to PAE 0501c; 0501b sends ReconfigEcho back to 0501a, whereupon 0501a and the bus system between 0501a and 0501b are reconfigured. The transmission between 0501b and 0501c takes place accordingly.

0501e has also received SyncReconfig from 0501a but the branch is not active. Therefore, 0501e does not respond, i.e., 0501e does not send SyncReconfig to 0501f; nor does it send the ReconfigEcho back to 0501a.

0501c processes the incoming data and forwards SyncReconfig to 0501d; this sequence initially corresponds to the transmission from 0501a to 0501b. After processing the data, 0501d generates a ReconfigEcho, which is also sent to 0501f because the branches are combined. Although 0501f has not performed a data operation, it is reconfigured and sends ReconfigEcho to 0501e, which is then also reconfigured, even though no new data processing has taken place.

In a preferred embodiment, the ReconfigEcho transmitted from 0501b to 0501a may also be transmitted to 0501e, where it arrives at an input. This results in passivation of that input and, in an expanded embodiment, in passivation of all inputs, which may also be reconfigurable.

To keep the examples in FIG. 5 local, the inputs/outputs in the diagrams have been provided with a ReconfigBlock, which suppresses the forwarding of SyncReconfig and ReconfigEcho.

FIG. 5b is largely identical to FIG. 5a, so the same reference numerals are used. The right-hand path is again active and the left-hand path inactive. The essential difference is that, instead of being combined at 0501d, the paths now remain open and lead directly to the peripheral interface, for example. In such cases, it is possible and preferable to wire ReconfigEcho explicitly via trigger lines (0507) between the PAEs (0501i and 0501j).

FIG. 5c shows an exemplary embodiment of a loop running over PAEs 0501m, . . . , 0501r. The transmissions between these PAEs follow from the preceding discussion, in particular from the transmission between 0501b and 0501c.

The transmission between 0501r and 0501m deserves special attention. In an example embodiment, when ReconfigEcho appears at 0501m, the bus (0508) between 0501m and 0501r is reconfigured by the transmission of ReconfigEcho. ReconfigEcho is blocked at the output of 0501r. Therefore, 0501r is not reconfigured; instead, the corresponding output is switched to passive on arrival of ReconfigEcho, i.e., 0501r no longer sends any results on the bus. The bus is thus free to be used by another configuration.

As soon as 0501r receives ReconfigEcho from 0501q, 0501r is reconfigured at the end of its data processing. The ReconfigBlock and/or the passivation of the bus connection (0508) to 0501m prevents ReconfigEcho from being forwarded toward 0501m. Meanwhile, 0501m and 0508 may be used by another configuration.

6.0 SyncReconfig II

Another optional method for controlling the SyncReconfig protocol is described below. This method may be preferred, depending on the application, the area of use, and/or embodiment of the semiconductor or system.

This method is defined as follows:

1. As a rule, SyncReconfig is transmitted over all connected buses of a PAE (data buses and/or trigger buses), including buses that are not transmitting any data and/or triggers in the current cycle.

2. For a PAE to relay SyncReconfig according to item 1, all connected inputs of the PAE must first have received SyncReconfig.

2a. Feedback in the data structure (e.g., loops) requires an exception to the rule in item 2: feedback paths are exempt, i.e., for SyncReconfig to be forwarded it is sufficient that all connected inputs of a PAE, except those in a feedback loop, have received it.

3. If a PAE is processing data (possibly over multiple cycles, e.g., for a division), a SyncReconfig present at its inputs according to items 2 and 2a is relayed to the receiver(s) only once the calculation and forwarding of the data and/or triggers has been completed. In other words, SyncReconfig does not overtake data processing.

4. If a PAE is not processing any data (e.g., because no data is queued at the inputs and/or there is no corresponding trigger enabling data processing (see PCT/DE98/00334)) but has received SyncReconfig at all configured inputs, the PAE forwards SyncReconfig via all configured outputs. Since no data processing takes place, no data is transmitted further. In other words, PAEs that are not processing data relay SyncReconfig immediately to the connected receivers, synchronized to the clock cycle if necessary.
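Rules 1 to 4 (with exception 2a) can be read as a cycle-by-cycle propagation over a graph of PAEs. The sketch below is one possible model, not the method's actual implementation: buses are named by strings, `busy_cycles` stands in for an ongoing multi-cycle calculation, and, following FIG. 8b, SyncReconfig is not sent on a loop feedback output. Class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pae:
    name: str
    inputs: list[str]                 # configured input buses
    outputs: list[str]                # configured output buses
    feedback_in: set[str] = field(default_factory=set)   # inputs closing a loop (rule 2a)
    feedback_out: set[str] = field(default_factory=set)  # loop feedback outputs (FIG. 8b)
    busy_cycles: int = 0              # cycles left in an ongoing calculation (rule 3)
    forwarded: bool = False

    def ready(self, synced: set[str]) -> bool:
        # Rules 2/2a: all connected inputs except feedback inputs have seen SyncReconfig.
        needed = (b for b in self.inputs if b not in self.feedback_in)
        # Rule 3: SyncReconfig does not overtake data processing.
        return self.busy_cycles == 0 and all(b in synced for b in needed)


def propagate(paes: list[Pae], initial: set[str], max_cycles: int = 100):
    """Return ({PAE name: cycle it forwarded SyncReconfig}, buses that carried it)."""
    synced = set(initial)
    forwarded_at: dict[str, int] = {}
    for cycle in range(max_cycles):
        newly: set[str] = set()
        for pae in paes:
            if pae.forwarded:
                continue
            if pae.ready(synced):
                pae.forwarded = True
                forwarded_at[pae.name] = cycle
                # Rules 1/4: every configured output, whether or not it carries
                # data, except a loop feedback bus.
                newly.update(b for b in pae.outputs if b not in pae.feedback_out)
            elif pae.busy_cycles > 0:
                pae.busy_cycles -= 1
        synced |= newly  # signalled this cycle, seen by the receivers next cycle
        if all(p.forwarded for p in paes):
            break
    return forwarded_at, synced
```

For a FIG. 8a-like arrangement, a PAE with input 0805 and outputs 0807/0808 forwards in cycle 0; a downstream PAE still waiting on its second input never forwards, while one that has SyncReconfig on both inputs forwards in the next cycle even though it received no data.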

SyncReconfig is preferably transmitted together with handshake signals (e.g., RDY/ACK = reaDY/ACKnowledge). A PAE sending a SyncReconfig does not enter the reconfigurable state until all receivers have confirmed receipt of SyncReconfig with an ACK(nowledge).

This method raises the basic question of what happens when a configuration that is not yet completely configured is already to be reconfigured again. Setting aside whether such application behavior calls for better programming, the problem is solved as follows: if a PAE attempts to forward SyncReconfig to a PAE that is not yet configured, it does not receive an ACK until that PAE is configured and acknowledges SyncReconfig. This may cost some performance, because the configuration to be deleted must first be completely configured before it can be deleted. On the other hand, this case is very rare and occurs only under unusual circumstances.
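The ACK condition and the not-yet-configured case can be sketched in a few lines. The following assumptions belong to this sketch, not to the source: each receiver acknowledges in the cycle in which its configuration completes (or immediately, if it is already configured), and the sender keeps SyncReconfig on the bus until then. `Receiver` and `cycles_until_reconfigurable` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    name: str
    configured_at: int  # cycle in which this receiver's configuration completes

    def acks(self, cycle: int) -> bool:
        # A PAE that is not yet configured cannot acknowledge SyncReconfig.
        return cycle >= self.configured_at


def cycles_until_reconfigurable(receivers: list[Receiver], max_cycles: int = 1000) -> int:
    """First cycle in which the sending PAE may enter the reconfigurable state."""
    pending = {r.name for r in receivers}
    for cycle in range(max_cycles):
        # SyncReconfig stays on the bus (with RDY) until each receiver has ACKed it.
        for r in receivers:
            if r.name in pending and r.acks(cycle):
                pending.discard(r.name)
        if not pending:
            return cycle
    raise TimeoutError("SyncReconfig was never fully acknowledged")
```

With all receivers configured, the sender may reconfigure at once; a receiver whose configuration completes only in cycle 7 holds the sender until cycle 7, which is the performance cost mentioned above.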

FIG. 8a shows a basic method according to an example embodiment of the present invention. SyncReconfig 0805 arrives at PAE 0806, which, at the end of data processing, forwards SyncReconfig together with data via 0807. Connections that have been configured but not used during the data processing also forward SyncReconfig (0808).

PAE 0809 receives SyncReconfig from 0806 via 0807, but SyncReconfig is still outstanding at its second input; therefore, 0809 does not forward SyncReconfig. PAE 0810 receives SyncReconfig via 0808 but no data, and it likewise receives SyncReconfig via its second input. Although no data processing takes place in PAE 0810 (no data has arrived via 0808), PAE 0810 relays SyncReconfig without any result data.

FIG. 8b shows the processing of a loop. During the data processing, data is fed back (0824) from PAE 0822 to PAE 0821. At 0821, a SyncReconfig arrives via 0820. This is relayed to the downstream PAEs in the loop as far as PAE 0822. PAE 0822 relays (0823) SyncReconfig to downstream PAEs not belonging to the loop. Neither SyncReconfig nor data is transmitted via loop feedback 0824 (see explanation 0803).

0801 means that no SyncReconfig has been transmitted on this bus at the point in time depicted as an example. 0801 implies no information regarding whether data/triggers have been transmitted.

0802 means that a SyncReconfig has been transmitted on this bus at the point in time depicted as an example. 0802 does not imply any statement regarding whether data/triggers have been transmitted.

0803 means that when a SyncReconfig occurs at the data transmitter (in this example 0822), no SyncReconfig is transmitted on this bus (regardless of the point in time). 0803 also implies that no data/triggers are transmitted.

7. Alternative Protocol

A protocol according to an example embodiment of the present invention is described below as an alternative to the known RDY/ACK data flow control protocol. It safeguards data streams at high clock frequencies even when registers are inserted between transmitter and receiver. Suitable hardware modules are also provided for this purpose.

Reusable transmitter and receiver units are derived for these modules, in particular for the communication between an XPP processor field and an XPP configuration controller. These modules and their code are also described below. It should be noted that these modules may partially replace and/or supplement the XPP-FILMO modules used previously.

The architecture using the RDY/ACK protocol is shown in FIG. 9.

The transmitter must wait for pending ACKs before asserting a RDY signal. Consequently, the longest path, which determines the clock frequency of such a system, runs from the receiver to the transmitter, through the transmitter's logic, and back to the receiver and its register enable logic.

Inserting a register at the input of the transmitter, as shown in FIG. 10, shortens the longest path, but the logic must then wait one cycle longer for pending ACKs, so data can be transmitted only every second clock cycle. The same holds when the pipeline register is placed not at the ACK input but at the RDY and data outputs.
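The throughput penalty can be checked with a tiny cycle count. This is a minimal sketch, assuming the receiver is always ready and that `ack_latency` is the number of cycles from asserting RDY until the transmitter sees the ACK (1 for the direct path of FIG. 9, 2 with the register of FIG. 10); the function name is hypothetical.

```python
def rdy_ack_throughput(ack_latency: int, cycles: int = 1000) -> float:
    """Packets per cycle for a transmitter that waits for each ACK before the next RDY."""
    sent = 0
    ack_due = None  # cycle in which the outstanding packet's ACK becomes visible
    for cycle in range(cycles):
        if ack_due is not None and cycle >= ack_due:
            ack_due = None  # ACK has arrived; nothing is outstanding any more
        if ack_due is None:
            sent += 1  # assert RDY with a new packet in this cycle
            ack_due = cycle + ack_latency
    return sent / cycles
```

With `ack_latency=1` every cycle carries a packet; with `ack_latency=2` only every second cycle does, which is the halving described above.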

A second problem occurs when the protocol is used on the pins or the I/O interface of an XPU. Suppose the XPU is correctly configured and sends a data packet outward, i.e., it sends a RDY. If the connected circuit is not in a position to receive data, because it is not connected or not completely programmed, the RDY is lost and the XPU stalls. Later, when the circuit outside the XPU is able to receive data, it does not respond, because it does not send an ACK without having received a RDY.

8. First Approach Using the Credit FIFO Principle

The Credit FIFO idea, according to example embodiments of the present invention, solves the problem of reduced throughput with a FIFO at the receiver input. The transmitter is always allowed to send another packet even while one ACK is still pending.

This means that when the transmission begins the first time, two packets are sent without knowing whether or not they will be confirmed (acknowledged). Thus, the second problem mentioned in the preceding section may still exist.

FIG. 12 shows one alternative according to an example embodiment of the present invention. The protocol between transmitter and receiver is the same, but as a design variant all modules have registers at their outputs. This is useful for synthesis and timing estimates. The latter architecture does not require more hardware than the former, because a data register must also be present in the former variant.

According to an alternative example embodiment of the present invention, the semantics of the ACK signal are changed to mean “would issue an ACK,” i.e., the signal indicates the ability to receive data. These signals are therefore called “ABLE” signals. FIG. 5 shows the version in which there are registers at all module outputs.

The transmitter may always send data toward the receiver when the ABLE signal allows it. With this protocol, the second register in the receiver part may be dispensed with if it is certain that the transmitter holds the transmitted data stable during a stall until the receiver signals “ABLE” again.

9.1 Protocol Evaluation-Credit System Semantics

The credit system has the following semantics:

Transmitter: “I am allowed to send two data packets plus as many additional packets as I receive acknowledgments for. If I am not allowed to send another packet, the last data value must remain valid on the bus.”

Receiver: “Each received packet will be acknowledged as soon as I am able to receive others.”
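The credit semantics, together with the receiver FIFO of the Credit FIFO approach, can be sketched as a cycle-level model. Assumptions not taken from the source: the forward data path and the ACK return path each contain one register, the receiver acknowledges a packet when it leaves the FIFO, and `consumer_ready` stands in for whatever logic sits behind the receiver. All names are hypothetical.

```python
from collections import deque
from typing import Callable


def credit_stream(consumer_ready: Callable[[int], bool], credits: int = 2,
                  fifo_depth: int = 2, packets: int = 20, max_cycles: int = 10_000):
    """Cycle-level sketch of the credit protocol; returns (cycles, max FIFO fill, received)."""
    fifo: deque[int] = deque()
    in_flight: deque[tuple[int, int]] = deque()  # (arrival cycle, packet) on the data path
    ack_visible: deque[int] = deque()            # cycles at which ACKs reach the transmitter
    received: list[int] = []
    next_packet = 0
    max_fill = 0
    for cycle in range(max_cycles):
        # Transmitter: every ACK that has arrived returns one credit.
        while ack_visible and ack_visible[0] <= cycle:
            ack_visible.popleft()
            credits += 1
        # Receiver: a packet arriving this cycle enters the input FIFO.
        while in_flight and in_flight[0][0] <= cycle:
            fifo.append(in_flight.popleft()[1])
        if len(fifo) > fifo_depth:
            raise OverflowError("receiver FIFO overflow: a packet would be lost")
        max_fill = max(max_fill, len(fifo))
        # Receiver: acknowledge as soon as a FIFO slot is freed (ACK registered).
        if fifo and consumer_ready(cycle):
            received.append(fifo.popleft())
            ack_visible.append(cycle + 1)
        if len(received) == packets:
            return cycle + 1, max_fill, received
        # Transmitter: send while credits remain; otherwise hold the last value on the bus.
        if credits > 0 and next_packet < packets:
            in_flight.append((cycle + 1, next_packet))
            next_packet += 1
            credits -= 1
    raise TimeoutError("stream did not complete")
```

With the default two credits and an always-ready consumer, 20 packets take 21 cycles (one per cycle after the first), whereas `credits=1` stretches this to 40 cycles, i.e., plain stop-and-wait. If the consumer stalls, at most two packets pile up, which is why the FIFO depth should match the initial credit count; with `fifo_depth=1` the model raises `OverflowError`.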

9.2 RDY-ABLE Semantics

The RDY-ABLE protocol has the following semantics:

Transmitter: “If the ABLE signal is ‘high,’ I am allowed to send a valid data packet, with the ready signal on the connecting bus during the entire next cycle. If the ABLE signal is ‘low,’ I must ensure that the current data remains on the bus for another cycle.”

Receiver: “I assert ABLE on the connecting bus for the entire next cycle whenever I am certain that no incoming data packet will be lost.”
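The two RDY-ABLE rules can be exercised in a small cycle model. This is a minimal sketch under simplifying assumptions that are not from the source: a packet sent in a cycle lands in the receiver's buffer in that same cycle, the logic behind the receiver (`consumer_ready`) may take it in that cycle, and `receiver_configured` models a receiver that is not yet programmed. Because the transmitter only sends while ABLE is high, an unconfigured receiver merely delays the stream instead of swallowing the first packet, which is the failure mode of plain RDY/ACK at the XPU pins described earlier. Function and parameter names are hypothetical.

```python
from typing import Callable


def rdy_able_stream(consumer_ready: Callable[[int], bool],
                    receiver_configured: Callable[[int], bool] = lambda c: True,
                    packets: int = 10, buffer_depth: int = 1, max_cycles: int = 1000):
    """Cycle sketch of RDY-ABLE; returns (cycles, received packets)."""
    buf: list[int] = []
    able = False  # ABLE as registered by the receiver in the previous cycle
    received: list[int] = []
    next_packet = 0
    for cycle in range(max_cycles):
        # Transmitter: RDY with a valid packet only while ABLE is high;
        # otherwise the current value simply stays on the bus.
        if able and next_packet < packets:
            buf.append(next_packet)
            next_packet += 1
            if len(buf) > buffer_depth:
                raise OverflowError("packet lost")
        # Logic behind the receiver takes one word when it is ready.
        if buf and consumer_ready(cycle):
            received.append(buf.pop(0))
        if len(received) == packets:
            return cycle + 1, received
        # Receiver: assert ABLE for the whole next cycle only if a packet arriving
        # then cannot be lost; an unconfigured receiver keeps ABLE low.
        able = receiver_configured(cycle) and len(buf) < buffer_depth
    raise TimeoutError("stream did not complete")
```

With an always-ready receiver, ten packets take 11 cycles; if the receiver only becomes configured in cycle 5, the first packet is sent in cycle 6 and nothing is lost; and a consumer that accepts data only every other cycle halves the rate without overflowing the single-entry buffer.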

There are a number of variants for implementing the RDY-ABLE protocol, e.g., pulsed RDY-ABLE or RDY-ABLE with pulsed data. The meanings of high and low may also be inverted. In pulse-like protocols, each data packet need be valid for only one cycle. This variant needs one more input register in the receiver and may be useful if the bus between transmitter and receiver is shared by more than one connection or is used bidirectionally. Certain IO extensions of XPU architectures are examples of this.

Comparison

In situations where the number of credits is not known to the transmitter, the credit system is more stable, whereas RDY-ABLE has the advantage that data is not sent until the receiver is in a position to receive it. FIG. 6 shows the RDY/ACK timing of a credit system, i.e., the bus signals between a transmitter and a receiver in a credit system using the RDY/ACK protocol. Five cases are outlined below:

- 1. transmission of a single packet;
- 2. streaming;
- 3. receiver is not immediately ready to receive;
- 4. receiver is able to receive only at the beginning; and
- 5. receiver is not ready to receive additional data, e.g., because it has not been reconfigured or it is unable to supply any additional data to the next receiver.

FIG. 15 shows the bus signals between a transmitter and a receiver using the RDY protocol.

Four cases are outlined:

- 1. transmission of a single packet “I am allowed while ABLE is active”;
- 2. streaming: RDY remains consistently high while ABLE is active;
- 3. the transmitter transmits regardless of the ability of the receiver to receive data; and
- 4. the transmitter stops the flow for one cycle.

The pulsed RDY-ABLE protocol may be used to free the communication bus for other users more often. However, because it adds one register and thus increases hardware complexity, it is not the default where simpler hardware is desired. Reference may be made to FIG. 16 for a comparison.

The hardware for RDY (FIG. 17) includes a general module that has a receiver part and a transmitter part for data using the RDY protocol. A specific module may insert its required data processing hardware between the receiver and transmitter units. If the central part of FIG. 7 is omitted, the local RDY, ABLE and data signals of the transmitter and receiver units connect directly to one another. The resulting module, consisting of just one transmitter and one receiver unit, is useful as a pipeline stage; many such modules may be chained between an actual data-producing module and a data-consuming module. This is useful when a transmitter and a receiver are to be connected over a long distance without reducing frequency or throughput.

A module need not contain exactly one receiver and one transmitter; in many cases, multiple receivers and one or more transmitters are provided in one module, e.g., an arithmetic logic unit or a dual-ported RAM. A module may also lack one of the two parts, which is advisable when data is generated in other ways or received via another protocol. Examples include configurable counters (without receivers) or displays (no forwarding).

Insertion of Simple Registers:

According to an example embodiment of the present invention, if simple register stages must be inserted in the bus between transmitter and receiver, the receiver must be extended by two registers per inserted stage. One example of this need is register stages at chip boundaries, e.g., connection pieces provided with registers.
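Why two registers per stage? A plain register stage delays the data (RDY) path by one cycle and the ABLE return path by one cycle, so after the receiver drops ABLE, up to two more packets per stage may still arrive. The sketch below checks this under assumptions of its own: `stages` registers in each direction, a receiver that raises ABLE only when it has room for `2 * stages + 1` packets, and same-cycle consumption as in the simpler RDY-ABLE case. Names are hypothetical; in this model the base receiver needs one buffer slot and each inserted stage adds two.

```python
from collections import deque
from typing import Callable


def staged_rdy_able(stages: int, depth: int, consumer_ready: Callable[[int], bool],
                    packets: int = 20, max_cycles: int = 10_000):
    """RDY-ABLE across `stages` plain registers in each direction (hypothetical model).

    Returns (cycles, max buffer fill, received). Raises OverflowError on a lost packet.
    """
    arrivals: dict[int, int] = {}     # cycle -> packet leaving the last forward register
    able_seen: dict[int, bool] = {}   # cycle -> ABLE value visible at the transmitter
    buf: deque[int] = deque()
    received: list[int] = []
    next_packet = 0
    max_fill = 0
    for cycle in range(max_cycles):
        # Transmitter: send only if the delayed ABLE it sees this cycle is high.
        if able_seen.pop(cycle, False) and next_packet < packets:
            arrivals[cycle + stages] = next_packet
            next_packet += 1
        # Receiver: a packet emerging from the register chain enters the buffer.
        if cycle in arrivals:
            buf.append(arrivals.pop(cycle))
            if len(buf) > depth:
                raise OverflowError("packet lost")
        max_fill = max(max_fill, len(buf))
        if buf and consumer_ready(cycle):
            received.append(buf.popleft())
        if len(received) == packets:
            return cycle + 1, max_fill, received
        # Receiver: up to `stages` packets may be in flight and `stages` earlier ABLEs
        # may still be travelling back, so ABLE needs room for 2 * stages + 1 packets.
        able_seen[cycle + 1 + stages] = depth - len(buf) >= 2 * stages + 1
    raise TimeoutError("stream did not complete")
```

With one stage and three slots, 20 packets still stream at one per cycle (23 cycles including the pipeline fill), and a slow consumer never causes an overflow. With fewer than `2 * stages + 1` slots this receiver never raises ABLE, so the model stalls rather than losing data.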

Addendum

Receiver and transmitter for AMBA interfaces:

FIG. 18 illustrates one possible interface arrangement of AMBA for the CM interface of a unit having an XPP core.

For external units that connect to the CM interface of an XPP core, the use of two modules is recommended.

FIG. 19 shows the internal structure of the receiver part which is required in the external interface for the 16-bit output port of the configuration manager, according to an example embodiment of the present invention.

Data reception works as follows: when the receiver module outputs a 1 (HIGH) on recv_valid, data has been received and is immediately available at the recv_data output. If the surrounding module is able to accept this data, it sets recv_able to 1 (HIGH). The data then remains available only until the end of that cycle; afterwards, the next received data is presented, if available.

For some circuits, it may be beneficial to use the recv_rdy signal, which indicates that data is currently being taken from the receiver. It is the logical AND of recv_valid and recv_able.
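The receiver-side handshake can be modelled per clock cycle. This is a minimal sketch, assuming the receiver part keeps presenting the same word until it is taken (the text only states when data is taken); the class and its methods are hypothetical, while the signal names recv_valid, recv_data, recv_able and recv_rdy follow the description of FIG. 19.

```python
from collections import deque


class RecvPort:
    """Hypothetical cycle model of the receiver part's external signals."""

    def __init__(self, words=()):
        self.queue = deque(words)  # words already received from the configuration manager

    def outputs(self):
        """Outputs during this cycle: (recv_valid, recv_data)."""
        if self.queue:
            return True, self.queue[0]
        return False, None

    def clock(self, recv_able: bool):
        """End of cycle: returns (recv_rdy, word taken or None)."""
        recv_valid, recv_data = self.outputs()
        recv_rdy = recv_valid and recv_able  # recv_rdy = recv_valid AND recv_able
        if recv_rdy:
            self.queue.popleft()  # taken; the next word (if any) is presented next cycle
            return True, recv_data
        return False, None
```

While recv_able is LOW the same word stays on recv_data in this model; each cycle with recv_able HIGH takes exactly one word.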

Transmitters in External Units

FIG. 20 shows the internal structure of the transmitter part which is to be part of the external module that establishes an interface connection with the 16-bit input port of the configuration manager, according to an example embodiment of the present invention. A conventional 43-bit code word input of a CM (configuration manager) may also expect this input externally. Both versions may be available in a simulation environment.

If this module and the XPP are directly connected, the signals send_req and n_back may both be set to 0 (LOW); n_back and n_oe are not used. Data is transmitted as follows: when the transmitter module shows a 1 (HIGH) at send_able, send_rdy may be set to 1 (HIGH) with valid data at the send_data input, all in the same cycle. If new data is available in the next cycle, send_rdy may be set to 1 (HIGH) again; otherwise, it is to be reset to 0 (LOW). The send_data input need not be valid in any cycle in which send_rdy is 0 (LOW).
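On the transmitter side, the surrounding module only has to follow the same-cycle rule above. This is a minimal sketch, assuming the module is connected directly to the XPP (so send_req and n_back are tied LOW) and that `send_able_trace` lists the send_able value seen in each cycle; the function and constant names are hypothetical.

```python
# Assumption: module tied directly to the XPP, so these inputs are held LOW.
SEND_REQ = 0
N_BACK = 0


def drive_transmitter(words, send_able_trace):
    """Per-cycle (send_rdy, send_data) driven into the transmitter part."""
    pending = list(words)
    trace = []
    for send_able in send_able_trace:
        if send_able and pending:
            # Same cycle: send_able is HIGH, so raise send_rdy with valid send_data.
            trace.append((1, pending.pop(0)))
        else:
            # No transfer: send_rdy LOW; send_data need not be valid (None here).
            trace.append((0, None))
    return trace
```

For example, two words offered while send_able follows the pattern 1, 0, 1, 1 are transferred in the first and third cycles, and send_rdy is LOW in the other two.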

Claims

1-8. (canceled)

9. A data transmission controlling method comprising:

transmitting, by a first hardware element and to a second hardware element, a data packet at least one of conditional upon and responsive to the second hardware element assigning a signal to a connecting bus via which the data packet is transmitted, the signal indicating that no incoming data packet can be lost.

10. A data transmission controlling method, comprising:

transmitting, by a first hardware element and to a second hardware element, a first data packet and subsequently a second data packet; and

receiving, by the first hardware element and from the second hardware element, an acknowledgement of the first data packet subsequent to the transmittal of the second data packet.