Checking data integrity
A network interface device for connection to a data processing device and to a data network so as to provide an interface between the data processing device and the network for supporting the network of packets of a transport protocol, the network interface device being configured to: identify within the payloads of such packets data of a further protocol, the data of the further protocol comprising payload data of the further protocol and framing data of the further protocol, and the framing data including verification data for permitting the integrity of the payload data to be verified; on so identifying data of the further protocol, process at least the payload data for determining the integrity thereof and transmit to the data processing device at least some of the framing data and an indication of the result of the said processing.
This application claims priority to PCT Application No. PCT/GB2005/001376, entitled Checking Data Integrity which was published as WO 2005/104479 and which is entitled to a priority date of Apr. 21, 2004.
FIELD OF THE INVENTION
This invention relates to a network interface, for example an interface device for linking a computer to a network.
SUMMARY
To overcome the drawbacks of the prior art and provide additional benefits, a method and apparatus is disclosed herein for packet processing. In one embodiment a network interface device is disclosed that is configured to connect to a data processing device and to a data network to provide an interface between the data processing device and the network for supporting the network of packets of a transport protocol. The network interface device is configured to identify, within the payloads of such packets, data of a further protocol. In this configuration the data of the further protocol comprises payload data of the further protocol and framing data of the further protocol, and the framing data includes verification data for permitting the integrity of the payload data to be verified. Upon so identifying data of the further protocol, the device processes at least the payload data for determining the integrity thereof and transmits, to the data processing device, at least some of the framing data and an indication of the result of the said processing.
In one embodiment, the network interface device is configured to process the payload data by applying a predetermined function to the payload data to form a verification result. It is also contemplated that the verification result may be the indication of the result. Moreover, the network interface device may be configured to transmit, to the data processing device, the payload data together with at least some of the framing data. The network interface device may be further configured to process the payload data by comparing the verification result with the verification data and the result of that comparison is the indication of the result. In addition, the network interface device may be configured to, if the result of that comparison is that the verification result matches the verification data, transmit to the data processing device the payload data together with at least some of the framing data.
It is further contemplated that the network interface device may be configured to, if the result of that comparison is that the verification result does not match the verification data, not transmit, to the data processing device, the payload data. In one configuration the predetermined function is a cyclic redundancy check function or an authentication function. In one embodiment the predetermined function is a function that involves byte-by-byte processing of the payload data. Likewise, the packets of the transport protocol comprise packet headers of that protocol and the network interface device is configured to, on identifying data of the further protocol, transmit to the data processing device at least some of the header(s) of the packet(s) of the transport protocol that carried the payload data together with the payload data.
As will be understood, the network interface device may be configured to perform the transmission to the data processing device by transmitting data to a transport library supported by the data processing device. In addition, the transport protocol may comprise the TCP (transmission control protocol) protocol. In one embodiment the further protocol is a protocol for remote direct memory access, for example the RDMA (remote direct memory access) or ISCSI (internet small computer system interface) protocol. The further protocol may support memory write instructions such that the framing data includes information indicative of a memory address of the data processing device to which at least some of the payload data is to be written.
In one embodiment the network interface device is configured to, upon identifying at least some forms of data of the further protocol, raise an interrupt on the data processing apparatus. The forms of data may include memory read instructions or memory write instructions. In one embodiment the read/write instructions include information indicative of a memory address at which the read/write is to be performed.
Also disclosed herein is a data processing system comprising a data processing device and a network interface device for connection to the data processing device and to a data network. This configuration provides an interface between the data processing device and the network for supporting the network of packets of a transport protocol such that the network interface device is configured to identify within the payloads of such packets data of a further protocol. Thus, the data of the further protocol may comprise payload data of the further protocol and framing data of the further protocol, and the framing data may include verification data for permitting the integrity of the payload data to be verified. Upon so identifying data of the further protocol this system may process at least the payload data for determining the integrity thereof and then transmit to the data processing device at least some of the framing data with an indication of the result of the processing.
Also disclosed herein is a method for processing data by means of a network interface device which also connects to a data processing device and to a data network so as to provide an interface between the data processing device and the network. This method supports the network of packets of a transport protocol. In one embodiment the method comprises performing the following steps by means of the network interface device. The first step is identifying, within the payloads of such packets, data of a further protocol. The data of the further protocol may comprise payload data of the further protocol and framing data of the further protocol such that the framing data includes verification data for permitting the integrity of the payload data to be verified. Upon so identifying data of the further protocol, the method processes at least the payload data to determine the integrity thereof and then transmits to the data processing device at least some of the framing data and an indication of the result of the processing.
Other systems, methods, features and advantages of the invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. In the figures, like reference numerals designate corresponding parts throughout the different views. The present invention will now be described by way of example with reference to the accompanying drawings, in which:
The computer 1 may, for example, be a personal computer, a server or a dedicated processing device such as a data logger or controller. In this example it comprises a processor 2, a program store 4 and a memory 3. The program store stores instructions defining an operating system and applications that can run on that operating system. The operating system provides means such as drivers and interface libraries by means of which applications can access peripheral hardware devices connected to the computer.
It is desirable for the network interface device to be capable of supporting standard transport protocols such as TCP, RDMA and ISCSI at user level: i.e. in such a way that they can be made accessible to an application program running on computer 1. Such support enables data transfers which require use of standard protocols to be made without requiring data to traverse the kernel stack. In the network interface device of this example standard transport protocols are implemented within transport libraries accessible to the operating system of the computer 1.
There are a number of difficulties in implementing transport protocols at user level. Most implementations to date have been based on porting pre-existing kernel code bases to user level. Examples of these are Arsenic and Jet-stream. These have demonstrated the potential of user-level transports, but have not addressed a number of the problems that must be solved to achieve a complete, robust, high-performance, commercially viable implementation.
The operation of this architecture is as follows.
On packet reception from the network interface hardware (e.g. a network interface card (NIC)), the NIC transfers data into a pre-allocated data buffer (a) and invokes the OS interrupt handler by means of the interrupt line (step i). The interrupt handler manages the hardware interface, e.g. posts new receive buffers, and parses the received (in this case Ethernet) packet looking for protocol information. If a packet is identified as destined for a valid protocol, e.g. TCP/IP, it is passed (not copied) to the appropriate receive protocol processing block (step ii).
TCP receive-side processing takes place and the destination port is identified from the packet. If the packet contains valid data for the port then the packet is enqueued on the port's data queue (step iii) and that port is marked as holding valid data (which may involve the scheduler and the awakening of a blocked process).
The TCP receive processing may require other packets to be transmitted (step iv), for example in the cases that previously transmitted data should be retransmitted or that previously enqueued data (perhaps because the TCP window has opened) can now be transmitted. In this case packets are enqueued with the OS “NDIS” driver for transmission.
In order for an application to retrieve a data buffer it must invoke the OS API (step v), for example by means of a call such as recv( ), select( ) or poll( ). This has the effect of informing the application that data has been received and (in the case of a recv( ) call) copying the data from the kernel buffer to the application's buffer. The copy enables the kernel (OS) to reuse its network buffers, which have special attributes such as being DMA accessible. It also means that the application does not necessarily have to handle data in units provided by the network, does not need to know a priori the final destination of the data, and does not have to pre-allocate buffers which can then be used for data reception.
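The receive-side API behaviour described above can be illustrated with a minimal sketch. It assumes a local socket pair standing in for a TCP connection; the message contents and buffer sizes are illustrative only and are not taken from this specification.

```python
import select
import socket

# A connected pair of local stream sockets stands in for a TCP connection.
sender, receiver = socket.socketpair()
sender.sendall(b"hello from the network")

# select() informs the application that data is waiting; it copies nothing.
readable, _, _ = select.select([receiver], [], [], 1.0)

# recv() copies at most the requested number of bytes from the kernel buffer
# into a buffer owned by the application, so the application chooses its own
# unit size rather than the unit in which the data arrived.
first = receiver.recv(5) if receiver in readable else b""
rest = receiver.recv(1024)

sender.close()
receiver.close()
```

Here the application reads five bytes and then the remainder, independently of how the data was delivered to the kernel.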
It should be noted that on the receive side there are at least two distinct threads of control which interact asynchronously: the up-call from the interrupt and the system call from the application. Many operating systems will also split the up-call to avoid executing too much code at interrupt priority, for example by means of “soft interrupt” or “deferred procedure call” techniques.
The send process behaves similarly except that there is usually one path of execution. The application calls the operating system API (e.g. using a send( ) call) with data to be transmitted (step vi). This call copies data into a kernel data buffer and invokes TCP send processing. Here the protocol is applied and fully formed TCP/IP packets are enqueued with the interface driver for transmission.
If successful, the system call returns with an indication of the data scheduled (by the hardware) for transmission. However there are a number of circumstances where data does not become enqueued by the network interface device. For example the transport protocol may queue pending acknowledgements or window updates, and the device driver may queue in software pending data transmission requests to the hardware.
A third flow of control through the system is generated by actions which must be performed on the passing of time. One example is the triggering of retransmission algorithms. Generally the operating system provides all OS modules with time and scheduling services (driven by the hardware clock interrupt), which enable the TCP stack to implement timers on a per-connection basis.
If a standard kernel stack were implemented at user-level then the structure might be generally as shown in
(i) System API calls provided by the application
(ii) Timer generated calls into protocol code
(iii) Management of the virtual network interface and resultant upcalls into protocol code. (ii and iii can be combined for some architectures)
However, this arrangement introduces a number of problems:
(a) The overheads of context switching between these threads and implementing locking to protect shared data structures can be substantial, costing a significant amount of processing time.
(b) The user level timer code generally operates by using operating system provided timer/time support. Large overheads caused by system calls from the timer module result in the system failing to satisfy the aim of preventing interaction between the operating system and the data path.
(c) There may be a number of independent applications, each of which manages a subset of the network connections; some via their own transport libraries and some via existing kernel stack transport libraries. The NIC must be able to efficiently parse packets and deliver them to the appropriate virtual interface (or the OS) based on protocol information such as IP port and host address bits.
(d) It is possible for an application to pass control of a particular network connection to another application, for example during a fork( ) system call on a Unix operating system. This means that a completely different transport library instance would be required to access the connection state. Worse, a number of applications may share a network connection, which would mean transport libraries sharing ownership via IPC (inter-process communication) techniques. Existing transports at user level do not attempt to support this.
(e) It is common for transport protocols to mandate that a network connection outlives the application to which it is tethered. For example using the TCP protocol, the transport must endeavour to deliver sent, but unacknowledged data and gracefully close a connection when a sending application exits or crashes. This is not a problem with a kernel stack implementation that is able to provide the “timer” input to the protocol stack no matter what the state (or existence) of the application, but is an issue for a transport library which will disappear (possibly ungracefully) if the application exits, crashes, or is stopped in a debugger.
Furthermore, RDMA (remote direct memory access) and ISCSI (internet small computer system interface) are protocols that allow one device such as a computer to directly access the contents of the memory of another (“target”) device to which it is connected over a network. The protocols involve embedding in conventional network packets strings of data that define the operations to be performed according to the protocol. For example, to perform an RDMA operation to write data to the memory of a remote computer a TCP packet may be sent to that computer with a payload containing a string made up of: a marker marking the start of RDMA data, a tag indicating where in the memory the data is to be written to, the data itself, and a CRC block to allow the integrity of the data to be verified on receipt. A single TCP packet may contain multiple such strings. When the TCP packet is received the data in its payload can be identified as RDMA data and processed accordingly to perform the desired write operation.
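By way of illustration only, the string structure just described (marker, address tag, data, CRC block) may be sketched as follows. The field widths, the marker value `MARKER`, the explicit length field and the helper names `build_unit` and `parse_units` are assumptions made for this sketch and are not the layout defined by the RDMA specification; the standard-library CRC-32 stands in for whichever CRC the protocol specifies.

```python
import struct
import zlib

MARKER = b"RDMA"                 # hypothetical 4-byte start-of-data marker
HEADER = struct.Struct("!4sQI")  # marker, destination address tag, data length
TRAILER = struct.Struct("!I")    # CRC block covering the data


def build_unit(address, data):
    """Frame one unit: header, the data itself, then the CRC block."""
    return HEADER.pack(MARKER, address, len(data)) + data + TRAILER.pack(zlib.crc32(data))


def parse_units(payload):
    """Split a TCP payload into (address, data, received_crc) tuples."""
    units = []
    offset = 0
    while offset < len(payload):
        marker, address, length = HEADER.unpack_from(payload, offset)
        if marker != MARKER:
            raise ValueError("marker not found at expected offset")
        offset += HEADER.size
        data = payload[offset:offset + length]
        offset += length
        (received_crc,) = TRAILER.unpack_from(payload, offset)
        offset += TRAILER.size
        units.append((address, data, received_crc))
    return units


# A single TCP payload may carry several such strings.
payload = build_unit(0x1000, b"first") + build_unit(0x2000, b"second")
units = parse_units(payload)
```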
The processing of the packet to extract, verify and interpret the RDMA data can be performed by a processor of the target device itself or by a network interface device of the target device. However, it is conventional for the processing to be performed by the network interface device because this allows the passing of the data to and from the memory of the target to be performed efficiently. If the processing were performed by a processor of the target device then two memory write operations would be required since the RDMA data string would first have to be passed to a buffer area of the device's memory for processing, and then—when the destination address of the data had been determined—it would be copied to that address. In contrast, if the RDMA processing is performed on the network interface device then the destination address can be determined there and the data can be written directly to that address, saving the copy operation that would otherwise be required. For this reason the approach of processing RDMA or ISCSI data on the network interface device is preferred. However, it has the disadvantage that it requires the network interface device to have considerable processing power. This increases expense, especially since embedded processing power on devices such as network interface devices is typically more expensive than main processor power.
It would be desirable to provide an enhanced means of supporting protocols such as RDMA and ISCSI.
According to one aspect of the present invention there is provided a network interface device for connection to a data processing device and to a data network so as to provide an interface between the data processing device and the network for supporting the network of packets of a transport protocol, the network interface device being configured to identify within the payloads of such packets data of a further protocol, the data of the further protocol comprising payload data of the further protocol and framing data of the further protocol, and the framing data including verification data for permitting the integrity of the payload data to be verified; on so identifying data of the further protocol, process at least the payload data for determining the integrity thereof and transmit to the data processing device at least some of the framing data and an indication of the result of the said processing.
Further aspects and preferred features of the present invention are set out in the claims.
The principal differences between the architecture of the example of
(i) TCP code which performs protocol processing on behalf of a network connection is located both in the transport library, and in the OS kernel. The fact that this code performs protocol processing is especially significant.
(ii) Connection state and data buffers are held in kernel memory and memory mapped into the transport library's address space.
(iii) Both kernel and transport library code may access the virtual hardware interface for and on behalf of a particular network connection.
(iv) Timers may be managed through the virtual hardware interface (these correspond to real timers on the network interface device) without requiring system calls to set and clear them. The NIC generates timer events which are received by the network interface device driver and passed up to the TCP support code for the device.
It should be noted that the TCP support code for the network interface device is in addition to the generic OS TCP implementation. This is suitably able to co-exist with the stack of the network interface device.
The effects of this architecture are as follows.
(a) Requirement for Multiple Threads Active in the Transport Library
This requirement is not present for the architecture of
(b) Requirement to Issue System Calls for Timer Management
This requirement is not present for the architecture of
(c) Correct Delivery of Packets to Multiple Transport Libraries
The network interface device can contain or have access to content addressable memory, which can match bits taken from the headers of incoming packets as a parallel hardware match operation. The results of the match can be taken to indicate the destination virtual interface which must be used for delivery, and the hardware can proceed to deliver the packet onto buffers which have been pushed on the VI. One possible arrangement for the matching process is described below. The arrangement described below could be extended to de-multiplex the larger host addresses associated with IPv6, although this would require a wider CAM or multiple CAM lookups per packet than the arrangement as described.
One alternative to using a CAM for this purpose is to use a hash algorithm that allows data from the packets' headers to be processed to determine the virtual interface to be used.
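A minimal sketch of the hash-based alternative is given below. It assumes that the destination address and port are the header bits used for matching, and that packets matching no filter fall through to the kernel stack; the table contents, `FlowKey`, `KERNEL_STACK` and `select_virtual_interface` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Header bits used for the match (an assumed choice of fields)."""
    dst_ip: str
    dst_port: int


KERNEL_STACK = 0  # hypothetical identifier for delivery to the OS

# A dict is a hash table, standing in for the hash algorithm on the device.
filters = {
    FlowKey("10.0.0.5", 80): 1,    # virtual interface 1
    FlowKey("10.0.0.5", 5001): 2,  # virtual interface 2
}


def select_virtual_interface(dst_ip, dst_port):
    """Return the virtual interface for a packet, or the kernel stack."""
    return filters.get(FlowKey(dst_ip, dst_port), KERNEL_STACK)
```

A CAM would perform the same lookup as a parallel hardware match; the hash table trades that parallelism for a simpler structure.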
(d) Handover of Connections Between Processes/Applications/Threads
When a network connection is handed over the same system-wide resource handle can be passed between the applications. This could, for example, be a file descriptor. The architecture of the network interface device can attach all state associated with the network connection with that (e.g.) file descriptor and require the transport library to memory map on to this state. Following a handover of a network connection, the new application (whether as an application, thread or process)—even if it is executing within a different address space—is able to memory-map and continue to use the state. Further, by means of the same backing primitive as used between the kernel and transport library any number of applications are able to share use of a network connection with the same semantics as specified by standard system APIs.
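The idea of attaching connection state to a file descriptor and memory-mapping it can be sketched as follows. This is an illustration under stated assumptions: a temporary file stands in for the kernel-held state, two mappings within one process stand in for two address spaces, and the two-field state layout `STATE` is hypothetical.

```python
import mmap
import struct
import tempfile

# Hypothetical connection state: next send and next receive sequence numbers.
STATE = struct.Struct("!II")

with tempfile.TemporaryFile() as backing:
    backing.truncate(mmap.PAGESIZE)
    fd = backing.fileno()

    # The first owner maps the state and updates it.
    first_owner = mmap.mmap(fd, mmap.PAGESIZE)
    STATE.pack_into(first_owner, 0, 1000, 2000)

    # After handover, the new owner maps the same descriptor and continues
    # from the state left by the first owner.
    new_owner = mmap.mmap(fd, mmap.PAGESIZE)
    snd_next, rcv_next = STATE.unpack_from(new_owner, 0)

    first_owner.close()
    new_owner.close()
```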
(e) Completion of Transport Protocol Operations When the Transport Library is Either Stopped, Killed or Quit
This step can be achieved in the architecture of the network interface device because connection state and protocol code can remain kernel resident. The OS kernel code can be informed of the change of state of an application in the same manner as the generic TCP (TCPk) protocol stack. An application which is stopped will then not provide a thread to advance protocol execution, but the protocol will continue via timer events, for example as is known for prior art kernel stack protocols.
As discussed above, there are a number of newly emerging protocols such as IETF RDMA and iSCSI. At least some of these protocols were designed to run in an environment where the TCP and other protocol code executes on the network interface device. Facilities will now be described whereby the processing to support such protocols can be executed at least partially on a host CPU (i.e. using the processing means of a computer to which a network interface card is connected). Such an implementation is advantageous because it allows a user to take advantage of the price/performance lead of main CPU technology as against co-processors.
The format of RDMA instructions is given in the RDMA specification, which is available from www.rdmaconsortium.org.
Protocols such as RDMA involve the embedding of framing information and cyclic redundancy check (CRC) data within the TCP stream. While framing information is trivial to calculate within protocol libraries, CRCs (in contrast to checksums) are computationally intensive and best done by hardware. To accommodate this, when a TCP stream is carrying an RDMA or similar encapsulation, an option in the virtual interface can be enabled, for example by means of a flag. On detecting this option, the NIC will parse each packet on transmission, recover the RDMA frame, apply the RDMA CRC algorithm and insert the CRC on the fly during transmission. Analogous procedures can beneficially be used in relation to other protocols, such as iSCSI, that require computationally relatively intensive calculation of error check data.
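The transmit-side behaviour can be sketched as below. The sketch assumes that the protocol library leaves a four-byte placeholder at the end of each frame for the NIC to fill in, which is not stated in this specification; the function `transmit` and the flag name `crc_offload` are hypothetical, and the standard-library CRC-32 stands in for the RDMA CRC algorithm.

```python
import zlib

CRC_SIZE = 4  # assumed width of the CRC field


def transmit(frame, crc_offload):
    """Return the bytes placed on the wire for one framed unit."""
    if not crc_offload:
        return frame  # option disabled: the frame is sent unchanged
    body = frame[:-CRC_SIZE]  # everything before the placeholder
    return body + zlib.crc32(body).to_bytes(CRC_SIZE, "big")


# The library builds the framing cheaply and leaves the CRC field empty.
frame_from_library = b"HDR" + b"payload bytes" + b"\x00" * CRC_SIZE
on_wire = transmit(frame_from_library, crc_offload=True)
```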
In line with this system the network interface device can also verify CRCs on received packets using similar logic. This may, for example, be performed in a manner akin to the standard TCP checksum off-load technique.
To execute this arrangement, the steps performed are preferably as follows. When operating in an RDMA compatible mode the NIC analyses the payload of each received TCP packet to identify whether it comprises RDMA data. This may be done by checking whether the RDMA framing data (i.e. the RDMA header and footer) and particularly the RDMA header marker is present in the payload. If it is not present then the packet is processed as normal. If it is present then the payload of the packet is processed by the NIC according to the RDMA CRC algorithm in order to calculate the RDMA CRC for the received data. Once that has been calculated then one of two routes can be employed. In a first route the RDMA data together with the calculated CRC is passed to the host computer. The host computer can then compare the calculated CRC with the CRC as received in the RDMA data to establish whether the data has been correctly received. Alternatively, in a second route that comparison can be performed at the NIC and the RDMA data together with an indication of the result of that comparison (e.g. in a one-bit flag) is passed to the host computer. In either case the host computer can then process the RDMA data accordingly. Thus, if the result of the CRC check indicates that data has been correctly received it can execute the RDMA command represented by the data (typically a read or write command). Otherwise it does not execute the command, and in that case it may automatically perform an error recovery action such as initiating a request for retransmission of the data.
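The two routes may be contrasted in a short sketch. The `Unit` structure and the function names are hypothetical, and the standard-library CRC-32 stands in for the RDMA CRC algorithm.

```python
import zlib
from typing import NamedTuple


class Unit(NamedTuple):
    framing: bytes      # framing data other than the CRC
    data: bytes         # payload data of the further protocol
    received_crc: int   # verification data carried in the framing


def nic_route_one(unit):
    """First route: the NIC calculates the CRC and passes it to the host."""
    return unit, zlib.crc32(unit.data)


def host_check_route_one(unit, calculated_crc):
    """The host compares the calculated CRC with the CRC as received."""
    return calculated_crc == unit.received_crc


def nic_route_two(unit):
    """Second route: the NIC also compares and passes a one-bit flag."""
    return unit, zlib.crc32(unit.data) == unit.received_crc


good = Unit(b"hdr", b"abc", zlib.crc32(b"abc"))
bad = Unit(b"hdr", b"abd", zlib.crc32(b"abc"))  # data corrupted in transit
```

In either route the byte-by-byte work is done on the NIC; the routes differ only in where the final comparison is made.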
If the NIC performs the checking of the CRC in addition to its calculation then if it determines that the data has not been validly received it need not transmit the payload of the corresponding RDMA data to the host computer. It need only transmit sufficient information from the header of the transport protocol packet (typically a TCP header) and from the RDMA framing information to allow the host computer to request retransmission. It may transmit the whole of that header and framing information or it could transmit just some of that header and framing information. It will be appreciated that this operation is performed on a per-RDMA-data-unit basis. Thus, if a TCP packet contains a single RDMA data unit it is the framing data of that same data unit and the header of that same packet (or part thereof) that are passed to the host computer. If a TCP packet contains multiple RDMA data units then if any RDMA data unit is determined to be bad then its framing data and the header of the entire packet (or part thereof) are transmitted to the host computer.
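The per-data-unit handling can be sketched as follows, assuming the NIC performs the comparison itself. The dictionary layout and the function name `deliver` are hypothetical, and the standard-library CRC-32 again stands in for the RDMA CRC algorithm.

```python
import zlib


def deliver(tcp_header, units):
    """Decide, per data unit, what the NIC forwards to the host.

    `units` is a list of (framing, data, received_crc) tuples taken from
    one TCP packet.
    """
    forwarded = []
    for framing, data, received_crc in units:
        crc_ok = zlib.crc32(data) == received_crc
        entry = {"tcp_header": tcp_header, "framing": framing, "crc_ok": crc_ok}
        if crc_ok:
            entry["data"] = data  # payload is forwarded only for good units
        forwarded.append(entry)
    return forwarded


units = [
    (b"f1", b"good data", zlib.crc32(b"good data")),
    (b"f2", b"bad data", 0),  # the received CRC does not match the data
]
result = deliver(b"tcp-hdr", units)
```

For the bad unit the host still receives the packet header and framing data, which is what it needs to request retransmission.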
Protocols such as RDMA also mandate additional operations such as RDMA READ which in conventional implementations require additional intelligence on the network interface device. As indicated above, this type of implementation has led to the general belief that RDMA/TCP should best be implemented by means of a co-processor network interface device. In an architecture of the type described herein, specific hardware filters can be encoded to trap such upper level protocol requests for a particular network connection. In such a circumstance, the NIC can generate an event akin to the timer event in order to request action by software running on the attached computer, as well as a delivery data message. By triggering an event in such a way the NIC can achieve the result that either the transport library or the kernel helper will act on the request immediately. This can avoid the potential problem of kernel extensions not executing until the transport library is scheduled and can be applied to other upper protocols if required.
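A minimal sketch of such a filter is given below. The opcode values, the queue names and the function `on_unit` are hypothetical; the sketch only illustrates that a trapped request produces an event in addition to the normal data delivery.

```python
from collections import deque

READ_REQUEST = 1  # hypothetical opcode values for illustration
WRITE = 2

trapped_opcodes = {READ_REQUEST}  # filter configured for this connection
data_queue = deque()              # ordinary data delivery messages
event_queue = deque()             # events akin to timer events


def on_unit(connection_id, opcode, unit):
    """Deliver a data unit and raise an event if its opcode is trapped."""
    data_queue.append((connection_id, unit))
    if opcode in trapped_opcodes:
        # The event prompts the transport library or the kernel helper to
        # act on the request without waiting to be scheduled.
        event_queue.append(("upper_protocol_request", connection_id))


on_unit(7, WRITE, b"write unit")
on_unit(7, READ_REQUEST, b"read unit")
```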
The calculation of the CRC is preferably performed by dedicated hardware of the NIC, since this provides a particularly efficient way of carrying out such bit-by-bit operations. Similarly, the above method could be applied to calculations other than CRC calculations—which may for example include authentication, encryption and decryption operations.
Whilst this example has been described with reference to RDMA, it could be applied to other protocols.
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.
Claims
1. A network interface device for connection to a data processing device and to a data network so as to provide an interface between the data processing device and the network for supporting the network of packets of a transport protocol, the network interface device being configured to identify within the payloads of such packets data of a further protocol, the data of the further protocol comprising payload data of the further protocol and framing data of the further protocol, and the framing data including verification data for permitting the integrity of the payload data to be verified and, upon so identifying data of the further protocol, process at least the payload data for determining the integrity thereof and transmit to the data processing device at least some of the framing data and an indication of the result of the said processing.
2. A network interface device as claimed in claim 1, wherein the network interface device is configured to process the payload data by applying a predetermined function to the payload data to form a verification result.
3. A network interface device as claimed in claim 1, wherein the said verification result is the said indication of the result.
4. A network interface device as claimed in claim 3, wherein the network interface device is configured to transmit to the data processing device the payload data together with the at least some of the framing data.
5. A network interface device as claimed in claim 2, wherein the network interface device is further configured to process the payload data by comparing the verification result with the verification data and the result of that comparison is the said indication of the result.
6. A network interface device as claimed in claim 5, wherein the network interface device is configured to, if the result of that comparison is that the verification result matches the verification data, transmit to the data processing device the payload data together with the at least some of the framing data.
7. A network interface device as claimed in claim 5, wherein the network interface device is configured to, if the result of that comparison is that the verification result does not match the verification data, not transmit to the data processing device the payload data.
8. A network interface device as claimed in claim 2, wherein the predetermined function is a cyclic redundancy check function.
9. A network interface device as claimed in claim 2, wherein the predetermined function is an authentication function.
10. A network interface device as claimed in claim 2, wherein the predetermined function is a function that involves byte-by-byte processing of the payload data.
11. A network interface device as claimed in claim 1, wherein the packets of the transport protocol comprise packet headers of that protocol and the network interface device is configured to, on identifying data of the further protocol, transmit to the data processing device at least some of the header(s) of the packet(s) of the transport protocol that carried the payload data together with the payload data.
12. A network interface device as claimed in claim 1, wherein the network interface device is configured to perform the transmission to the data processing device by transmitting data to a transport library supported by the data processing device.
13. A network interface device as claimed in claim 1, wherein the transport protocol is the TCP (transmission control protocol) protocol.
14. A network interface device as claimed in claim 1, wherein the further protocol is a protocol for remote direct memory access.
15. A network interface device as claimed in claim 14, wherein the further protocol is such that it supports memory write instructions wherein the framing data includes information indicative of a memory address of the data processing device to which at least some of the payload data is to be written.
16. A network interface device as claimed in claim 15, wherein the further protocol is the RDMA (remote direct memory access) or iSCSI (internet small computer systems interface) protocol.
17. A network interface device as claimed in claim 1, wherein the network interface device is configured to, on identifying at least some forms of data of the further protocol, raise an interrupt on the data processing device.
18. A network interface device as claimed in claim 17, wherein the said forms of data include memory read instructions.
19. A network interface device as claimed in claim 17, wherein the said forms of data include memory write instructions.
20. A network interface device as claimed in claim 19, wherein the said read/write instructions include information indicative of a memory address at which the said read/write is to be performed.
21. A data processing system comprising a data processing device and a network interface device for connection to the data processing device and to a data network so as to provide an interface between the data processing device and the network for supporting the network of packets of a transport protocol, the network interface device being configured to identify within the payloads of such packets data of a further protocol, the data of the further protocol comprising payload data of the further protocol and framing data of the further protocol, and the framing data including verification data for permitting the integrity of the payload data to be verified and, upon so identifying data of the further protocol, process at least the payload data for determining the integrity thereof and transmit to the data processing device at least some of the framing data and an indication of the result of the said processing.
22. A method for processing data by means of a network interface device for connection to a data processing device and to a data network so as to provide an interface between the data processing device and the network for supporting the network of packets of a transport protocol, the method comprising performing the following steps by means of the network interface device:
- identifying within the payloads of such packets data of a further protocol, the data of the further protocol comprising payload data of the further protocol and framing data of the further protocol, and the framing data including verification data for permitting the integrity of the payload data to be verified; and
- upon so identifying data of the further protocol, processing at least the payload data for determining the integrity thereof and transmitting to the data processing device at least some of the framing data and an indication of the result of the said processing.
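By way of illustration only, the processing recited in the claims above might be sketched as follows. The sketch is not taken from the specification and rests on several assumptions: the frame layout (a 2-byte payload length and an 8-byte target memory address, then the payload, then a 4-byte trailer) is hypothetical, the names `build_frame`, `nic_process` and `Delivery` are invented for the example, and the predetermined function is taken to be the CRC-32 provided by Python's `zlib` module. Practical RDMA and iSCSI framing is laid out differently and typically uses the CRC32C polynomial instead.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical frame layout (an assumption, not any real protocol):
#   2-byte payload length | 8-byte target memory address | payload | 4-byte CRC-32
HEADER = struct.Struct("!HQ")   # framing data preceding the payload
TRAILER = struct.Struct("!I")   # verification data following the payload


@dataclass
class Delivery:
    framing: bytes             # framing data passed to the data processing device
    integrity_ok: bool         # indication of the result of the processing
    payload: Optional[bytes]   # forwarded only when the comparison matches


def build_frame(address: int, payload: bytes) -> bytes:
    """Sender side: wrap a payload in the hypothetical framing."""
    crc = zlib.crc32(payload)
    return HEADER.pack(len(payload), address) + payload + TRAILER.pack(crc)


def nic_process(transport_payload: bytes) -> Optional[Delivery]:
    """Receive side: identify a frame, verify it, and report the result."""
    if len(transport_payload) < HEADER.size + TRAILER.size:
        return None  # not identified as data of the further protocol
    length, _address = HEADER.unpack_from(transport_payload, 0)
    end = HEADER.size + length
    if len(transport_payload) != end + TRAILER.size:
        return None  # length field inconsistent with the data received
    payload = transport_payload[HEADER.size:end]
    (verification_data,) = TRAILER.unpack_from(transport_payload, end)
    # Apply the predetermined function and compare with the verification data.
    matches = zlib.crc32(payload) == verification_data
    framing = transport_payload[:HEADER.size] + transport_payload[end:]
    # The payload is withheld from the host when the comparison fails.
    return Delivery(framing, matches, payload if matches else None)
```

In this sketch the data processing device always receives the framing data and the match indication, so a host-side transport library could still act on the frame header (for example the target address) without recomputing the check itself.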
Type: Application
Filed: Oct 19, 2006
Publication Date: Feb 15, 2007
Inventors: Steve Pope (Cambridge), Derek Roberts (Cambridge), David Riddoch (Cambridge)
Application Number: 11/584,263
International Classification: H04L 12/56 (20060101);