NETWORK PROTOCOL REASSEMBLY ACCELERATION


Methods and systems are provided for network protocol reassembly acceleration. According to one embodiment, an incoming packet is received at a network interface. Payload data from the packet is written by a memory interface to a physical page within a system memory on behalf of the network interface based on a sequence number associated with the incoming packet and by obtaining a physical address from a virtual memory map corresponding to an incoming session with which the packet is associated. After the physical page is full, the physical page is made accessible to a user process being executed by a processor associated with the system memory by remapping the physical page through a paging table used by the user process.

Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 61/060,114 filed on Jun. 9, 2008, which is hereby incorporated by reference in its entirety for all purposes.

COPYRIGHT NOTICE

Contained herein is material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the patent disclosure by any person as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights to the copyright whatsoever. Copyright © 2008, Fortinet, Inc.

BACKGROUND

1. Field

Embodiments of the present invention generally relate to network traffic acceleration. In particular, embodiments of the present invention relate to inbound network protocol reassembly acceleration (e.g., Transmission Control Protocol (TCP) reassembly acceleration).

2. Description of the Related Art

FIG. 1 conceptually illustrates TCP reassembly processing as it is typically performed. As a result of TCP segmentation processing, which is applied to segment or fragment TCP payload data across multiple outbound TCP packets, and because TCP/IP allows out-of-order packet delivery to receivers, receivers must be prepared to accept pieces, e.g., incoming packets 160, and put them back together. This is referred to as “TCP reassembly.” The TCP reassembly process is performed by the TCP protocol in the host computer. When TCP segmentation or TCP reassembly is performed on behalf of the host computer by separate hardware or a separate processor, such as one associated with a network interface controller (NIC), the process is said to have been “offloaded.”

Conventional operating systems usually segregate virtual memory into kernel space 140 and user space 150. User mode applications, such as user process 120 (e.g., an antivirus scanning process), are forbidden from writing to or otherwise accessing kernel space 140. User space 150 is the memory area in which user mode applications are permitted to operate.

Typically, TCP payload, such as payload data 147, is intended for delivery to or processing by a user process, such as user process 120. Before making the payload data 147 available to the user process 120, however, a kernel process 110 may process header data 145 associated with the payload data 147 to validate the header data 145. The user process 120 may supply a virtually contiguous buffer to the kernel process 110 into which the kernel process 110 copies the payload data 147.

TCP reassembly offloading is used to increase system throughput and decrease central processing unit (CPU) usage; however, in traditional implementations, in order to allow it to be physically addressed by a TCP offload engine (TOE) (not shown), the payload data 147, which is initially stored in kernel space 140, needs to be copied to contiguous user space 150 by the CPU (not shown) to create a payload data copy 157. This movement of payload data from kernel space 140 to user space 150 within system memory 130 is CPU intensive and reduces incoming TCP traffic throughput.

Thus, there is a need in the art for improved inbound network traffic processing.

SUMMARY

Methods and systems are described for network protocol reassembly acceleration. According to one embodiment, a method is provided for reassembly processing without kernel-user space copying of payload data. An incoming packet is received at a network interface. Payload data from the packet is written by a memory interface to a physical page within a system memory on behalf of the network interface based on a sequence number associated with the incoming packet and by obtaining a physical address from a virtual memory map corresponding to an incoming session with which the packet is associated. After the physical page is full, the physical page is made accessible to a user process being executed by a processor associated with the system memory by remapping the physical page through a paging table used by the user process.

In the aforementioned embodiment, the incoming packet may be a TCP packet and the incoming session may be an incoming TCP session.

In various instances of the aforementioned embodiments, one or more physical pages of the system memory are allocated by a network interface driver being executed by the processor for each of multiple incoming TCP sessions.

In the context of various embodiments, virtual memory maps corresponding to the incoming TCP sessions may be built by the network interface driver.

In some cases, the network interface driver may also build a session table including information regarding a page directory base address for each of the incoming TCP sessions and information regarding an offset to adjust a start address of the payload data with respect to a boundary of the physical page.

In various instances of the aforementioned embodiments, the method may further involve (i) calculating an adjusted sequence number based on the sequence number and the offset; and (ii) using the adjusted sequence number as a virtual address input to the virtual memory map.

In the context of various embodiments, the method may further involve, prior to the memory interface writing payload data from the TCP packet to the physical page, determining whether the physical page has been allocated.

Other embodiments of the present invention provide a network device having a processor, a system memory, a network interface, an interconnect bus and a bus/memory interface. The processor is configured to execute one or more user processes and a network interface driver. The system memory is coupled to the processor and has stored therein (i) a paging table used by the processor to translate virtual memory addresses into corresponding physical memory addresses and (ii) multiple virtual memory maps containing information for use in connection with translating a virtual address input based on a sequence number of an incoming Transmission Control Protocol (TCP) packet to a physical address. The network interface is operable to receive incoming TCP packets. The interconnect bus is coupled to the processor and the system memory. The bus/memory interface is coupled to the network interface and the interconnect bus and is adapted to write payload data from the incoming TCP packet to a physical page within the system memory on behalf of the network interface based on the sequence number and a virtual memory map corresponding to an incoming TCP session with which the TCP packet is associated. The network interface driver makes the physical page accessible to a user process by remapping the physical page through the paging table.

In the aforementioned embodiment, the network interface driver may further be operable to allocate one or more physical pages of the system memory for each of the incoming TCP sessions.

In the context of various of the aforementioned embodiments, the network interface driver may further be operable to create and maintain the virtual memory maps.

In some instances, the network interface driver may build a session table within the system memory including information regarding a page directory base address for each of the incoming TCP sessions and information regarding an offset to adjust a start address of the payload data with respect to a boundary of the physical page.

In various of the aforementioned embodiments, the virtual address may be determined by calculating an adjusted sequence number based on the sequence number and the offset.

In various instances of the aforementioned embodiments, the network device may be a network security platform.

In the context of various of the aforementioned embodiments, the user process may perform one or more security functions in relation to the payload data.

In the aforementioned embodiment, the one or more security functions may include antivirus scanning, spam detection, web filtering, firewalling, intrusion detection, intrusion prevention and/or virtual private network (VPN) services.

Other embodiments of the present invention provide a program storage device readable by one or more processors of a network device, tangibly embodying a program of instructions executable by the one or more processors to perform method steps for performing Transmission Control Protocol (TCP) reassembly. In accordance with the method, an incoming TCP packet is initially received at a network interface of the network device. Payload data from the TCP packet is then written to a physical page within a system memory based on a sequence number associated with the incoming TCP packet and by obtaining a physical address from a virtual memory map corresponding to an incoming TCP session with which the TCP packet is associated. After the physical page is full, the physical page is made accessible to a user process being executed by a processor associated with the system memory by remapping the physical page through a paging table used by the processor.

In the aforementioned embodiment, the user process may perform one or more security functions in relation to the payload data.

In the context of various of the aforementioned embodiments, the one or more security functions may include, but are not limited to, antivirus scanning, spam detection, web filtering, firewalling, intrusion detection, intrusion prevention and/or virtual private network (VPN) services.

Other features of embodiments of the present invention will be apparent from the accompanying drawings and from the detailed description that follows.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:

FIG. 1 conceptually illustrates TCP reassembly processing as it is traditionally performed.

FIG. 2 conceptually illustrates accelerated TCP reassembly processing in accordance with various embodiments of the present invention.

FIG. 3 is an example of a system in which embodiments of the present invention may be utilized.

FIGS. 4A and 4B depict exemplary virtual memory mapping mechanisms that may be used in relation to various embodiments of the present invention.

FIG. 5 is a flow diagram illustrating incoming TCP traffic processing in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

Methods and systems are described for network protocol reassembly acceleration. According to one embodiment, hardware is provided for performing TCP payload splitting and stitching with assistance from a CPU, such as the host processor of a network device. A network driver executing on the CPU may allocate multiple memory pages for each incoming TCP session and build mapping tables for use by hardware. The mapping tables (e.g., one for each incoming TCP session) may be indexed based upon the TCP sequence number to obtain a buffer address in physical memory to which incoming TCP packet payloads are to be written. Associated packet headers (including MAC, IP and TCP headers) can be written to system memory in accordance with various traditional approaches for processing by the network driver. Upon receiving a full page of data, the network driver can then remap the page to make it accessible to an appropriate user process using the CPU paging table. In this manner, TCP reassembly acceleration processing as described in accordance with various embodiments of the present invention does not require copying of payload data from kernel space to user space.

For purposes of simplicity, various embodiments of the present invention are described in the context of TCP reassembly acceleration. It is to be noted, however, that the reassembly acceleration processing described herein, for example, may be implemented generically enough so as to be used for offloading reassembly of other network protocols, including transport layer protocols that are fragmented natively or otherwise. Thus, embodiments of the present invention provide techniques that may generally increase inbound throughput of high-bandwidth network connections by reducing CPU overhead.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the present invention. It will be apparent, however, to one skilled in the art that embodiments of the present invention may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form.

Embodiments of the present invention include various steps, which will be described below. The steps may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor programmed with the instructions to perform the steps. Alternatively, the steps may be performed by a combination of hardware, software, firmware and/or by human operators.

Embodiments of the present invention may be provided as a computer program product, which may include a machine-readable medium having stored thereon instructions, which may be used to program a computer (or other electronic devices) to perform a process. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), and magneto-optical disks, ROMs, random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing electronic instructions. Moreover, embodiments of the present invention may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).

Terminology

Brief definitions of terms used throughout this application are given below.

The terms “connected” or “coupled” and related terms are used in an operational sense and are not necessarily limited to a direct connection or coupling.

The phrases “in one embodiment,” “according to one embodiment,” and the like generally mean the particular feature, structure, or characteristic following the phrase is included in at least one embodiment of the present invention, and may be included in more than one embodiment of the present invention. Importantly, such phrases do not necessarily refer to the same embodiment.

If the specification states a component or feature “may”, “can”, “could”, or “might” be included or have a characteristic, that particular component or feature is not required to be included or have the characteristic.

The term “responsive” includes completely or partially responsive.

FIG. 2 conceptually illustrates TCP reassembly acceleration processing in accordance with various embodiments of the present invention. It is to be understood that the present example is being described at an abstract level in connection with a logical view of various functional units, including hardware and software components. Thus, FIG. 2 does not necessarily represent an actual physical view of the hardware and/or software components depicted or their interconnections.

In contrast to the TCP reassembly processing illustrated by FIG. 1, in the present example, TCP payload data need not be copied from kernel memory space to user memory space.

According to the present example, system memory 230 is shown containing mapping tables 240 and multiple TCP session buffers 250a, 250b and 250n. A network driver 210 and a user process 220 have access to system memory 230. In some embodiments of the present invention, one function of network driver 210 is to allocate one or more physical memory pages in system memory 230 for each TCP session (e.g., TCP session buffers 250a, 250b and 250n). An incoming TCP stream, e.g., incoming packets 275, is reassembled within the corresponding TCP session buffers 250a, 250b and 250n by a TCP payload split and stitching unit 270. Depending upon the particular implementation, user process 220 may perform one or more security functions, including but not limited to antivirus scanning, spam detection, web filtering, firewalling, intrusion detection, intrusion prevention and virtual private network (VPN) services.

In the current example and as described in more detail below, TCP payload split and stitching unit 270 accelerates inbound TCP traffic processing by splitting the TCP header and payload and reassembling the payload into virtually continuous system memory 230 (e.g., host memory). Thus, for example, the received TCP stream can be reassembled in system memory 230 to form the originally transmitted TCP payload by causing a bus/memory interface 260 to store TCP payload data into the correct portion of the corresponding TCP session buffer based on the TCP sequence number associated with the incoming packet thereby avoiding kernel-user space data copying.
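By way of illustration only, the placement principle described above can be modeled in software. The following minimal sketch writes each payload at the position implied by its sequence number relative to an initial sequence number, so that segments arriving out of order still land in the correct portion of the session buffer. The names `place_payload` and `initial_seq` are hypothetical, and the sketch does not represent the hardware design of the split and stitching unit.

```python
SEQ_MODULUS = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def place_payload(buffer: bytearray, initial_seq: int, seq: int, payload: bytes) -> None:
    """Write payload at the buffer position implied by its sequence number."""
    start = (seq - initial_seq) % SEQ_MODULUS  # byte position within the stream
    if start + len(payload) > len(buffer):
        raise ValueError("payload falls outside the session buffer")
    buffer[start:start + len(payload)] = payload


# Two segments arrive out of order but are stitched into the original stream.
session_buffer = bytearray(11)
place_payload(session_buffer, 1000, 1006, b"world")
place_payload(session_buffer, 1000, 1000, b"hello ")
```

After both calls the session buffer holds the originally transmitted bytes in order, without any intermediate copy.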

According to one embodiment, mapping tables 240 include virtual memory mapping translation data structures, which will be described further below, that allow interface hardware, e.g., bus/memory interface 260, coupled to the system memory 230 to write TCP payload data directly to physically addressed buffers based on an adjusted TCP sequence number and a page offset.

According to one embodiment, the mapping tables 240 are used only by the interface hardware and a corresponding page table translation data structure is used by the host processor. In other embodiments, the mapping tables 240 and the page table translation data structure are one and the same. In some embodiments, the mapping tables 240 and/or the page table translation data structure mimic the 32-bit Intel Architecture (IA32) page table. In such an embodiment and in the context of an IA32 system, the network driver 210 can re-use the system's native page table structure. In the context of other systems, the network driver 210 may construct the desired page table structure and/or mapping tables 240 from the native page table. As described further below, in some embodiments, the mapping tables 240 do not need to be fully filled, thus allowing TCP sessions that are not to be offloaded and other sessions to bypass the reassembly acceleration processing described herein.

Session context data (not shown), which may be initialized upon session creation, may include both (i) information specifying an address of the base of a page directory (e.g., a page directory base register (PDBR) value) in the mapping tables 240 used to translate a virtual address represented by the adjusted TCP sequence number, for example, to a physical address and (ii) a page offset to adjust the packet start address with respect to a page boundary. As described further below, for TCP sessions that are not to be offloaded and other sessions, network driver 210 may create a session context having a predetermined PDBR value, such as all zeroes.
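For purposes of illustration, a session context of this kind might be modeled as shown in the following sketch. The class and attribute names are hypothetical. The exact manner in which the adjusted TCP sequence number is derived is not detailed here, so the sketch assumes a simple modular addition of the page offset to the sequence number.

```python
from dataclasses import dataclass

SEQ_MODULUS = 1 << 32  # sequence numbers and virtual addresses are 32-bit
NO_OFFLOAD_PDBR = 0    # predetermined value (all zeroes) meaning "not offloaded"


@dataclass(frozen=True)
class SessionContext:
    pdbr: int         # base address of the session's page directory
    page_offset: int  # shift of the packet start relative to a page boundary

    @property
    def offloaded(self) -> bool:
        """True unless the PDBR holds the predetermined no-offload value."""
        return self.pdbr != NO_OFFLOAD_PDBR

    def adjusted_sequence(self, seq: int) -> int:
        # ASSUMPTION: the adjustment is a modular addition of the page offset.
        return (seq + self.page_offset) % SEQ_MODULUS
```

Under this assumption the adjusted value wraps in the same way as the TCP sequence number itself, so it can be used directly as a 32-bit virtual address.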

In one embodiment, the functionality of one or more of the above-referenced functional units may be merged in various combinations. For example, the TCP payload split and stitching unit 270 and the bus/memory interface 260 may be combined. Moreover, the various functional units can be communicatively coupled using any suitable communication method (e.g., message passing, parameter passing, and/or signals through one or more communication paths, etc.). Additionally, the functional units can be physically connected according to any suitable interconnection architecture (e.g., fully connected, hypercube, etc.).

According to embodiments of the invention, the functional units can be any suitable type of logic (e.g., digital logic, software code and the like) for executing the operations described herein. For example, all or a portion of the TCP payload split and stitching functionality may be performed by software or firmware executed by a network processor. Any of the functional units used in conjunction with embodiments of the invention can include machine-readable media including instructions for performing operations described herein. Machine-readable media include any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes, but is not limited to, read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media or flash memory devices.

According to one embodiment and as described further below, TCP reassembly acceleration processing is performed on behalf of a network interface of a network device by TCP payload split and stitching unit 270 via bus/memory interface 260 coupled to system memory via an interconnect bus. In one embodiment, the TCP payload split and stitching unit 270 is integrated within a TCP offload engine (TOE) that also performs TCP segmentation offloading (TSO) as described in co-pending and commonly assigned U.S. patent application Ser. No. 12/254,931, which is hereby incorporated by reference in its entirety for all purposes.

Returning to the role of the network driver 210, in various embodiments of the present invention copying payload data from kernel space to user space and the associated delays are eliminated by instead having the network driver 210 remap physical pages, via the CPU paging table, for example, to make them accessible to an appropriate user process, such as user process 220. Thus, as such pages are filled they can be read by the user process 220. In this manner, TCP reassembly acceleration processing as described in accordance with various embodiments of the present invention does not require copying of payload data from kernel space to user space.

FIG. 3 is an example of a network device 300 in which embodiments of the present invention may be utilized. Network device 300 may be any logical or physical device or combination of such devices that perform network traffic reassembly processing, including, but not limited to a network security platform, a network security appliance, a firewall, a network gateway, a client, a server and the like.

In the present example, network device 300 includes a network interface 350, a bus/memory interface 340, an interconnect bus 330, a general purpose processor 310 and a system memory 320. General purpose processor 310 may be any processor that is tailored for executing software commands indicated by an operating system. Thus, for example, general purpose processor 310 may be, but is not limited to, the various processors currently found in personal computers such as those offered by Intel and AMD. Based on the disclosure provided herein, one of ordinary skill in the art will recognize a variety of general purpose processors that may be used in relation to different embodiments of the present invention. In one embodiment, processor 310 may be implemented as a semiconductor device such as, for example, a programmable gate array or an application specific integrated circuit.

Bus/memory interface 340 provides control for interconnect bus 330 and access to system memory 320. In particular embodiments of the present invention, interconnect bus 330 is a Peripheral Component Interconnect (PCI) bus, system memory 320 is a random access memory, and bus/memory interface 340 is a chipset currently available for controlling the PCI bus and providing access to system memory 320. It should be noted that interconnect bus 330 may be, but is not limited to, a PCI interface, a Peripheral Component Interconnect Extended (PCI-X) interface, a Peripheral Component Interconnect Express (PCIe) interface, or a HyperTransport (HT) interface.

As described with reference to FIG. 2, system memory 320 may have stored therein, among other things, one or more virtual memory mapping tables (e.g., page tables and/or mapping tables 240) and TCP session buffers (e.g., TCP session buffers 250a, 250b and 250n). The page tables and/or mapping tables 240 include translation data structures, which will be described further below, that allow interface hardware, e.g., bus/memory interface 340, coupled to the system memory 320 to write payload data to appropriate locations within system memory 320 to facilitate efficient reassembly processing.

Depending upon the particular implementation, network interface 350 may be a network interface unit (NIU), such as a network interface card (NIC), or other network interface device to allow network device 300 to connect to an outside network. In one embodiment, network interface 350 includes a network processor or other digital logic (not shown) to allow it to direct and perform reassembly processing to offload the general purpose processor 310. In one embodiment, TCP reassembly acceleration processing may be performed by the network processor. Alternatively, TCP reassembly acceleration processing may be performed in hardware.

In one embodiment, bus/memory interface 340 is configured to write TCP payload data to system memory 320 via interconnect bus 330 based on virtual memory mapping tables (e.g., mapping tables 240) therein. In one embodiment, the locations of the virtual memory mapping tables corresponding to each TCP session are stored in local memory, such as double data rate synchronous dynamic random access memory (DDR SDRAM), of the network interface 350 or in on-chip caches called translation lookaside buffers or TLBs 345. Advantageously, since the virtual memory mapping tables are in host memory, e.g., system memory 320, and their locations are stored in local memory, as a practical matter, the number of TCP sessions that can be accelerated is not limited by the internal memories associated with the bus/memory interface 340 and/or the network interface 350.

FIGS. 4A-4B depict exemplary virtual memory mapping mechanisms that may be used in relation to various embodiments of the present invention. In particular, FIG. 4A shows a hierarchy of a page directory 410 and a page table 430 utilized when mapping an adjusted TCP sequence number 400 (a virtual address) to exemplary 4-KByte pages 440. The entries in page directory 410, such as directory entry 411, point to page table 430, and the entries in page table 430, such as page-table entry 431, point to pages 440 in physical memory specified by a particular physical address, such as physical address 441.

Based on (i) a base address of the page directory 410, which may be specified by a PDBR value 420 associated with a particular TCP session context as described above, and (ii) a directory field 401, a table field 402 and an offset field 403 of the adjusted TCP sequence number 400, the TCP payload split and stitching functionality implemented by special purpose hardware (e.g., TCP payload split and stitching unit 270) or by network interface 350, for example, may cause the bus/memory interface 340 to reassemble the TCP payload data into virtually continuous host memory (e.g., system memory 230).
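The two-level lookup of FIG. 4A can be sketched in software as follows. This is an illustrative model only. It assumes the IA32-style split of the 32-bit adjusted sequence number into a 10-bit directory field, a 10-bit table field and a 12-bit offset field, 4-byte entries, and a present bit in bit 0 of each entry. The function names and the dictionary standing in for system memory are hypothetical.

```python
PRESENT = 0x1           # assumed position of the present bit (bit 0)
ADDR_MASK = 0xFFFFF000  # upper 20 bits of an entry: page-aligned base address


def split_4k(vaddr: int) -> tuple:
    """Split a 32-bit virtual address into directory, table and offset fields."""
    directory = (vaddr >> 22) & 0x3FF
    table = (vaddr >> 12) & 0x3FF
    offset = vaddr & 0xFFF
    return directory, table, offset


def translate_4k(memory: dict, pdbr: int, vaddr: int):
    """Walk directory and table; return a physical address, or None if unmapped.

    `memory` maps the physical address of each 32-bit entry to its value.
    """
    directory, table, offset = split_4k(vaddr)
    pde = memory.get(pdbr + 4 * directory, 0)
    if not pde & PRESENT:
        return None
    pte = memory.get((pde & ADDR_MASK) + 4 * table, 0)
    if not pte & PRESENT:
        return None
    return (pte & ADDR_MASK) + offset


# Directory at 0x1000; its entry 1 points to a page table at 0x2000,
# whose entry 2 points to a 4-KByte page at 0x7000.
memory = {0x1000 + 4 * 1: 0x2000 | PRESENT,
          0x2000 + 4 * 2: 0x7000 | PRESENT}
physical = translate_4k(memory, 0x1000, (1 << 22) | (2 << 12) | 0x34)
```

In this example the lookup resolves to physical address 0x7034, while any address whose directory or table entry lacks the present bit yields no translation.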

A register (not shown) may be used to indicate when an associated general purpose processor has invalidated one or more entries of page directory 410. Where such invalidation occurs, it is up to the network interface 350 and/or the bus/memory interface 340 to refresh the page table by accessing system memory 320.

FIG. 4B shows a process for using a page directory 460 to map an adjusted TCP sequence number 450 (a virtual address) to exemplary 4-MByte pages 480. The entries in page directory 460, such as directory entry 461, point to 4-MByte pages 480 in physical memory.

Based on (i) a base address of the page directory 460, which may be specified by a PDBR value 470 associated with a particular TCP session context as described above, and (ii) a directory field 451 and an offset field 452 of the virtual address 450, the TCP payload split and stitching functionality implemented by special purpose hardware (e.g., TCP payload split and stitching unit 270) or by network interface 350, for example, may cause the bus/memory interface 340 to reassemble the TCP payload data into virtually continuous host memory (e.g., system memory 230).
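The single-level lookup of FIG. 4B can be sketched in the same manner. The sketch assumes a 10-bit directory field and a 22-bit offset field, 4-byte directory entries, a present bit in bit 0, and a page size indicator in bit 7 as in the IA32 page directory entry format. These bit positions and all names are assumptions made for illustration.

```python
PRESENT = 0x1              # assumed present bit (bit 0)
PAGE_SIZE_4M = 0x80        # assumed page size indicator (bit 7, as in IA32)
BASE_MASK_4M = 0xFFC00000  # upper 10 bits: base address of a 4-MByte page


def translate_4m(memory: dict, pdbr: int, vaddr: int):
    """Map a virtual address to a 4-MByte page; return None if unmapped.

    `memory` maps the physical address of each 32-bit entry to its value.
    """
    directory = (vaddr >> 22) & 0x3FF
    offset = vaddr & 0x3FFFFF
    pde = memory.get(pdbr + 4 * directory, 0)
    if not (pde & PRESENT and pde & PAGE_SIZE_4M):
        return None
    return (pde & BASE_MASK_4M) + offset


# Directory at 0x1000; its entry 3 maps a 4-MByte page based at 0x00800000.
memory = {0x1000 + 4 * 3: 0x00800000 | PAGE_SIZE_4M | PRESENT}
physical = translate_4m(memory, 0x1000, (3 << 22) | 0x12345)
```

Because a directory entry points directly to the page, one memory access suffices per translation, at the cost of coarser allocation granularity.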

A register (not shown) may be used to indicate when an associated general purpose processor has invalidated one or more entries of page directory 460. Where such invalidation occurs, it is up to the network interface 350 and/or bus/memory interface 340 to refresh the page table by accessing system memory 320.

According to one embodiment, the PDBR values 420 and 470 may be stored in the following format:

PDBR [31:12] | Page Offset [11:0]

Because the PDBR address is page-aligned, only its upper 20 bits need to be stored in the session table. The lower 12 bits can then be used as a page offset to adjust the packet start address with respect to the page boundary.

In one embodiment, for TCP sessions that are not to be offloaded and other sessions, the network driver (e.g., network driver 210) can push a PDBR into the session table having the upper 20 bits set to zero to disable offloading.
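The 32-bit session table entry described above can be packed and unpacked with simple masks, as in the following sketch. The function names are hypothetical. The field boundaries follow the stated format, with the PDBR in bits 31:12 and the page offset in bits 11:0.

```python
PDBR_MASK = 0xFFFFF000    # bits 31:12, page-aligned page directory base
OFFSET_MASK = 0x00000FFF  # bits 11:0, page offset


def pack_session_entry(pdbr: int, page_offset: int) -> int:
    """Combine a page-aligned PDBR and a 12-bit page offset into one word."""
    if pdbr & ~PDBR_MASK:
        raise ValueError("PDBR must be a page-aligned 32-bit address")
    if not 0 <= page_offset <= OFFSET_MASK:
        raise ValueError("page offset must fit in 12 bits")
    return pdbr | page_offset


def unpack_session_entry(entry: int) -> tuple:
    """Return (pdbr, page_offset) from a packed session table entry."""
    return entry & PDBR_MASK, entry & OFFSET_MASK


def offload_disabled(entry: int) -> bool:
    """Offloading is disabled when the upper 20 bits are all zero."""
    return (entry & PDBR_MASK) == 0
```

For example, `pack_session_entry(0x0ABCD000, 0x123)` yields `0x0ABCD123`, and an entry such as `0x00000456` is treated as not offloaded regardless of its lower 12 bits.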

According to one embodiment, page directory entries include page table base address as well as an indicator regarding page size (e.g., 4 KB pages vs. 4 MB pages) and a presence indicator (e.g., a present bit).

According to one embodiment, page table entries include a page base address as well as a presence indicator (e.g., a present bit).

In various embodiments, the presence indicator in the page directory entries and the page table entries may be used to implement the TCP receiving window function. For example, only the pages falling within the receiving window may have the present bit set to indicate the corresponding page has been allocated. The present bit associated with the other pages may be unset indicating the corresponding page has not been allocated. In this manner, the bus/memory interface 260, 340 may be prevented from overwriting payload data outside the receiving window.
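One way a driver might maintain the present bits as the receiving window moves is sketched below. This is a simplified model under stated assumptions: 4-KByte pages, a present bit in bit 0, a page table represented as a dictionary keyed by virtual page number, and no handling of 32-bit wraparound. All names are hypothetical.

```python
PAGE_SIZE = 4096
PRESENT = 0x1  # assumed present bit (bit 0)


def pages_in_window(window_start: int, window_size: int) -> range:
    """Virtual page numbers overlapped by the receiving window."""
    if window_size <= 0:
        return range(0)
    first = window_start // PAGE_SIZE
    last = (window_start + window_size - 1) // PAGE_SIZE
    return range(first, last + 1)


def apply_window(page_table: dict, window_start: int, window_size: int) -> None:
    """Set the present bit for pages inside the window and clear it elsewhere."""
    inside = set(pages_in_window(window_start, window_size))
    for vpn in page_table:
        if vpn in inside:
            page_table[vpn] |= PRESENT
        else:
            page_table[vpn] &= ~PRESENT


# Four mapped pages; the window covers bytes 4096..12287 (pages 1 and 2).
page_table = {0: 0x5000, 1: 0x6000 | PRESENT, 2: 0x7000, 3: 0x8000 | PRESENT}
apply_window(page_table, 4096, 8192)
```

After the call only pages 1 and 2 carry the present bit, so a write directed at page 0 or page 3 would fail the allocation check.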

FIG. 5 is a flow diagram illustrating incoming TCP traffic processing in accordance with an embodiment of the present invention. Depending upon the particular implementation, the various process and decision blocks described herein may be performed by hardware components, embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor programmed with the instructions to perform the steps, or the steps may be performed by a combination of hardware, software, firmware and/or involvement of human participation/interaction.

In the present example, before reaching decision block 510, it is assumed an incoming TCP packet has been received at a network interface, e.g., network interface 350.

According to the embodiment depicted, processing of the incoming TCP packet begins at decision block 510, in which it is determined if the packet meets various validation checks. For example, among other things, it may be determined if the packet is really a TCP/IP packet and if the IP length specified in the packet matches the actual length of the received packet. If any of the sanity checks fail, then processing branches to block 550; otherwise, if all the sanity checks pass, then processing continues with decision block 520.

At decision block 520, it is determined if reassembly processing is to be offloaded to a TOE, for example. According to one embodiment, this determination is made by evaluating the PDBR value associated with the TCP session to which the packet corresponds. If the PDBR value is equal to a predetermined value that by convention indicates no offloading is to be performed, then processing branches to block 530; otherwise, processing continues with decision block 540.
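By way of non-limiting illustration, the PDBR test might be sketched in Python as follows. The sentinel value of zero, the dictionary-based session table, the `"pdbr"` field name and the treatment of an unknown session as not offloaded are all assumptions made for the sketch; the embodiments described herein state only that a predetermined value indicates no offloading by convention.

```python
NO_OFFLOAD_PDBR = 0  # assumed sentinel meaning "do not offload"


def offload_enabled(session_table, session_key):
    """Return True if the session's PDBR differs from the no-offload sentinel."""
    entry = session_table.get(session_key)
    if entry is None:
        return False  # assumption: sessions absent from the table are not offloaded
    return entry["pdbr"] != NO_OFFLOAD_PDBR
```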

At block 530, offloading for the packet at issue is disabled and the packet is delivered to the host processor, e.g., general purpose processor 310, by means other than the TCP reassembly acceleration processing described herein. Those skilled in the art will recognize various conventional means according to which the packet may be delivered to the appropriate user process, e.g., user process 220. After the packet is delivered, processing is complete.

At decision block 540, it is determined if the system memory pages to which the payload data would be stored have been allocated. According to one embodiment and as described above, a presence indicator may be provided in page directory entries and/or page table entries of the virtual memory map. For each of the pages the payload data of the packet will span, the bus/memory interface, e.g., bus/memory interface 260 or 340, checks if the present bit, for example, indicates the page is allocated. If any of the pages to which a portion of the payload data would be written has not been allocated, then processing branches to block 550. If all the pages have been allocated, then processing continues with block 560.
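By way of non-limiting illustration, the per-page presence check might be sketched in Python as a two-level walk of the virtual memory map. The sketch assumes 4 KB pages and a 32-bit virtual address split into a 10-bit directory index, a 10-bit table index and a 12-bit page offset, with the directory and tables modeled as dictionaries. That layout, and the names `translate` and `all_pages_allocated`, are assumptions made for the sketch rather than details taken from the embodiments.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages


def translate(page_directory, vaddr):
    """Walk directory then table; return a physical address, or None if
    either level is missing or has its present bit cleared."""
    pde = page_directory.get((vaddr >> 22) & 0x3FF)
    if pde is None or not pde["present"]:
        return None
    pte = pde["table"].get((vaddr >> 12) & 0x3FF)
    if pte is None or not pte["present"]:
        return None
    return pte["base"] + (vaddr & 0xFFF)


def all_pages_allocated(page_directory, vaddr, payload_len):
    """Return True only if every page spanned by the payload is present."""
    if payload_len <= 0:
        return True
    first = vaddr // PAGE_SIZE
    last = (vaddr + payload_len - 1) // PAGE_SIZE
    return all(
        translate(page_directory, page * PAGE_SIZE) is not None
        for page in range(first, last + 1)
    )
```

A payload that straddles a page boundary is accepted only when both pages are present, which corresponds to the branch to block 550 when any spanned page is unallocated.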

At block 550, either (i) one or more of the sanity checks of decision block 510 failed or (ii) one or more of the system memory pages to which the payload data would be written according to the virtual memory map has not been allocated. Consequently, offloading for the packet at issue is disabled and processing for this packet is complete.

At block 560, the true paths of each of decision blocks 510, 520 and 540 have been followed, meaning the sanity checks have passed, the packet is to be offloaded and appropriate system memory pages have been allocated. Thus, at this point, the packet header and payload are separated. The packet header is written to kernel memory for validation by the network driver, and the packet payload is written to the location in system memory indicated by the virtual memory map for the corresponding TCP session.
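By way of non-limiting illustration, the header/payload split and the payload write might be sketched in Python as follows. The sketch assumes a raw IPv4/TCP packet, models system memory as a dictionary of 4 KB `bytearray` pages keyed by page index, and treats the virtual address as already resolved to a page index and offset. The caller is assumed to supply the virtual address derived from the sequence number, and any per-session offset adjustment is left out. The names `split_tcp_packet` and `write_payload` are hypothetical.

```python
import struct


def split_tcp_packet(ip_packet):
    """Return (headers, sequence_number, payload) for a raw IPv4/TCP packet."""
    ihl = (ip_packet[0] & 0x0F) * 4                      # IP header length in bytes
    seq = struct.unpack("!I", ip_packet[ihl + 4:ihl + 8])[0]
    data_offset = (ip_packet[ihl + 12] >> 4) * 4         # TCP header length in bytes
    header_end = ihl + data_offset
    return ip_packet[:header_end], seq, ip_packet[header_end:]


def write_payload(pages, vaddr, payload, page_size=4096):
    """Copy payload into the page(s) starting at vaddr, crossing page
    boundaries as needed. All target pages must already exist in `pages`."""
    pos = 0
    while pos < len(payload):
        page_idx, offset = divmod(vaddr + pos, page_size)
        chunk = min(page_size - offset, len(payload) - pos)
        pages[page_idx][offset:offset + chunk] = payload[pos:pos + chunk]
        pos += chunk
```

Because the destination is derived from the sequence number rather than from arrival order, segments that arrive out of order land at their final positions without a separate reordering step.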

At decision block 570, a determination is made whether any of the pages to which the payload was written are full. If so, then processing continues to block 580; otherwise processing is complete.

At block 580, one or more memory pages were filled as a result of the most recent storage of payload data to system memory. In order to make the filled pages accessible to the appropriate user process, the one or more memory pages are remapped through the page table to user memory space visible to the appropriate user process. At this point, the user process may operate on the payload data. According to one embodiment, the user process may perform one or more security functions, including, but not limited to, antivirus scanning, spam detection, web filtering, firewalling, intrusion detection, intrusion prevention and virtual private network (VPN) services.
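By way of non-limiting illustration, the full-page determination of decision block 570 and the remapping of block 580 might be sketched in Python as follows. The embodiments do not specify how fullness is detected; the sketch assumes a per-page byte counter and assumes segments never overlap or repeat, so retransmissions would need additional handling. The user process's page table is modeled as a simple dictionary, and the class and method names are hypothetical.

```python
class PageTracker:
    """Tracks bytes written per page and records remapped pages."""

    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.filled = {}            # page index -> bytes written so far
        self.user_page_table = {}   # user-visible page index -> physical page

    def record_write(self, vaddr, length):
        """Account for a write and return the pages that just became full."""
        newly_full = []
        pos, end = vaddr, vaddr + length
        while pos < end:
            idx, offset = divmod(pos, self.page_size)
            chunk = min(self.page_size - offset, end - pos)
            self.filled[idx] = self.filled.get(idx, 0) + chunk
            if self.filled[idx] == self.page_size:
                newly_full.append(idx)
            pos += chunk
        return newly_full

    def remap_to_user(self, page_idx, physical_page):
        """Make a full page visible to the user process (modeled as a dict entry)."""
        self.user_page_table[page_idx] = physical_page
```

Only the mapping changes when a page is handed over; the payload bytes stay in the physical page to which they were originally written.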

According to one embodiment, TCP reassembly processing is performed by a network processor separate and independent from the general purpose processor and/or dedicated hardware (e.g., TCP payload split and stitching 270) associated with the ingress network interface in an effort to increase system throughput and decrease usage of the general purpose processor. Advantageously, in accordance with embodiments of the present invention, TCP reassembly processing is improved as a result of avoiding kernel-user space copying of TCP payload data.

While embodiments of the invention have been illustrated and described, it will be clear that the invention is not limited to these embodiments only. Numerous modifications, changes, variations, substitutions, and equivalents will be apparent to those skilled in the art, without departing from the spirit and scope of the invention, as described in the claims.

Claims

1. A method comprising:

receiving an incoming packet at a network interface;
a memory interface writing payload data from the packet to a physical page within a system memory on behalf of the network interface based on a sequence number associated with the incoming packet and by obtaining a physical address from a virtual memory map corresponding to an incoming session with which the packet is associated; and
after the physical page is full, making the physical page accessible to a user process being executed by a processor associated with the system memory by remapping the physical page through a paging table used by the user process.

2. The method of claim 1, wherein the incoming packet comprises a TCP packet and the incoming session comprises an incoming TCP session.

3. The method of claim 2, further comprising a network interface driver being executed by the processor allocating one or more physical pages of the system memory for each of a plurality of incoming TCP sessions, including the incoming TCP session.

4. The method of claim 3, further comprising the network interface driver building virtual memory maps, including the virtual memory map, corresponding to the plurality of incoming TCP sessions.

5. The method of claim 4, further comprising the network interface driver building a session table including information regarding a page directory base address for each of the plurality of incoming TCP sessions and information regarding an offset to adjust a start address of the payload data with respect to a boundary of the physical page.

6. The method of claim 5, further comprising:

calculating an adjusted sequence number based on the sequence number and the offset; and
using the adjusted sequence number as a virtual address input to the virtual memory map.

7. The method of claim 2, further comprising, prior to the memory interface writing payload data from the TCP packet to the physical page, determining whether the physical page has been allocated.

8. A network device comprising:

a processor configured to execute one or more user processes and a network interface driver;
a system memory, coupled to the processor, having stored therein (i) a paging table used by the processor to translate virtual memory addresses into corresponding physical memory addresses and (ii) a plurality of virtual memory maps containing information for use in connection with translating a virtual address input based on a sequence number of an incoming Transmission Control Protocol (TCP) packet to a physical address;
a network interface operable to receive incoming TCP packets;
an interconnect bus coupled to the processor and the system memory; and
a bus/memory interface, coupled to the network interface and the interconnect bus, adapted to write payload data from the incoming TCP packet to a physical page within the system memory on behalf of the network interface based on the sequence number and a virtual memory map of the plurality of virtual memory maps corresponding to an incoming TCP session with which the TCP packet is associated; and
wherein the network interface driver makes the physical page accessible to a user process of the one or more user processes by remapping the physical page through the paging table.

9. The network device of claim 8, wherein the network interface driver is further configured to allocate one or more physical pages of the system memory for each of the plurality of incoming TCP sessions.

10. The network device of claim 9, wherein the network interface driver is further configured to create and maintain the plurality of virtual memory maps.

11. The network device of claim 10, wherein the network interface driver is further configured to build a session table within the system memory including information regarding a page directory base address for each of the plurality of incoming TCP sessions and information regarding an offset to adjust a start address of the payload data with respect to a boundary of the physical page.

12. The network device of claim 11, wherein the virtual address is determined by calculating an adjusted sequence number based on the sequence number and the offset.

13. The network device of claim 8, wherein the network device comprises a network security platform.

14. The network device of claim 8, wherein the user process performs one or more security functions.

15. The network device of claim 14, wherein the one or more security functions include one or more of antivirus scanning, spam detection, web filtering, firewalling, intrusion detection, intrusion prevention and virtual private network (VPN) services.

16. A program storage device readable by one or more processors of a network device, tangibly embodying a program of instructions executable by the one or more processors to perform method steps for performing Transmission Control Protocol (TCP) reassembly, the method comprising:

receiving an incoming TCP packet at a network interface of the network device;
writing payload data from the TCP packet to a physical page within a system memory based on a sequence number associated with the incoming TCP packet and by obtaining a physical address from a virtual memory map corresponding to an incoming TCP session with which the TCP packet is associated; and
after the physical page is full, making the physical page accessible to a user process being executed by a processor associated with the system memory by remapping the physical page through a paging table used by the processor.

17. The program storage device of claim 16, wherein the user process performs one or more security functions.

18. The program storage device of claim 17, wherein the one or more security functions include one or more of antivirus scanning, spam detection, web filtering, firewalling, intrusion detection, intrusion prevention and virtual private network (VPN) services.

Patent History
Publication number: 20090307363
Type: Application
Filed: Oct 22, 2008
Publication Date: Dec 10, 2009
Applicant:
Inventors: Xu Zhou (Milpitas, CA), David Chen (San Jose, CA), Lin Huang (Fremont, CA), Guansong Zhang (Mountain View, CA)
Application Number: 12/255,916
Classifications
Current U.S. Class: Computer-to-computer Protocol Implementing (709/230)
International Classification: G06F 15/16 (20060101);