SESSION-BASED FORWARDING
The present disclosure describes a method and network device for session-based forwarding. Specifically, the disclosed system receives a first packet in a session and performs a route lookup to determine a route for the first packet. The system then caches a reference to the route and to a neighbor in the session, and also caches a reference to the session in a tunnel within which packets in the session are to be forwarded. The system determines whether the route is stale by comparing the route version number cached in the session with the route version number, in a route table, of the route referenced by the route index in the session. If the route is stale, the system performs another route lookup to update the route. Moreover, the system uses the cached reference to the session in the tunnel to forward subsequent packets in the session.
This application claims the benefit of priority on U.S. Provisional Patent Application 61/732,829, filed Dec. 3, 2012, the entire contents of which are incorporated by reference.
Related patent applications to the subject application include the following: (1) U.S. Patent Application entitled “System and Method for Achieving Enhanced Performance with Multiple Networking Central Processing Unit (CPU) Cores” by Janakiraman, et al., U.S. application Ser. No. 13/692,622, filed Dec. 3, 2012, attorney docket reference no. 6259P186; (2) U.S. Patent Application entitled “Ingress Traffic Classification and Prioritization with Dynamic Load Balancing” by Janakiraman, et al., U.S. application Ser. No. 13/692,608, filed Dec. 3, 2012, attorney docket reference no. 6259P191; (3) U.S. Patent Application entitled “Method and System for Maintaining Derived Data Sets” by Gopalasetty, et al., U.S. application Ser. No. 13/692,920, filed Dec. 3, 2012, attorney docket reference no. 6259P192; (4) U.S. Patent Application entitled “System and Method for Message handling in a Network Device” by Palkar, et al., U.S. application Ser. No. ______, filed Jun. 14, 2013, attorney docket reference no. 6259P189; (5) U.S. Patent Application entitled “Rate Limiting Mechanism Based on Device Load/Capacity or Traffic Content” by Nambiar, et al., U.S. application Ser. No. ______, filed Jun. 14, 2013, attorney docket reference no. 6259P185; (6) U.S. Patent Application entitled “Control Plane Protection for Various Tables Using Storm Prevention Entries” by Janakiraman, et al., U.S. application Ser. No. ______, filed Jun. 14, 2013, attorney docket reference no. 6259P188. The entire contents of the above applications are incorporated herein by reference.
FIELD
The present disclosure relates to networking processing performance of a symmetric multiprocessing (SMP) network architecture. In particular, the present disclosure relates to a system and method for providing session-based forwarding in a pipelined forwarding model.
BACKGROUND
A symmetric multiprocessing (SMP) architecture generally is a multiprocessor computer architecture where two or more identical processors can connect to a single shared main memory. In the case of multi-core processors, the SMP architecture can apply to the CPU cores.
In an SMP architecture, multiple networking CPUs or CPU cores can receive and transmit network traffic. While receiving and transmitting the network traffic, the system may maintain a flow-based engine that transmits the network traffic on a per-flow basis. Each flow is uniquely identified by a session key. To forward flow-based network traffic efficiently, a network routing system typically uses longest prefix match (LPM) to perform route lookups. Specifically, the routing system looks up next hop information in a route table by finding the route entry whose Internet Protocol (IP) prefix provides the longest match to the destination IP address.
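A longest prefix match lookup can be sketched as follows. This is a minimal illustration under stated assumptions: the route table contents (including a default route via 192.0.2.1) and the function name `longest_prefix_match` are hypothetical, and the linear scan is for clarity only, since production routers typically use trie-based structures.

```python
import ipaddress

# Hypothetical route table: prefix -> next hop IP address (illustrative values only).
ROUTE_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "192.0.2.1",
    ipaddress.ip_network("1.0.0.0/8"): "10.10.10.10",
    ipaddress.ip_network("1.1.0.0/16"): "18.18.18.18",
}


def longest_prefix_match(dst, table=ROUTE_TABLE):
    """Return (prefix, next_hop) for the most specific prefix covering dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in table.items():
        # Only prefixes of the same IP version can contain the address.
        if prefix.version == addr.version and addr in prefix:
            if best is None or prefix.prefixlen > best[0].prefixlen:
                best = (prefix, next_hop)
    return best
```

Here `longest_prefix_match("1.1.1.1")` selects the /16 route over the /8 and default routes because it is the most specific match. Every packet would repeat this search, which is the per-packet cost the disclosure aims to avoid.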
Nevertheless, the longest prefix match lookup may incur excessive cost when packets carry long IP addresses, e.g., in IPv6 networks. Moreover, because network topology and conditions can change dynamically, propagating route table changes to the route information stored in the flow table can also be costly. As a result, conventional routing mechanisms that update the route information in the flow table may converge slowly after route changes.
The present disclosure may be best understood by referring to the following description and accompanying drawings that are used to illustrate embodiments of the present disclosure.
In the following description, several specific details are presented to provide a thorough understanding. While the context of the disclosure is directed to SMP architecture performance enhancement, one skilled in the relevant art will recognize that the concepts and techniques disclosed herein can be practiced without one or more of the specific details, or in combination with other components, etc. In other instances, well-known implementations or operations are not shown or described in detail to avoid obscuring aspects of various examples disclosed herein. It should be understood that this disclosure covers all modifications, equivalents, and alternatives falling within the spirit and scope of the present disclosure.
Overview
Embodiments of the present disclosure relate to networking processing performance. In particular, the present disclosure relates to a system and method for providing efficient session-based forwarding with multiple networking central processing unit (CPU) cores. Specifically, the system achieves efficient session-based forwarding by maintaining, in a session table, a version number associated with each route, and determining whether a route is stale based on the value of that version number.
According to embodiments of the present disclosure, the conventional route cache table, which enumerates all destinations in the shared memory, is trimmed down to a regular neighbor table, without the need for a longest prefix match (LPM) based route lookup on every packet. The packet forwarding pipeline is optimized by performing the route lookup only once per session flow (assuming that the route does not change during the session). The present disclosure allows for caching a reference to a route in the session and caching a reference to the session in a tunnel or a logical interface. This not only converts the conventional per-packet route lookup into a per-flow lookup, but also allows direct access to route information from the tunnel or logical interface.
Specifically, with the solution provided herein, a disclosed network device receives a first packet in a session, and performs a route lookup based on a header of the first packet to determine a route for the first packet. Further, the network device caches a reference to the route in the session such that subsequent packets in the session are routed based on the cached reference in lieu of subsequent route lookups. The reference to the route comprises one or more of a route index (which may additionally include an equal cost multiple path (ECMP) index), a route version number, a neighbor index, and a neighbor version number.
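The cached reference can be pictured as a small record stored in each session. The sketch below assumes hypothetical names (`RouteRef`, `Session`, `route_for_packet`) and a caller-supplied lookup function. It shows only the "look up once, then reuse" behavior and deliberately omits staleness checking.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RouteRef:
    """Reference to a route cached in a session instead of a copy of the route.

    Field names are hypothetical; they mirror the items listed in the text.
    """
    route_index: int
    route_version: int
    neighbor_index: int
    neighbor_version: int
    ecmp_index: Optional[int] = None  # set only when the route has ECMP paths


@dataclass
class Session:
    src_ip: str
    dst_ip: str
    route_ref: Optional[RouteRef] = None  # filled in when the first packet arrives


def route_for_packet(session: Session,
                     route_lookup: Callable[[str], RouteRef]) -> RouteRef:
    """Look up the route only for the first packet; reuse the cached reference after that.

    Staleness checking of the cached reference is deliberately omitted here.
    """
    if session.route_ref is None:
        session.route_ref = route_lookup(session.dst_ip)
    return session.route_ref
```

Because only indexes and version numbers are stored, a route change does not require rewriting copied next hop data in every session.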
For session-based forwarding, the disclosed system compares the route version number cached in the session against the version number of the route referred to by the route index in the session. Likewise, the disclosed system compares the neighbor version number cached in the session against the version number of the neighbor entry referred to by the neighbor index in the session.
For tunnel-based forwarding, the disclosed system can validate the reference to the session by comparing source and destination IP addresses. Specifically, the disclosed system can check the source and/or destination IP address in the tunnel against the source and/or destination IP address in the session.
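The tunnel-side check can be sketched as follows. This assumes that the tunnel stores its own source and destination IP addresses next to the cached session reference; the `Tunnel` and `Session` records are hypothetical. The sketch compares both addresses, although, per the text, an implementation may check only one of them. A mismatch could arise, for instance, if the cached session slot has since been reused by another flow. That scenario is an assumption, not one stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    src_ip: str
    dst_ip: str


@dataclass
class Tunnel:
    """Hypothetical tunnel record holding its endpoints and a cached session reference."""
    src_ip: str
    dst_ip: str
    session: Optional[Session] = None


def cached_session_is_valid(tunnel: Tunnel) -> bool:
    """Accept the cached session only if its source and destination IPs match the tunnel's."""
    s = tunnel.session
    return s is not None and s.src_ip == tunnel.src_ip and s.dst_ip == tunnel.dst_ip
```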
If the disclosed network device determines that the route is stale, the disclosed network device can perform another route lookup to update the route with one or more of an updated route index, an updated route version number, an updated neighbor index, and an updated neighbor version number. In some embodiments, however, if the disclosed network device determines that the route is stale and the session is inactive, it will delay route lookup until at least one packet is received in the session.
In some embodiments, at least two paths with identical cost corresponding to the route are stored in the route table. Each path is identified by a unique Equal Cost Multiple Path (ECMP) index. When a new ECMP index is added to the route table, a subsequent session uses the path associated with the new ECMP index, while an existing session continues to use the existing path associated with its existing ECMP index.
In some embodiments, when at least two next hop nodes use Virtual Router Redundancy Protocol (VRRP), the route is determined to be stale based on a difference between the first neighbor version number cached in the session and the second neighbor version number corresponding to the route in the neighbor table.
In some embodiments, if the route is determined to be stale, the disclosed network device performs another route lookup to update the session with an updated route index and an updated route version number. If the updated route index and the updated route version number correspond to a shorter alternative route than the route, the disclosed network device forwards subsequent packets in the session using the shorter alternative route. In one embodiment, the shorter alternative route is stored in a patricia trie as a child node of a parent node. Specifically, the parent node corresponds to the route, and the route version number of the route corresponding to the parent node is updated (e.g., incremented) in response to the child node being inserted in the patricia trie.
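The version bump on child insertion can be illustrated as follows. This sketch substitutes a flat dictionary for the patricia trie: the "parent" is found as the most specific existing prefix that strictly contains the new one. The class and method names are hypothetical. A session that cached the parent's version before the insertion will see a mismatch and re-run the route lookup, which can then return the newly inserted child route.

```python
import ipaddress


class RouteTable:
    """Flat stand-in for a patricia trie: prefix -> [next_hop, route_version]."""

    def __init__(self):
        self.routes = {}

    def _parent(self, net):
        # The most specific existing prefix that strictly contains net is its trie parent.
        # (p.version here is the IP version, 4 or 6, not the route version.)
        parents = [p for p in self.routes
                   if p != net and p.version == net.version and net.subnet_of(p)]
        return max(parents, key=lambda p: p.prefixlen, default=None)

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        if net in self.routes:
            # Modifying an existing route bumps its own version.
            self.routes[net][0] = next_hop
            self.routes[net][1] += 1
            return
        self.routes[net] = [next_hop, 1]
        parent = self._parent(net)
        if parent is not None:
            # Inserting a child bumps the parent's version, so sessions that cached
            # the parent route are detected as stale and look the route up again.
            self.routes[parent][1] += 1

    def version(self, prefix: str) -> int:
        return self.routes[ipaddress.ip_network(prefix)][1]
```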
In some embodiments, the disclosed network device encapsulates the first packet based on information returned from a bridge lookup prior to encrypting the first packet.
Furthermore, the disclosed network device identifies a network interface on which the first packet is to be transmitted. Then, the disclosed network device sends the first packet to a security engine of the network device to encrypt the first packet, and instructs the security engine to forward the encrypted first packet to the identified network interface instead of returning it to a processor within the network device.
Computing Environment
Controller 100 is a hardware device and/or software module that provides network management functions, which include, but are not limited to, controlling, planning, allocating, deploying, coordinating, and monitoring the resources of a network, network planning, frequency allocation, predetermined traffic routing to support load balancing, cryptographic key distribution authorization, configuration management, fault management, security management, performance management, bandwidth management, route analytics and accounting management, etc.
Moreover, assume that a number of access points, such as access point 160, are interconnected with network controller 100. Each access point may be interconnected with zero or more client devices via either a wired interface or a wireless interface. In this example, for illustration purposes only, assume that client 170 is associated with access point 160 via a wireless link. An access point generally refers to a network device that allows wireless clients to connect to a wired network. Access points usually connect to a router via a wired network, or can be part of a router themselves.
Furthermore, controller 100 can be connected to router 120 through zero or more hops in a layer 3 or layer 2 network (such as L2/L3 Network 110). Router 120 can forward traffic to and receive traffic from Internet 140 through gateway 130. Router 120 generally is a network device that forwards data packets between different networks, thus creating an overlay internetwork. A router is typically connected to two or more data lines from different networks. When a data packet comes in on one of the data lines, the router reads the address information in the packet to determine its destination. Then, using information in its routing table or routing policy, the router directs the packet to the next network. A data packet is typically forwarded from one router to another through the Internet until the packet reaches its destination.
Gateway 130 is a network device that passes network traffic from local subnet to devices on other subnets. In the example in
Web servers 150, 155, and 158 are hardware devices and/or software modules that facilitate delivery of web content that can be accessed through Internet 140. For example, web server 150 may be assigned an IP address of 1.1.1.1 and used to host a first Internet website (e.g., www.yahoo.com); web server 155 may be assigned an IP address of 2.2.2.2 and used to host a second Internet website (e.g., www.google.com); and, web server 158 may be assigned an IP address of 3.3.3.3 and used to host a third Internet website (e.g., www.facebook.com).
In packet switching networks, a flow generally refers to a sequence of packets from a source network/client device to a destination network/client device, which may be another host, a multicast group, or a broadcast domain. A flow could consist of all packets in a specific session connection or media stream. Each layer 2 or layer 3 network session can be uniquely identified by a session key, which may be a layer 3 network session key or a layer 2 network session key. A layer 3 network session key generally includes information such as a source Internet Protocol (IP) address, a destination IP address, a protocol, a layer 4 source port, a layer 4 destination port, etc. Moreover, a layer 2 network session key generally includes a source Media Access Control (MAC) address, a destination MAC address, Ethernet type, etc. The session keys described above are maintained in a session table used for session management.
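The two kinds of session keys can be modeled as hashable tuples that index a session table. In this sketch, the type names, field names, and the dictionary-based table are hypothetical; the key fields themselves follow the text.

```python
from typing import NamedTuple


class L3SessionKey(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int   # IP protocol number, e.g., 6 for TCP, 17 for UDP
    src_port: int   # layer 4 source port
    dst_port: int   # layer 4 destination port


class L2SessionKey(NamedTuple):
    src_mac: str
    dst_mac: str
    ethertype: int  # e.g., 0x0800 for IPv4


# Named tuples are hashable, so they can key a dictionary-based session table.
session_table = {}
session_table[L3SessionKey("100.100.100.100", "1.1.1.1", 6, 49152, 80)] = {"state": "active"}
session_table[L2SessionKey("00:11:22:33:44:55", "01:02:03:04:05:06", 0x0800)] = {"state": "active"}
```

Changing any single field, such as the source port, yields a different key and therefore a different session.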
General Architecture
Control plane process 210 may be running on one or more CPUs or CPU cores, such as CP CPU 1 212, CP CPU 2 214, . . . CP CPU M 218. Furthermore, control plane process 210 typically handles network control or management traffic generated by and/or terminated at network devices, as opposed to data traffic generated by and/or terminated at client devices.
According to embodiments of the present disclosure, datapath processors 220 include a single exception processing CPU, such as a slowpath (SP) processor (e.g., Exception Processing CPU 230), and multiple forwarding CPUs, such as fastpath (FP) processors (e.g., Forwarding CPU 1 240, Forwarding CPU 2 242, . . . Forwarding CPU N 248). Only the forwarding processors are able to receive data packets directly from network interface 250. The exception processing processor, on the other hand, receives data packets only from the forwarding processors.
Lockless shared memory 260 is a flat structure that is shared by all datapath processors 220, and not tied to any particular CPU or CPUs. Any datapath processor 220 can read any memory location within lockless shared memory 260. Therefore, both the single exception processing processor (e.g., Exception Processing CPU 230) and the multiple forwarding processors (e.g., Forwarding CPU 1 240, Forwarding CPU 2 242, . . . Forwarding CPU N 248) have read access to lockless shared memory 260, but only the single exception processing processor (e.g., Exception Processing CPU 230) has write access to lockless shared memory 260. More specifically, any datapath processor 220 can have access to any location in lockless shared memory 260 in the disclosed system.
Also, control plane process 210 is communicatively coupled to exception processing CPU 230, such as a slowpath (SP) CPU, but not to the forwarding CPUs, such as fastpath (FP) processors (e.g., Forwarding CPU 1 240, Forwarding CPU 2 242, . . . Forwarding CPU N 248). Thus, whenever control plane process 210 needs information from datapath processors 220, control plane process 210 communicates with exception processing CPU 230, such as an SP processor.
Network Forwarding
Port lookup;
VLAN lookup;
Port-VLAN table lookup;
Bridge table lookup;
Firewall session table lookup;
Route table lookup;
Packet encapsulation;
Packet encryption;
Packet decryption;
Tunnel de-capsulation; and/or
Forwarding; etc.
Thus, the network forwarding process illustrated in
In some embodiments, shared memory 400 is a lockless shared memory. Thus, multiple tables in shared memory 400 can be accessed by multiple FP processors while the FP processors are processing packets received on one or more network interfaces. If an FP processor determines that a packet requires special handling, the FP processor hands over the packet processing to the SP processor. For example, the FP processor may find that no table entry corresponds to the packet (i.e., a table miss), and therefore hand over the packet processing to the SP processor. As another example, the FP processor may find that the packet is a fragmented packet, and thus hand over the packet processing to the SP processor.
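The FP-versus-SP decision described above can be sketched as a simple dispatch function. The `Packet` record and the `"fastpath"`/`"slowpath"` return values are hypothetical. The two hand-off conditions (table miss, fragment) are the examples given in the text. Handing table misses to the SP core is consistent with the earlier statement that only the exception processing CPU has write access to the shared memory.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    session_key: tuple
    is_fragment: bool = False


def fastpath_decision(pkt: Packet, session_table: dict) -> str:
    """Return "fastpath" if a forwarding (FP) core can handle pkt, else "slowpath".

    FP cores only read the shared tables; anything that needs special handling,
    such as a table miss or a fragment, is handed over to the SP core.
    """
    if pkt.is_fragment:
        return "slowpath"
    if pkt.session_key not in session_table:
        return "slowpath"  # table miss: the SP core creates the missing entry
    return "fastpath"
```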
A. Packet Flows
As mentioned above, a flow generally refers to a sequence of packets from a source network/client device to a destination network/client device, which may be another host, a multicast group, or a broadcast domain. A flow could consist of all packets in a specific session connection or media stream.
Also, each packet may include multiple portions. For example, a packet with an L3 packet key 500 may include at least a network layer (layer 3 or L3) header that includes L3 source IP 510, L3 destination IP 515, and protocol 520, and a transport layer (layer 4 or L4) header that includes L4 source port 525 and L4 destination port 530. As another example, a packet with an L2 packet key 550 may include at least a media access control layer (layer 2 or L2) header that includes source media access control (MAC) address 560, destination MAC address 570, and Ethernet type 580.
When a packet is fragmented, subsequent fragments (i.e., all fragments after the first) include at least a network layer (layer 3 or L3) header, but do not include any transport layer (layer 4 or L4) header. The transport layer header is required for session-based forwarding, for example, when firewall policies need to be applied to the packet. Even though subsequent fragments do not include a transport layer header, the same session policies as those applied to the first fragment are typically applied to them.
B. Route Cache
Then, the network device inserts an entry in route cache table 600 with the destination IP address, the next hop MAC interface resulting from the route table lookup, and the default gateway MAC address (e.g., MACGW1) that is known to the network device (e.g., a network controller). The inserted entry will also include other information in its corresponding neighbor entry Entry1 from the neighbor table. In the example illustrated in
In some embodiments, route cache table 600 is maintained as a hash table by applying a hash function to the destination IP address of the packet. Route cache table 600 may introduce several issues. First, the cost of longest prefix matching for each packet can be high. This is especially true in IPv6 networks, where IP addresses are longer than conventional IPv4 addresses. Second, as the number of hashed entries in route cache table 600 increases, the cost of looking up route cache table 600 also increases accordingly. Third, maintaining the consistency of route cache table 600 may incur additional costs.
For example, when the next hop address in a route corresponding to “1.1.0.0” changes in the routing table, the system would have to perform a reverse lookup to search for all destination IP addresses for which “1.1.0.0” is the longest prefix match, and update the next hop address in all of those entries. Therefore, the convergence after a route change is slow because of the costs involved in maintaining the consistency of route cache table 600.
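The reverse lookup that a route change forces on a per-destination route cache can be sketched as follows. The cache contents are illustrative, the /16 length for the "1.1.0.0" route is an assumption, and the function name is hypothetical. The point is the full scan over every cached destination.

```python
import ipaddress


def cache_entries_to_update(route_cache, route_prefixes, changed_prefix):
    """Return cached destinations whose longest prefix match is changed_prefix.

    Every cached destination has to be re-examined, which is why the cost of
    keeping a per-destination route cache consistent grows with its size.
    """
    changed = ipaddress.ip_network(changed_prefix)
    prefixes = [ipaddress.ip_network(p) for p in route_prefixes]
    affected = []
    for dst in route_cache:  # full scan of the route cache
        addr = ipaddress.ip_address(dst)
        covering = [p for p in prefixes if addr in p]
        if covering and max(covering, key=lambda p: p.prefixlen) == changed:
            affected.append(dst)
    return affected
```

For a cache holding 1.1.1.1, 1.1.2.2, and 1.2.3.4 with routes 1.0.0.0/8 and 1.1.0.0/16, a change to 1.1.0.0/16 affects only the first two entries, but all three must be examined.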
C. Routing Tables
Note that the version number 100 for route1 is the route version number of route entry route1 in the route table at the time when the disclosed system performs the route lookup and inserts the reference to route1 into session table 720. Likewise, the version number 10 for neighbor ARP1 is the neighbor version number of neighbor entry ARP1 in the neighbor table at the time when the disclosed system inserts the reference to ARP1 into session table 720.
In addition, there exists a firewall session policy table, which includes information, such as permission (e.g., permit or deny access), destination network address translation (DNAT), source network address translation (SNAT), rate limiting, etc. In some embodiments, a flow table can be used for stateful firewall purposes in addition to firewall purposes. According to the present disclosure, the firewall session policy table and/or flow table can be modified to additionally include the next hop information and version information, and used for routing purposes.
During operation, the system monitors every session based on its flow-based destination IP address, which persists throughout the entire session. Because the value of the destination IP address does not change for a particular session, information such as that cached in the route cache can instead be cached in the session for easier access during the session, rather than being obtained by a lookup operation on a packet-by-packet basis. As mentioned previously in description regarding
More specifically, the system will be forwarding packets to “1.1.1.1” in a session (or a flow). Accordingly, the system will initiate a route lookup based on the destination IP address of the session, e.g., “1.1.1.1.” For illustration purposes only, assuming that the route lookup returns a next hop IP address of “10.10.10.10.” The system will then perform a neighbor lookup based on the resulting IP address of the next hop. In this example, it is assumed that the neighbor lookup returns MAC1 (e.g., 01:02:03:04:05:06) and VLAN identifier V10.
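The two-step lookup in this example (route lookup, then neighbor lookup on the next hop) can be sketched as follows. The 1.0.0.0/8 prefix and the table layout are assumptions for illustration; the next hop, MAC, and VLAN values come from the example above. The function name `resolve` is hypothetical.

```python
import ipaddress
from typing import Optional, Tuple

# Example tables (the 1.0.0.0/8 prefix is assumed for illustration).
ROUTES = {ipaddress.ip_network("1.0.0.0/8"): "10.10.10.10"}   # prefix -> next hop IP
NEIGHBORS = {"10.10.10.10": ("01:02:03:04:05:06", "V10")}       # next hop IP -> (MAC, VLAN)


def resolve(dst_ip: str) -> Optional[Tuple[str, str, str]]:
    """Route lookup (longest prefix match) followed by a neighbor lookup on the next hop."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [p for p in ROUTES if addr in p]
    if not matches:
        return None  # no route
    next_hop = ROUTES[max(matches, key=lambda p: p.prefixlen)]
    if next_hop not in NEIGHBORS:
        return None  # next hop not yet resolved to a MAC address
    mac, vlan = NEIGHBORS[next_hop]
    return next_hop, mac, vlan
```

In the session-based scheme, this pair of lookups runs once per session, and the result is recorded in the session as references plus version numbers.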
It is important to note that the firewall session policies include source and destination IP addresses along with other keys. Therefore, for every session in which the system performs session-based forwarding, if the session is determined to be a router session (i.e., not a client-to-client session whose packets the network system can simply bridge, but a session whose packets require a router to route them to the Internet), the system can cache the routing lookup results in the session itself.
In another example, one router is configured as the default router. Thus, every packet will be sent to the same MAC address corresponding to the default router, but the next hop address will be different depending on the destination IP address of each session. In this example, it is possible to eliminate all user VLANs and to use only one VLAN to route all packets to the default router. In addition, a guest VLAN may be configured to route all guest traffic to a network controller within the WLAN.
Therefore, the disclosed system may optimize the forwarding pipeline by caching the routing information in each session. However, it is possible that during an active session, a route to the destination IP address may change. In such scenarios, the disclosed system will update the session table to reflect the route changes. To improve the efficiency in the updating operations, rather than maintaining a copy of the relevant route information in the session table, the disclosed system maintains a reference to an entry in the route table and a version number corresponding to the route reference, as well as a reference to an entry in the neighbor table and a version number corresponding to the neighbor reference.
With this solution, the disclosed system no longer needs to perform a route lookup after a firewall session lookup, because the system can obtain route information directly from the sessions. Neither is it necessary to maintain a route cache in the system any longer, which reduces the cost for routing operations compared with other solutions.
Maintenance of Route Consistency
The disclosed system is also able to efficiently maintain consistency between the route information in the session table (or flow table) and the route information in the route table and the neighbor table (or ARP table). Because the reference and version number for each route are maintained in the session, whenever a route changes, the system can quickly detect the change based on one or more of a change in the route reference, route version, neighbor reference, and/or neighbor version. Moreover, the system can easily update the session by updating one or more of the route reference, route version, neighbor reference, and/or neighbor version to reflect the route change.
Specifically, to detect a route change, for every packet being forwarded in a session, the system de-references the route index and the neighbor index in the session entry to retrieve the corresponding entries in the route table and the neighbor table. The system then obtains the current version number corresponding to the route index in the route table and the current version number corresponding to the neighbor index in the neighbor table. Next, the system determines whether any condition indicating that the route in the session table is stale has occurred. For example, the system can determine whether the route version number and the neighbor version number maintained in the session entry match the current version numbers obtained above, and whether the neighbor index maintained in the session entry matches the current neighbor index. If either version number in the session entry differs from its corresponding version number in the route table or the neighbor table, or if the neighbor index in the session entry differs from its corresponding neighbor index in the neighbor table, then the session entry is stale. In that case, the system performs a route lookup using the longest prefix match on the destination IP address of the session, and updates the route information based on the results of the route lookup.
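The per-packet staleness test can be sketched as follows. The dictionaries mapping an index to its current version number are hypothetical stand-ins for the route and neighbor tables. A missing index is treated as a deleted entry and therefore as stale. The neighbor-index comparison mentioned in the text is not modeled separately here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SessionEntry:
    dst_ip: str
    route_index: int
    route_version: int
    neighbor_index: int
    neighbor_version: int


def session_is_stale(s: SessionEntry,
                     route_versions: Dict[int, int],
                     neighbor_versions: Dict[int, int]) -> bool:
    """De-reference the cached indexes and compare the cached versions with the current ones.

    route_versions / neighbor_versions map an index to the entry's current version
    number; a missing index means the entry was deleted, which also counts as stale.
    """
    current_route = route_versions.get(s.route_index)
    current_neighbor = neighbor_versions.get(s.neighbor_index)
    if current_route is None or current_neighbor is None:
        return True
    return current_route != s.route_version or current_neighbor != s.neighbor_version
```

The check costs two indexed reads and two integer comparisons per packet, regardless of address length.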
Note that, in the route table, a version number corresponding to a route index changes whenever there is a change in the route, e.g., a change in the next hop address. On the other hand, in the neighbor table, a version number corresponding to a neighbor index changes whenever the mapping between the IP address and the MAC address of a network node (e.g., a default gateway or a next hop for a particular VLAN) changes, for example, when the Ethernet interface changes.
The following sections describe a few exemplary scenarios in which a route could be changed during an active session, and how the system will detect the route change in each scenario. These examples are provided for illustration only. They are not intended to be an exhaustive list of all possible scenarios. One skilled in the art can apply the techniques disclosed herein to detect other types of route changes without departing from the spirit of the invention.
A. Route Change
In this scenario, the default gateway's IP address may change during a session because the route is modified, for example, from “10.10.10.10” at time point t1 to “18.18.18.18” at time point t2. In the route table, assume that, at t1, the route entry has the version number of 100, index value of 1, prefix/length value of “1.0.0.0/8,” and next hop IP address of “10.10.10.10.”
Accordingly, in the session table, at time point t1, the session entry has values as shown in session entry 760 in session table 720, e.g., having a source IP address of “100.100.100.100,” a destination IP address of “1.1.1.1,” a route index corresponding to route1, a route version of 100, a neighbor index corresponding to ARP1, and a neighbor version of 10.
Based on the change above, at time point t2, the same route entry has the version number of 101, index value of 1, prefix/length value of “1.0.0.0/8,” and next hop IP address of “18.18.18.18.”
When the system receives the first packet in the session after the time point t2, the system will detect that the route version in the session table (e.g., 100) does not match the route version in the route table (e.g., 101). Thus, the system will deem the session entry as stale and perform a route lookup to update the session entry.
After the update, the session entry will have the values shown in session entry 765 in session table 720, e.g., a source IP address of “100.100.100.100,” a destination IP address of “1.1.1.1,” a route index corresponding to route1, a route version of 101, a neighbor index corresponding to ARP2, and a neighbor version of 20. Note that the route version has changed from 100 to 101; the neighbor index has changed from ARP1 (corresponding to “10.10.10.10” in neighbor table 760) to ARP2 (corresponding to “18.18.18.18” in neighbor table 760); and the neighbor version has changed from 10 (corresponding to ARP1 in neighbor table 760) to 20 (corresponding to ARP2 in neighbor table 760).
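The route change scenario can be replayed end to end with the sketch below. The table layouts and function names are hypothetical, and neighbor index 2 for ARP2 is an assumption. The version numbers, addresses, and prefix follow the example. Error handling for a missing route or neighbor is omitted.

```python
import ipaddress

# Tables as in the example; neighbor index 2 for ARP2 is an assumption.
route_table = {1: {"version": 100, "prefix": "1.0.0.0/8", "next_hop": "10.10.10.10"}}
neighbor_table = {1: {"version": 10, "ip": "10.10.10.10"},   # ARP1
                  2: {"version": 20, "ip": "18.18.18.18"}}   # ARP2
session = {"src": "100.100.100.100", "dst": "1.1.1.1",
           "route_index": 1, "route_version": 100,
           "neighbor_index": 1, "neighbor_version": 10}


def route_lookup(dst_ip):
    """Longest prefix match over route_table, then find the next hop's neighbor entry."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [(i, r) for i, r in route_table.items()
               if addr in ipaddress.ip_network(r["prefix"])]
    r_idx, route = max(matches, key=lambda m: ipaddress.ip_network(m[1]["prefix"]).prefixlen)
    n_idx, nbr = next((i, n) for i, n in neighbor_table.items() if n["ip"] == route["next_hop"])
    return {"route_index": r_idx, "route_version": route["version"],
            "neighbor_index": n_idx, "neighbor_version": nbr["version"]}


def on_packet(sess):
    """Revalidate the cached references; re-run the route lookup if they are stale."""
    route = route_table.get(sess["route_index"])
    nbr = neighbor_table.get(sess["neighbor_index"])
    stale = (route is None or nbr is None
             or route["version"] != sess["route_version"]
             or nbr["version"] != sess["neighbor_version"])
    if stale:
        sess.update(route_lookup(sess["dst"]))
    return stale


stale_at_t1 = on_packet(session)   # t1: cached references are current

# t2: the route's next hop changes, so its version number is incremented.
route_table[1] = {"version": 101, "prefix": "1.0.0.0/8", "next_hop": "18.18.18.18"}
```

The first packet after t2 finds version 100 cached against 101 in the table. It then re-runs the lookup and leaves the session with route version 101, neighbor ARP2, and neighbor version 20, matching session entry 765.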
Similarly, when a route is deleted rather than modified, the system will also detect a mismatch in the version numbers between the session table and the route table, because the corresponding route entry in the route table is missing. Therefore, the system will initiate a route lookup to search for a new route to the destination IP address based on the longest prefix match to the current route table (in which the original route was deleted), and update the session table with the information regarding the new route.
Also, note that the route lookup is performed only when the system receives a packet in the session after time point t2. Thus, if no packet is received, which indicates that the client is idle, the session entry will remain stale even though a change has occurred in the route table at time point t2. This is because an idle client is not using the route, and thus there is no need to spend any resources on updating the route for it. Furthermore, if the session entry remains stale for an excessive amount of time because no client is using the session, the session entry will eventually be removed without ever being updated with the route change.
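The lazy update behavior can be sketched as follows. A route change only changes a version number and touches no session. Sessions revalidate when a packet arrives, and idle sessions are aged out without ever being updated. The single `table_version` (one route), the `lookups` counter, and the `idle_timeout` parameter are simplifying assumptions.

```python
class Session:
    """Minimal session record: cached route version plus last-activity time (hypothetical)."""

    def __init__(self, route_version: int, now: float):
        self.route_version = route_version
        self.last_seen = now
        self.lookups = 0  # counts route lookups, for illustration


def on_packet(session: Session, table_version: int, now: float) -> None:
    """Revalidate only when a packet actually arrives in the session."""
    session.last_seen = now
    if session.route_version != table_version:
        session.lookups += 1                  # stands in for a real route lookup
        session.route_version = table_version


def age_out(sessions: dict, now: float, idle_timeout: float) -> dict:
    """Drop sessions idle for idle_timeout or longer, whether or not they are stale."""
    return {k: s for k, s in sessions.items() if now - s.last_seen < idle_timeout}


sessions = {"active": Session(100, now=0), "idle": Session(100, now=0)}
table_version = 101                           # the route changed at t2; no session is touched
on_packet(sessions["active"], table_version, now=5)
```

Only the active session pays for a route lookup. With `idle_timeout=30`, calling `age_out(sessions, now=30, idle_timeout=30)` removes the idle session while it is still stale.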
B. Equal Cost Multiple Paths (ECMP)
For illustration purposes, assume that next hop A 810 is associated with the IP address of “10.10.10.10” and already exists at time point t1. Thus, at time point t1, the route entry in the route table has the version number of 101, index value of 1, prefix/length value of “1.0.0.0/8,” and next hop IP address of “10.10.10.10” corresponding to next hop A 810.
In addition, assume that, at time point t2, a new equal-cost route from the source address to the destination address through next hop B 820 becomes available during the session, and that next hop B 820 is associated with the IP address of “18.18.18.18.”
Accordingly, the route table is updated, and the same route entry now has the version number of 101, index value of 1, prefix/length value of “1.0.0.0/8,” and next hop IP addresses of “10.10.10.10; 18.18.18.18” corresponding to next hop A 810 and next hop B 820, respectively. In this example, each next hop IP address corresponds to a unique ECMP index. In addition to caching the route index, the session may also cache the ECMP index in cases involving ECMP. Note that the version number remains the same, while the ECMP index is incremented for the additional ECMP route that has become available.
Subsequently, any new session to a destination address for which “1.0.0.0/8” provides the longest prefix match will use the new ECMP route through next hop B 820 with the IP address of “18.18.18.18.” Nevertheless, any existing session will continue to use next hop A 810 with the IP address of “10.10.10.10,” because the route version number has not changed. As a result, the system will determine that the route through next hop A 810 is not stale and does not need to be updated.
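The ECMP behavior in this example can be sketched as follows. The class and function names are hypothetical, and list position stands in for the ECMP index. As in the example, adding a path leaves the route version unchanged, and a new session is pinned to the most recently added path. Many ECMP implementations instead choose a path by hashing the flow key. Either way, caching the chosen ECMP index in the session keeps each flow on a single path.

```python
class EcmpRoute:
    """Route entry with several equal-cost next hops; list position = ECMP index."""

    def __init__(self, version: int, next_hops):
        self.version = version
        self.next_hops = list(next_hops)

    def add_path(self, next_hop: str) -> None:
        # Adding an equal-cost path leaves the route version unchanged,
        # so sessions pinned to an existing ECMP index are not considered stale.
        self.next_hops.append(next_hop)


def open_session(route: EcmpRoute) -> dict:
    """Following the example, a new session uses the most recently added path."""
    return {"route_version": route.version, "ecmp_index": len(route.next_hops) - 1}


def next_hop_for(session: dict, route: EcmpRoute) -> str:
    if session["route_version"] != route.version:
        raise LookupError("stale route: perform a new route lookup")
    return route.next_hops[session["ecmp_index"]]


route1 = EcmpRoute(101, ["10.10.10.10"])   # next hop A at t1
existing = open_session(route1)
route1.add_path("18.18.18.18")             # next hop B becomes available at t2
new = open_session(route1)
```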
This scheme can be particularly useful when traffic from a private IP network is to be forwarded to two or more networks corresponding to different uplink service providers (e.g., AT&T® and Verizon®) via two or more different IP addresses, such as IP1 and IP2. It is desirable that traffic to the first service provider is transmitted only via IP1, and traffic to the second service provider only via IP2, so that traffic for different service providers does not get mixed.
C. Virtual Router Redundancy Protocol (VRRP)
Virtual Router Redundancy Protocol (VRRP) is a computer networking protocol that provides for automatic assignment of available IP routers to participating hosts. This increases the availability and reliability of routing paths through automatic default gateway selection on an IP subnetwork. VRRP achieves this by creating virtual routers, which are an abstract representation of multiple routers, i.e., master and backup routers, acting as a group. The default gateway of a participating host is assigned to the virtual router instead of a physical router. If the physical router that is routing packets on behalf of the virtual router fails, another physical router is selected to automatically replace it. The physical router that is forwarding packets at any given time is called the master router. Thus, at any given time, only one physical router is actively forwarding the traffic.
Assume that, at time point t1, next hop A 810 is active under VRRP 815 and corresponds to the IP address “10.10.10.10.” Thus, in the neighbor table, the neighbor entry has a version number of 10, an index value of 1, an IP address of “10.10.10.10,” a MAC address of MACA (corresponding to next hop A 810), and a VLAN identifier value of V10.
Now assume that, at time point t2, next hop A 810 fails and next hop B 820 becomes the master router for the virtual router. Therefore, in the neighbor table, the same neighbor entry now has a version number of 11, an index value of 1, an IP address of “10.10.10.10,” a MAC address of MACB (corresponding to next hop B 820), and a VLAN identifier value of V10.
Because the neighbor version has changed from 10 to 11, the system will determine that the route has become stale. Consequently, the system will perform a route lookup and update the session entry in the session table with the new neighbor version number.
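A minimal Python sketch of the neighbor-version check in this VRRP example follows. The names (`NeighborEntry`, `Session`, `refresh_if_stale`) are hypothetical, and copying the new MAC address straight from the neighbor entry is a simplification standing in for the route lookup the system performs once it detects staleness.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class NeighborEntry:
    index: int
    version: int
    ip: str
    mac: str
    vlan: str


@dataclass
class Session:
    neighbor_index: int
    neighbor_version: int
    dst_mac: str  # next-hop MAC cached from the neighbor entry


def refresh_if_stale(session: Session, neighbors: Dict[int, NeighborEntry]) -> bool:
    """Return True if the cached neighbor was stale and has been refreshed."""
    entry = neighbors[session.neighbor_index]
    if entry.version == session.neighbor_version:
        return False
    # Simplification: copy the new values instead of redoing the full route lookup.
    session.neighbor_version = entry.version
    session.dst_mac = entry.mac
    return True


neighbors = {1: NeighborEntry(1, 10, "10.10.10.10", "MACA", "V10")}
session = Session(neighbor_index=1, neighbor_version=10, dst_mac="MACA")  # t1

# t2: next hop B 820 becomes master; same index and IP, new MAC, version 11.
neighbors[1] = NeighborEntry(1, 11, "10.10.10.10", "MACB", "V10")
print(refresh_if_stale(session, neighbors), session.dst_mac)  # True MACB
```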
D. Shorter Alternative Route
In some embodiments, route information can be maintained in a patricia trie. A patricia trie generally refers to a space-optimized trie data structure in which each node that has only one child is merged with that child. As a result, every internal node has at least two children. Unlike in regular tries, edges can be labeled with sequences of elements as well as single elements. This makes patricia tries much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes.
Furthermore, when a new route is inserted into a patricia trie as a new node (e.g., assuming that node 960 is inserted as a child node of node 940), the disclosed system performs at least two operations. First, the system adds the new route (e.g., “1.1.0.0/16”) to the route table with a new route index that differs from the route index of the original route corresponding to the parent node in the patricia trie. Second, the system increases, in the route table, the version number of the route corresponding to the parent node of the inserted node (e.g., the version number of route “1.0.0.0/8” is increased from 100 to 101; note that route “1.0.0.0/8” corresponds to parent node 940 of inserted node 960 in this example).
Because the version number of the route corresponding to the parent node in the patricia trie is updated, the route cached in the session (e.g., “1.0.0.0/8”) becomes stale due to the difference between the route version number in the session table (e.g., 100) and that in the route table (e.g., 101). Thus, the system performs a route lookup to update the route information. The route lookup now returns the shorter route, e.g., “1.1.0.0/16”, with a new route index instead of the original route (e.g., “1.0.0.0/8”). Thus, subsequent traffic from the same source node to the same destination node will be forwarded through the updated shorter route.
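The insertion rule above can be illustrated with a small path-compressed binary trie over IPv4 prefix bits, written in Python. This is a sketch, not the disclosed implementation: the class and function names are hypothetical, it handles IPv4 only, and it assumes each prefix is inserted once. Inserting a route increments the version number of the nearest ancestor that holds a route, which is what makes sessions cached on that ancestor stale.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Route:
    index: int    # route index in the route table
    prefix: str   # e.g. "1.0.0.0/8"
    version: int  # route version number that sessions cache


@dataclass
class Node:
    label: str                     # bits on the edge leading into this node
    route: Optional[Route] = None  # set only where a stored prefix ends
    children: Dict[str, "Node"] = field(default_factory=dict)  # keyed by first bit


def prefix_bits(prefix: str) -> str:
    net = ipaddress.ip_network(prefix)
    return format(int(net.network_address), "032b")[: net.prefixlen]


def addr_bits(addr: str) -> str:
    return format(int(ipaddress.ip_address(addr)), "032b")


class PatriciaTrie:
    def __init__(self) -> None:
        self.root = Node("")

    def insert(self, route: Route) -> None:
        node, rest = self.root, prefix_bits(route.prefix)
        parent: Optional[Route] = None  # nearest ancestor holding a route
        while rest:
            if node.route is not None:
                parent = node.route
            child = node.children.get(rest[0])
            if child is None:
                node.children[rest[0]] = Node(rest, route)
                break
            common = 0
            limit = min(len(child.label), len(rest))
            while common < limit and child.label[common] == rest[common]:
                common += 1
            if common == len(child.label):
                node, rest = child, rest[common:]
                continue
            # Split the compressed edge where the new prefix diverges or ends.
            mid = Node(child.label[:common])
            node.children[rest[0]] = mid
            child.label = child.label[common:]
            mid.children[child.label[0]] = child
            rest = rest[common:]
            if rest:
                mid.children[rest[0]] = Node(rest, route)
            else:
                mid.route = route
            break
        else:
            node.route = route  # prefix ends exactly at an existing node
        if parent is not None:
            parent.version += 1  # sessions cached on the parent route become stale

    def lookup(self, addr: str) -> Optional[Route]:
        """Longest prefix match: remember the deepest route on the matched path."""
        node, rest, best = self.root, addr_bits(addr), self.root.route
        while rest:
            child = node.children.get(rest[0])
            if child is None or not rest.startswith(child.label):
                break
            node, rest = child, rest[len(child.label):]
            if node.route is not None:
                best = node.route
        return best


trie = PatriciaTrie()
r8 = Route(1, "1.0.0.0/8", 100)
trie.insert(r8)
cached_version = trie.lookup("1.1.2.3").version  # session caches version 100
trie.insert(Route(2, "1.1.0.0/16", 1))           # child of 1.0.0.0/8
print(r8.version, cached_version)                # 101 100
print(trie.lookup("1.1.2.3").prefix)             # 1.1.0.0/16
print(trie.lookup("1.2.3.4").prefix)             # 1.0.0.0/8
```

A session that cached route index 1 with version 100 sees version 101 on its next packet, performs a new lookup, and picks up “1.1.0.0/16”.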
As mentioned above, when traffic from a private IP network is to be forwarded to two or more networks corresponding to different uplink service providers (e.g., AT&T® and Verizon®) via two or more different IP addresses, such as IP1 and IP2, it is desirable that traffic to the first service provider is transmitted only via IP1, and traffic to the second service provider only via IP2. In such scenarios, a network address translation (NAT) of either the source IP address or the destination IP address of the packets is typically involved. Nevertheless, when no NAT operation is involved and a shorter route is found during a session as in the example illustrated above in the description of
Note that the route lookup is performed only when the system receives a packet in a session. Thus, if the new route “1.1.0.0/16” could be used by a second session, but no packet is received in that second session because the client is idle, the session entry remains stale even though a new, shorter route has been inserted in the route table. This is because an idle client is not using the route, so there is no need to spend resources updating the route for it. Furthermore, if the session entry remains stale for an excessive amount of time because no client is using the session, the session entry will eventually be removed without ever being updated with the route change.
It is important to note that, in the present disclosure, only active purging is used, and no background purging is involved. Therefore, a session is updated only when there is active traffic in the session. No background process is used to update the session entries, because there is no need to spend resources on a session while it is idle.
Caching Session Information in Secured Tunnels
In some embodiments, a computing environment as illustrated in
As illustrated in
In a system as illustrated in
In some embodiments, datapath processors 220 may perform encapsulation prior to sending the packet to the security engine for encryption. In addition, datapath processors 220 may indicate to the security engine which destination network interface 250 is associated with the packet. Therefore, after the security engine completes the encryption of the packet, the security engine can forward the packet directly to its corresponding destination network interface 250 without returning the encapsulated packet to datapath processors 220.
Specifically, the system will first perform a route and neighbor (e.g., ARP) lookup, which will return a MAC address and a VLAN identifier corresponding to the destination IP address in the packet. Next, based on the combination of MAC and VLAN identifier, the system performs a bridge lookup, which will return a destination network interface that can be either a port identifier or a tunnel identifier.
Based on the MAC address corresponding to the destination IP address, the system can determine whether the packet is a unicast packet or a multicast packet. If the packet is a unicast packet, the system can use a unicast key to encrypt the packet. On the other hand, if the packet is a multicast packet, the system can use a tunnel or multicast key to encrypt the packet.
Furthermore, based on the destination network interface, the system can determine whether the packet needs to be encapsulated. For example, if the destination network interface returned from the bridge lookup is associated with a GRE tunnel, then the system can determine that the packet will need to be encapsulated with a GRE header before being forwarded to its destination. Typically, to perform an encapsulation, the system needs the tunnel information for the packet, which includes the source and destination IP addresses (available in the header of the packet), the transmission protocol (which can be determined based on the tunnel identifier), and L4 attributes associated with the packet (which are usually cached in the tunnel). Therefore, upon a successful bridge lookup, the system is able to encapsulate the packet based on the information returned from the bridge lookup.
Note that if the system identifies that a packet needs to be both encrypted and encapsulated, the system can perform the encapsulation prior to the encryption, thereby avoiding the need for the packet to be returned to the datapath processors after encryption. This simplified packet flow within the system, e.g., from the FP processors to the security engine and directly on to the network interface, without the security engine returning the packet to the FP processors, can substantially enhance performance in a high-performance controlling and switching system.
After session entries are cached in a tunnel, the system can combine the tunnel encapsulation operations with the L2/L3 lookups (such as firewall session lookup 320, route lookup 325, forwarding lookup 330, etc.), and thereby avoid feeding the same packet through the network forwarding process twice (formatted as IEEE 802.3 the first time and as IEEE 802.11 the second time). In one embodiment, a link from the tunnel to the session (e.g., the index value of the corresponding session entry in the session table) is maintained, where the session further includes routing information as described above. The link provides quick access to important routing information stored in the session, thereby allowing the system to determine whether it can leverage the session information for simplified session-based forwarding (e.g., where no complex firewall operations are required for the session).
Furthermore, for subsequent packets within the same session, the system can use the cached link to the session entry to retrieve the routing information, thus avoiding feeding the packets through the session forwarding pipeline. In summary, rather than feeding every packet through the session forwarding pipeline twice (first in IEEE 802.3 format and then in IEEE 802.11 format), the present disclosure allows the first packet in a flow to be sent through the session forwarding pipeline once, with the encryption and encapsulation operations combined into the pipeline process, and allows any subsequent packets to bypass the session forwarding pipeline through a direct link from the tunnel to the corresponding session entry, which caches the routing information returned from the route lookup performed for the first packet in the flow.
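The per-packet decisions described above can be summarized in a short Python sketch that only orders the operations; it performs no real encapsulation or encryption. `BridgeResult`, `is_multicast`, and `plan_tx` are hypothetical names, and the step strings are placeholders. The multicast test relies on the IEEE 802 convention that the least significant bit of the first MAC octet (the I/G bit) marks a group address.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BridgeResult:
    port_id: Optional[int] = None    # set when the bridge lookup returns a port
    tunnel_id: Optional[int] = None  # set when it returns a tunnel (e.g. GRE)


def is_multicast(mac: str) -> bool:
    # I/G bit: least significant bit of the first octet marks a group address.
    return (int(mac.split(":")[0], 16) & 0x01) == 1


def plan_tx(dst_mac: str, bridge: BridgeResult, encrypt: bool) -> List[str]:
    """Return the ordered operations for one packet. Encapsulation comes first so
    the security engine can transmit the encrypted packet itself instead of
    returning it to a datapath processor."""
    steps: List[str] = []
    if bridge.tunnel_id is not None:
        steps.append(f"encapsulate:tunnel{bridge.tunnel_id}")
    if encrypt:
        key = "multicast-key" if is_multicast(dst_mac) else "unicast-key"
        steps.append(f"encrypt:{key}")
    if bridge.tunnel_id is not None:
        egress = f"tunnel{bridge.tunnel_id}"
    else:
        egress = f"port{bridge.port_id}"
    steps.append(f"transmit:{egress}")
    return steps


print(plan_tx("00:1a:2b:3c:4d:5e", BridgeResult(tunnel_id=7), encrypt=True))
# ['encapsulate:tunnel7', 'encrypt:unicast-key', 'transmit:tunnel7']
print(plan_tx("01:00:5e:00:00:01", BridgeResult(port_id=3), encrypt=True))
# ['encrypt:multicast-key', 'transmit:port3']
```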
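The effect of caching a tunnel-to-session link can be sketched in Python as follows. `Datapath`, `Tunnel`, `SessionEntry`, and `full_pipeline` are hypothetical; `full_pipeline` stands in for the firewall session, route, and forwarding lookups, and keying the cached links by a 5-tuple flow key is an assumption. Staleness checks on the cached session are omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

FlowKey = Tuple[str, str, int, int, int]  # src IP, dst IP, protocol, src port, dst port


@dataclass
class SessionEntry:
    route_index: int
    neighbor_mac: str


@dataclass
class Tunnel:
    tunnel_id: int
    # Cached links from this tunnel to session-table indexes, one per flow.
    session_links: Dict[FlowKey, int] = field(default_factory=dict)


class Datapath:
    def __init__(self) -> None:
        self.sessions: Dict[int, SessionEntry] = {}
        self.pipeline_runs = 0

    def full_pipeline(self, flow: FlowKey) -> int:
        """Stand-in for firewall session, route, and forwarding lookups
        (with encapsulation and encryption folded in) on a flow's first packet."""
        self.pipeline_runs += 1
        index = len(self.sessions)
        self.sessions[index] = SessionEntry(route_index=1, neighbor_mac="MACA")
        return index

    def send(self, tunnel: Tunnel, flow: FlowKey) -> SessionEntry:
        index = tunnel.session_links.get(flow)
        if index is None:  # first packet: run the pipeline once and cache the link
            index = self.full_pipeline(flow)
            tunnel.session_links[flow] = index
        return self.sessions[index]  # later packets: direct tunnel -> session access


dp = Datapath()
tunnel = Tunnel(tunnel_id=7)
flow: FlowKey = ("10.0.0.5", "1.1.2.3", 6, 40000, 443)
for _ in range(5):
    dp.send(tunnel, flow)
print(dp.pipeline_runs)  # 1
```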
Processes for Session-Based Forwarding
Moreover, the disclosed system compares a first route version number cached in the session with a second route version number in a route table corresponding to the route referenced by the route index in the session (operation 1020), and then determines whether the route is stale (operation 1025). In some embodiments, the system further compares a first neighbor index and version number cached in the session with a second neighbor index and version number corresponding to the route in a neighbor table, and determines that the route is stale if the first neighbor index or version number differs from the second neighbor index or version number.
If the system determines that the route is stale, it performs another route lookup to update the route (operation 1030). Specifically, the system may update the route with one or more of an updated route index, an updated route version number, an updated neighbor index, and an updated neighbor version number. Nevertheless, in some embodiments, if the system determines that the route is stale but the session is inactive, the system delays the route lookup until at least one packet is received in the session.
In some embodiments, at least two paths with identical cost corresponding to the route are stored in the route table, and each path is identified by a unique Equal Cost Multiple Path (ECMP) index. When a new ECMP index is added to the route table, a subsequent session uses the path associated with the new ECMP index, while an existing session continues to use an existing path associated with an existing ECMP index.
In some embodiments, where at least two next hop nodes use Virtual Router Redundancy Protocol (VRRP), the route is determined to be stale based on the difference between a first neighbor version number cached in the session and a second neighbor version number corresponding to the route in the neighbor table.
Next, when a tunnel-based forwarding mechanism is used, the system can use the cached reference to the session in the tunnel for forwarding subsequent packets in the session (operation 1035). Thus, the system needs to perform a route lookup only for the first packet in a session, unless route changes during the session prompt another route lookup to update the route.
In other embodiments, when a route is determined to be stale, the system performs another route lookup to update the session with an updated route index and an updated route version number. Such an updated route index and updated route version number may correspond to a shorter alternative route than the original route. If so, the system forwards subsequent packets in the session using the shorter alternative route. In one embodiment, the shorter alternative route is stored in a patricia trie as a child node of a parent node. Specifically, the parent node corresponds to the route, and a route version number corresponding to the parent node is increased when a child node is inserted in the patricia trie.
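Operations 1020 through 1030 can be sketched in Python as follows. All names are hypothetical, and `lookup` is a caller-supplied stand-in for the route lookup, returning a route index and a neighbor index. Because the check runs only inside `on_packet`, a stale session that receives no packets is never refreshed, matching the delayed lookup for inactive sessions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RouteEntry:
    version: int
    neighbor_index: int  # neighbor (next hop) used by this route


@dataclass
class NeighborEntry:
    version: int
    mac: str


@dataclass
class Session:
    route_index: int
    route_version: int
    neighbor_index: int
    neighbor_version: int


def is_stale(s: Session, routes: Dict[int, RouteEntry],
             neighbors: Dict[int, NeighborEntry]) -> bool:
    route = routes[s.route_index]
    if route.version != s.route_version:          # route version check (1020/1025)
        return True
    if route.neighbor_index != s.neighbor_index:  # neighbor index changed
        return True
    return neighbors[route.neighbor_index].version != s.neighbor_version


def on_packet(s: Session, dst: str, routes: Dict[int, RouteEntry],
              neighbors: Dict[int, NeighborEntry],
              lookup: Callable[[str], Tuple[int, int]]) -> None:
    """Revalidate the cached route only when a packet arrives in the session."""
    if is_stale(s, routes, neighbors):
        s.route_index, s.neighbor_index = lookup(dst)  # operation 1030
        s.route_version = routes[s.route_index].version
        s.neighbor_version = neighbors[s.neighbor_index].version


routes = {1: RouteEntry(version=100, neighbor_index=1)}
neighbors = {1: NeighborEntry(version=10, mac="MACA")}
lookups: List[str] = []


def lookup(dst: str) -> Tuple[int, int]:
    lookups.append(dst)  # record calls so the lazy behavior is visible
    return 1, 1


s = Session(route_index=1, route_version=100, neighbor_index=1, neighbor_version=10)
routes[1].version = 101                # route table changes while the session is idle
print(s.route_version, len(lookups))   # 100 0
on_packet(s, "1.1.2.3", routes, neighbors, lookup)
print(s.route_version, len(lookups))   # 101 1
```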
Network interface 1110 can be any communication interface, which includes but is not limited to, a modem, token ring interface, Ethernet interface, wireless IEEE 802.11 interface (e.g., IEEE 802.11n, IEEE 802.11ac, etc.), cellular wireless interface, satellite transmission interface, or any other interface for coupling network devices. In some embodiments, network interface 1110 may be software-defined and programmable, for example, via an Application Programming Interface (API), thus allowing remote control of network device 1100.
Shared memory 1120 can include storage components, such as, Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), etc. In some embodiments, shared memory 1120 is a flat structure that is shared by all datapath processors (including, e.g., exception processing processor core 1130, forwarding processor core 1142, forwarding processor core 1144, . . . , forwarding processor core 1148, etc.), and not tied to any particular CPU or CPU cores. Any datapath processor can read any memory location within shared memory 1120. Shared memory 1120 can be used to store various tables to assist session-based packet forwarding. For example, the tables may include, but are not limited to, a bridge table, a session table, a user table, a station table, a tunnel table, a route table and/or route cache, etc. It is important to note that there is no locking mechanism associated with shared memory 1120. Any datapath processor can have access to any location in lockless shared memory in network device 1100.
Exception processing processor core 1130 typically includes a networking processor core that is capable of processing network data traffic. Exception processing processor core 1130 is a single dedicated CPU core that typically handles table management. Note that exception processing processor core 1130 only receives data packets from one or more forwarding processor cores, such as forwarding processor core 1142, forwarding processor core 1144, . . . , forwarding processor core 1148. In other words, exception processing processor core 1130 does not receive data packets directly from any line cards or network interfaces. Only the plurality of forwarding processor cores can send data packets to exception processing processor core 1130. Moreover, exception processing processor core 1130 is the only processor core having write access to shared memory 1120, and thereby will not cause any data integrity issues even without a locking mechanism in place for shared memory 1120.
Forwarding processor cores 1142-1148 also include networking processor cores that are capable of processing network data traffic. However, by definition, forwarding processor cores 1142-1148 perform only “fast” packet processing. Thus, forwarding processor cores 1142-1148 do not block themselves and wait for other components or modules during the processing of network packets. Any packets requiring special handling or waiting by a processor core are handed over by forwarding processor cores 1142-1148 to exception processing processor core 1130.
Each of forwarding processor cores 1142-1148 maintains one or more counters. The counters are defined as a regular data type, for example, unsigned integer, unsigned long long, etc., in lieu of an atomic data type. When a forwarding processor core 1142-1148 receives a packet, it may increment or decrement the values of the counters to reflect network traffic information, including but not limited to, the number of received frames, the number of received bytes, error conditions and/or error counts, etc. A typical pipeline process at forwarding processor cores 1142-1148 includes one or more of: port lookup; VLAN lookup; port-VLAN table lookup; bridge table lookup; firewall session table lookup; route table lookup; packet encapsulation; packet encryption; packet decryption; tunnel de-capsulation; forwarding; etc.
Moreover, forwarding processor cores 1142-1148 each can maintain a fragment table. Upon receiving a data fragment without information necessary for session processing (e.g., a transport layer or L4 header), forwarding processor cores 1142-1148 will queue the data fragments in their own fragment table, and perform various fragment table management tasks.
Periodically, exception processing processor core 1130 may receive a query corresponding to one or more forwarding processor cores 1142-1148 from a control plane process. Exception processing processor core 1130 identifies one or more memory locations in the shared memory storing data for the one or more forwarding processor cores 1142-1148 corresponding to the query, retrieves one or more data values at the identified memory locations, and responds to the query. In some embodiments, exception processing processor core 1130 can further aggregate retrieved data values to generate an aggregated data value, and respond to the query based on the aggregated data value.
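The counter handling described above can be sketched in Python as follows. The sketch shows only the data layout and the aggregation step, not real concurrency or shared-memory placement; `CoreCounters`, `SharedCounters`, and the core identifiers 0 to 2 are hypothetical. Each forwarding core updates only its own plain-integer slot, and the exception processing core answers a query by reading and summing the relevant slots.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CoreCounters:
    # Plain integers rather than atomics: only the owning forwarding core writes them.
    rx_frames: int = 0
    rx_bytes: int = 0
    errors: int = 0


class SharedCounters:
    def __init__(self, core_ids: List[int]) -> None:
        self.slots: Dict[int, CoreCounters] = {c: CoreCounters() for c in core_ids}

    def on_rx(self, core_id: int, length: int, ok: bool = True) -> None:
        """Forwarding-core side: update this core's own slot only."""
        slot = self.slots[core_id]
        slot.rx_frames += 1
        slot.rx_bytes += length
        if not ok:
            slot.errors += 1

    def query(self, core_ids: List[int], name: str) -> int:
        """Exception-core side: read each requested slot and aggregate."""
        return sum(getattr(self.slots[c], name) for c in core_ids)


counters = SharedCounters([0, 1, 2])
counters.on_rx(0, 1500)
counters.on_rx(1, 64)
counters.on_rx(1, 64, ok=False)
print(counters.query([0, 1, 2], "rx_bytes"))  # 1628
print(counters.query([1], "errors"))          # 1
```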
According to embodiments of the present disclosure, network services provided by network device 1100, solely or in combination with other wireless network devices, include, but are not limited to, an Institute of Electrical and Electronics Engineers (IEEE) 802.1x authentication to an internal and/or external Remote Authentication Dial-In User Service (RADIUS) server; a MAC authentication to an internal and/or external RADIUS server; a built-in Dynamic Host Configuration Protocol (DHCP) service to assign IP addresses to wireless client devices; an internal secured management interface; Layer-3 forwarding; Network Address Translation (NAT) service between the wireless network and a wired network coupled to the network device; an internal and/or external captive portal; an external management system for managing the network devices in the wireless network; etc.
The present disclosure may be realized in hardware, software, or a combination of hardware and software. The present disclosure may be realized in a centralized fashion in one computer system or in a distributed fashion where different elements are spread across several interconnected computer systems coupled to a network. A typical combination of hardware and software may be an access point with a computer program that, when being loaded and executed, controls the device such that it carries out the methods described herein.
The present disclosure also may be embedded in non-transitory fashion in a computer-readable storage medium (e.g., a programmable circuit; a semiconductor memory such as a volatile memory such as random access memory “RAM,” or non-volatile memory such as read-only memory, power-backed RAM, flash memory, phase-change memory or the like; a hard disk drive; an optical disc drive; or any connector for receiving a portable memory device such as a Universal Serial Bus “USB” flash drive), which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.
As used herein, “digital device” generally includes a device that is adapted to transmit and/or receive signaling and to process information within such signaling such as a station (e.g., any data processing equipment such as a computer, cellular phone, personal digital assistant, tablet devices, etc.), an access point, data transfer devices (such as network switches, routers, controllers, etc.) or the like.
As used herein, “access point” (AP) generally refers to receiving points for any known or convenient wireless access technology which may later become known. Specifically, the term AP is not intended to be limited to IEEE 802.11-based APs. APs generally function as an electronic device that is adapted to allow wireless devices to connect to a wired network via various communications standards.
As used herein, the term “interconnect” or used descriptively as “interconnected” is generally defined as a communication pathway established over an information-carrying medium. The “interconnect” may be a wired interconnect, wherein the medium is a physical medium (e.g., electrical wire, optical fiber, cable, bus traces, etc.), a wireless interconnect (e.g., air in combination with wireless signaling technology) or a combination of these technologies.
As used herein, “information” is generally defined as data, address, control, management (e.g., statistics) or any combination thereof. For transmission, information may be transmitted as a message, namely a collection of bits in a predetermined format. One type of message, namely a wireless message, includes a header and payload data having a predetermined number of bits of information. The wireless message may be placed in a format as one or more packets, frames or cells.
As used herein, “wireless local area network” (WLAN) generally refers to a communications network that links two or more devices using some wireless distribution method (for example, spread-spectrum or orthogonal frequency-division multiplexing radio), usually providing a connection through an access point to the Internet, and thus providing users with the mobility to move around within a local coverage area and still stay connected to the network.
As used herein, the term “mechanism” generally refers to a component of a system or device to serve one or more functions, including but not limited to, software components, electronic components, electrical components, mechanical components, electro-mechanical components, etc.
As used herein, the term “embodiment” generally refers to an embodiment that serves to illustrate by way of example but not limitation.
It will be appreciated by those skilled in the art that the preceding examples and embodiments are exemplary and not limiting to the scope of the present disclosure. It is intended that all permutations, enhancements, equivalents, and improvements thereto that are apparent to those skilled in the art upon a reading of the specification and a study of the drawings are included within the true spirit and scope of the present disclosure. It is therefore intended that the following appended claims include all such modifications, permutations and equivalents as fall within the true spirit and scope of the present disclosure.
While the present disclosure has been described in terms of various embodiments, the present disclosure should not be limited to only those embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. Likewise, where a reference to a standard is made in the present disclosure, the reference is generally made to the current version of the standard as applicable to the disclosed technology area. However, the described embodiments may be practiced under subsequent development of the standard within the spirit and scope of the description and appended claims. The description is thus to be regarded as illustrative rather than limiting.
Claims
1. A method comprising:
- receiving, by a network device, a first packet in a session;
- performing, by the network device, a route lookup based on a header of the first packet to determine a route for the first packet; and
- caching, by the network device, a reference to the route and a neighbor in the session such that subsequent packets in the session are routed based on the cached reference in lieu of subsequent route lookups.
2. The method of claim 1, wherein the reference to the route comprises one or more of: a route index, a route version number, a neighbor index, and a neighbor version number.
3. The method of claim 1, further comprising:
- comparing, by the network device, a first route version number cached in the session and a second route version number in a route table corresponding to the route referenced by a route index in the session; and
- determining, by the network device, that the route is stale in response to the first route version number being different from the second route version number.
4. The method of claim 3, further comprising:
- comparing, by the network device, a first neighbor index and version number cached in the session with a second neighbor index and version number in a neighbor table corresponding to the route referenced by the route index in the session; and
- determining, by the network device, that the route is stale in response to the first neighbor index or version number being different from the second neighbor index or version number.
5. The method of claim 4, further comprising:
- in response to determining that the route is stale, performing another route lookup to update the route with one or more of an updated route index, an updated route version number, an updated neighbor index, and an updated neighbor version number.
6. The method of claim 4, further comprising:
- in response to determining that the route is stale and the session is inactive, delaying route lookup until at least one packet is received in the session.
7. The method of claim 3, wherein at least two paths with identical cost corresponding to the route are stored in the route table, each path being identified by a unique Equal Cost Multiple Path (ECMP) index.
8. The method of claim 7, wherein, when a new ECMP index is added to the route table, a subsequent session uses the path associated with the new ECMP index and an existing session continues to use an existing path associated with an existing ECMP index.
9. The method of claim 4, wherein, when at least two next hop nodes use Virtual Router Redundancy Protocol (VRRP), the route is determined to be stale based on difference between the first neighbor version number cached in the session and the second neighbor version number corresponding to the route in the neighbor table.
10. The method of claim 3, further comprising:
- in response to the route determined to be stale, performing another route lookup to update the session with an updated route index and an updated route version number;
- in response to the updated route index and the updated route version corresponding to a shorter alternative route than the route, forwarding subsequent packets in the session using the shorter alternative route.
11. The method of claim 10, wherein the shorter alternative route is stored in a patricia trie as a child node of a parent node, wherein the parent node corresponds to the route, and wherein a route version number of the route corresponding to the parent node is increased in response to the child node being inserted in the patricia trie.
12. The method of claim 1, further comprising:
- caching, by the network device, a reference to the session in a tunnel within which packets in the session are to be forwarded, thereby allowing direct access to the route from the tunnel.
13. The method of claim 1, further comprising:
- encapsulating, by the network device, the first packet based on information returned from a bridge lookup prior to encrypting the first packet;
- identifying, by the network device, a network interface that the first packet is to be transmitted on;
- sending the first packet to a security engine of the network device to encrypt the first packet; and
- instructing the security engine to forward encrypted first packet to the identified network interface in lieu of returning the encrypted first packet to a processor within the network device.
14. A network device having a symmetric multiprocessing architecture, the network device comprising:
- a plurality of CPU cores;
- a network interface to receive one or more data packets; and
- a memory whose access is shared by the dedicated CPU core and the plurality of CPU cores;
- wherein the plurality of CPU cores are to: receive a first packet in a session; perform a route lookup based on a header of the first packet to determine a route for the first packet; and cache a reference to the route and a neighbor in the session such that subsequent packets in the session are routed based on the cached reference in lieu of subsequent route lookups.
15. The network device of claim 14, wherein the reference to the route comprises one or more of: a route index, a route version number, a neighbor index, and a neighbor version number.
16. The network device of claim 14, wherein the plurality of CPU cores are further to:
- compare a first route version number cached in the session and a second route version number in a route table corresponding to the route referenced by a route index in the session; and
- determine that the route is stale in response to the first route version number being different from the second route version number.
17. The network device of claim 16, wherein the plurality of CPU cores are further to:
- compare a first neighbor index and version number cached in the session with a second neighbor index and version number in a neighbor table corresponding to the route referenced by the route index in the session; and
- determine that the route is stale in response to the first neighbor index or version number being different from the second neighbor index or version number.
18. The network device of claim 17, wherein the plurality of CPU cores are further to:
- perform another route lookup to update the route with one or more of an updated route index, an updated route version number, an updated neighbor index, and an updated neighbor version number in response to determining that the route is stale.
19. The network device of claim 17, wherein the plurality of CPU cores are further to:
- delay route lookup until at least one packet is received in the session in response to determining that the route is stale and the session is inactive.
20. The network device of claim 16, wherein at least two paths with identical costs corresponding to the route are stored in the route table, each path being identified by a unique Equal Cost Multiple Path (ECMP) index.
21. The network device of claim 20, wherein, when a new ECMP index is added to the route table, a subsequent session uses the path associated with the new ECMP index and an existing session continues to use an existing path associated with an existing ECMP index.
22. The network device of claim 17, wherein, when at least two next hop nodes use Virtual Router Redundancy Protocol (VRRP), the route is determined to be stale based on a difference between the first neighbor version number cached in the session and the second neighbor version number corresponding to the route in the neighbor table.
23. The network device of claim 16, wherein the plurality of CPU cores are further to:
- perform another route lookup to update the session with an updated route index and an updated route version number in response to the route being determined to be stale; and
- in response to the updated route index and the updated route version number corresponding to an alternative route that is shorter than the route, forward subsequent packets in the session using the shorter alternative route.
24. The network device of claim 23, wherein the shorter alternative route is stored in a Patricia trie as a child node of a parent node, wherein the parent node corresponds to the route, and wherein a route version number of the route corresponding to the parent node is increased in response to the child node being inserted in the Patricia trie.
25. The network device of claim 14, wherein the plurality of CPU cores are further to:
- cache a reference to the session in a tunnel within which packets in the session are to be forwarded, thereby allowing direct access to the route from the tunnel.
26. The network device of claim 14, wherein the plurality of CPU cores are further to:
- encapsulate the first packet based on information returned from a bridge lookup prior to encrypting the first packet;
- identify a network interface that the first packet is to be transmitted on;
- send the first packet to a security engine of the network device to encrypt the first packet; and
- instruct the security engine to forward the encrypted first packet to the identified network interface in lieu of returning the encrypted first packet to a processor within the network device.
27. A non-transitory computer-readable storage medium storing embedded instructions for a plurality of operations that are executed by one or more mechanisms implemented within a network device having a symmetric multiprocessing architecture, the plurality of operations comprising:
- receiving a first packet in a session;
- performing a route lookup to determine a route for the first packet;
- caching a reference to the route and a neighbor in the session;
- caching a reference to the session in a tunnel within which packets in the session are forwarded;
- comparing a first route version number cached in the session with a second route version number in a route table corresponding to the route referenced by a route index in the session;
- determining whether the route is stale based on the first and second route version numbers;
- performing another route lookup to update the route in response to determining that the route is stale; and
- using the cached reference to the session in the tunnel for forwarding subsequent packets in the session.
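The route caching and staleness check recited in claims 14 through 18, 25, and 27 can be illustrated with a minimal Python sketch. It assumes that the route table and neighbor table are dictionaries keyed by index, each entry carrying an integer version number, and that `lookup` stands in for the full (for example, longest-prefix-match) route lookup. The class names, field names, and the `mac` field are hypothetical and are not taken from the disclosure. Revalidation happens only when a packet arrives, so an inactive session whose route has gone stale costs nothing until its next packet (compare claim 19).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class RouteEntry:
    version: int          # bumped whenever this route changes
    neighbor_index: int   # next-hop entry in the neighbor table


@dataclass
class NeighborEntry:
    version: int          # bumped on neighbor change (e.g. VRRP failover)
    mac: str              # hypothetical next-hop L2 address


@dataclass
class Session:
    # References cached in the session after the first route lookup.
    route_index: Optional[int] = None
    route_version: int = -1
    neighbor_index: int = -1
    neighbor_version: int = -1


@dataclass
class Tunnel:
    session: Session      # cached reference to the session in the tunnel


class Forwarder:
    def __init__(self, routes: Dict[int, RouteEntry],
                 neighbors: Dict[int, NeighborEntry],
                 lookup: Callable[[str], int]):
        self.routes = routes
        self.neighbors = neighbors
        self.lookup = lookup   # hypothetical full route lookup: dst -> route index
        self.lookups = 0       # counts full lookups, to show caching at work

    def _refresh(self, session: Session, dst: str) -> None:
        """Full route lookup; re-cache route/neighbor indices and versions."""
        self.lookups += 1
        idx = self.lookup(dst)
        route = self.routes[idx]
        nbr = self.neighbors[route.neighbor_index]
        session.route_index = idx
        session.route_version = route.version
        session.neighbor_index = route.neighbor_index
        session.neighbor_version = nbr.version

    def is_stale(self, session: Session) -> bool:
        """Compare cached versions against the current table entries."""
        if session.route_index is None:
            return True
        route = self.routes.get(session.route_index)
        if route is None or route.version != session.route_version:
            return True
        if route.neighbor_index != session.neighbor_index:
            return True
        nbr = self.neighbors.get(session.neighbor_index)
        return nbr is None or nbr.version != session.neighbor_version

    def forward(self, session: Session, dst: str) -> str:
        """Called per received packet; looks up again only if stale."""
        if self.is_stale(session):
            self._refresh(session, dst)
        return self.neighbors[session.neighbor_index].mac

    def forward_via_tunnel(self, tunnel: Tunnel, dst: str) -> str:
        # The tunnel reaches the route directly through its cached session.
        return self.forward(tunnel.session, dst)
```

In this sketch, repeated packets in a session reuse the cached indices after a cheap version comparison; bumping a route or neighbor version invalidates every session that cached it without the tables having to track which sessions refer to them.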
Type: Application
Filed: Jun 14, 2013
Publication Date: Jun 5, 2014
Inventors: Ramsundar Janakiraman (Sunnyvale, CA), Ravinder Verma (San Jose, CA), Bhanu S. Gopalasetty (San Ramon, CA)
Application Number: 13/918,748
International Classification: H04L 12/745 (20060101);