TUNNEL-LESS SD-WAN

Info

Publication number: 20230179521
Type: Application
Filed: Jan 28, 2023
Publication Date: Jun 8, 2023
Inventors: Alex Markuze (Rosh HaAyin), Chen Dar (Magshimim), Aran Bergman (Givatayim), Igor Golikov (Kfar Saba), Israel Cidon (San Francisco, CA), Eyal Zohar (Shimshit)
Application Number: 18/102,689

Abstract

In a novel tunnel-less SD-WAN, when an ingress node of the SD-WAN receives a new packet flow, it identifies the path of the flow through the SD-WAN, and sends an initial prepended set of SD-WAN header values before the first packet for the flow to the next hop along this identified path, rather than encapsulating each packet of the flow with encapsulating tunnel headers that store SD-WAN next hop data for the flow. The prepended set of SD-WAN header values is then used to not only forward the first packet through the SD-WAN, but also to create records at each subsequent hop, which are then used to forward subsequent packets of the flow through the SD-WAN. In some embodiments, instead of identifying the entire path for the packet flow, the first hop in the SD-WAN identifies just the next hop, as each subsequent hop in the SD-WAN has the task of identifying the next hop through the SD-WAN for the packet flow. Also, in some embodiments, each hop creates records for the reverse flow in order to automatically forward reply packets along a reverse route.

Description

BACKGROUND

In the field of network computing, a wide area network (WAN) system allows companies to incorporate separate local area networks (LANs) as a single effective network. Software-defined wide area networking (SD-WAN) systems are a way of operating such WANs that reduces various network problems such as variations in packet delay, network congestion, and packet loss. SD-WAN systems send data packets (e.g., TCP packets) through managed forwarding nodes (sometimes referred to herein as “nodes” or “MFNs”) of an SD-WAN. The packets are sent from the original source address of the packet to the final destination address through a series of nodes of the SD-WAN.

Some existing SD-WAN systems use IP tunnels. Each network site is provided with an SD-WAN device connected to the LAN. Data packets from one network site to another are sent to the SD-WAN device and encapsulated before being sent to an SD-WAN device of another network site through the nodes. In some existing systems, the encapsulation includes adding additional headers to each packet of a packet flow at each node. The headers successively direct the packets to the next node in a path from the original source of the packet to a final destination of the packet. The headers include an inner header with an original source and final destination of the data packet that is prepended when the packet is initially sent and an outer header that includes an address for the next hop of the packet. In such systems, the outer header is replaced at each hop with a header identifying a subsequent hop for the packet. Other systems may group packets together and encrypt them. However, such systems may be inefficient as they require every packet to have an outer header removed, analyzed, and replaced with a new header at each successive node. Accordingly, there is a need for a more efficient tunnel-less SD-WAN system.

BRIEF SUMMARY

In a novel tunnel-less SD-WAN, when an ingress node of the SD-WAN receives a new packet flow, it identifies the path of the flow through the SD-WAN, and sends an initial prepended set of SD-WAN header values before the first packet for the flow to the next hop (e.g., another node, or a destination outside the SD-WAN) along this identified path, rather than encapsulating each packet of the flow with encapsulating tunnel headers that store SD-WAN next hop data for the flow. The prepended set of SD-WAN header values is then used to not only forward the first packet through the SD-WAN, but also to create records at each subsequent hop, which are then used to forward subsequent packets of the flow through the SD-WAN. In some embodiments, instead of identifying the entire path for the packet flow, the first hop in the SD-WAN identifies just the next hop, as each subsequent hop in the SD-WAN has the task of identifying the next hop through the SD-WAN for the packet flow. Also, in some embodiments, each hop creates records for the reverse flow in order to automatically forward reply packets along a reverse route. In some embodiments, the records comprise a TCP splicing record between two TCP connections of the node.

In some embodiments, the SD-WAN ingress node (referred to below as the “first hop”) generates the initial prepended set of one or more header values as part of a TCP split optimization operation that its TCP splitter (e.g., a TCP splitting machine, module, or server) performs. Under this approach, the packet flow is a TCP flow sent from a source machine outside of the SD-WAN (e.g., from a source computing device, or a source gateway, outside of the SD-WAN). The TCP splitter in some embodiments terminates the TCP connection and starts a new TCP connection to the next hop. That is, as the TCP splitter at each hop has a TCP connection to a previous hop and sets up a new TCP connection to the next hop, a TCP splitter at each hop can also be thought of as a TCP connector.

From the header of the received flow, the TCP splitter identifies (i.e., reads) the destination address of the first TCP packet. In some embodiments, the TCP splitter then identifies the path for the flow through the SD-WAN to a destination machine outside of the SD-WAN (e.g., to a destination computing device, or a destination gateway, outside of the SD-WAN). The TCP splitter then generates a set of SD-WAN header (SDH) values for the flow, each SDH value specifying the network address for a next hop along the path. In some embodiments, the SDH values are part of a single SDH header, while in other embodiments, the SDH values are in multiple headers (e.g., one header per SDH value, etc.). The TCP splitter then sends the generated set of SDH values to the next hop and then sends the first packet and subsequent packets of the TCP flow to the next hop. The set of SDH values is sent ahead of the first TCP packet in some embodiments, while in other embodiments the SDH values are prepended to the first packet but not the other packets of the flow. In either case, the tunnel-less SD-WAN system is referred to as a “prepended TCP” system or a “prepended TCP flow” system.

In some embodiments, the TCP splitter of the first hop identifies the path through the SD-WAN by using the header values of the first packet (e.g., its destination network addresses (such as layers 2-4 addresses) and in some cases the source network addresses (such as the layers 2-4 addresses)) to identify a path-traversal rule that specifies one or more possible paths for the TCP splitter to select for the flow through the SD-WAN. As mentioned above, the set of SDH values in some embodiments includes the network address for each subsequent hop along the SD-WAN to reach the flow's destination outside of the SD-WAN. In other embodiments, the first hop TCP splitter only includes in its generated set of SDH values the network address for the next hop, as each subsequent SD-WAN hop in these embodiments identifies the next hop after receiving the prepended packet from a previous hop.

In some of the embodiments where the first hop's prepended header includes the network addresses for each hop along the SD-WAN, each subsequent hop removes its network address from the prepended header, identifies the network address for the next hop along the SD-WAN, creates a record that stores the next-hop's network address for this flow, and forwards the prepended header (e.g., the first packet with the prepended header or the prepended packet flow) along to the next hop when the next hop is another hop along the SD-WAN.

The preceding Summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction or overview of all inventive subject matter disclosed in this document. The Detailed Description that follows and the Drawings that are referred to in the Detailed Description will further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all the embodiments described by this document, a full review of the Summary, Detailed Description, the Drawings and the Claims is needed. Moreover, the claimed subject matters are not to be limited by the illustrative details in the Summary, Detailed Description and the Drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth in the appended claims. However, for purpose of explanation, several embodiments of the invention are set forth in the following figures.

FIG. 1 conceptually illustrates a process of some embodiments for sending a flow of TCP packets through a tunnel-less SD-WAN.

FIG. 2 illustrates a tunnel-less SD-WAN system.

FIG. 3A illustrates a prior art system for sending packets in tunnels.

FIG. 3B illustrates a path of nodes through a network using a tunnel-less SD-WAN system and data sent through the nodes.

FIG. 4A illustrates data structures for SDH values and TCP packets of some embodiments in which each hop identifies the next hop.

FIG. 4B illustrates a data structure for prepended configuring packets in an alternate embodiment.

FIG. 5 illustrates an example of a managed forwarding node 500 and a controller cluster 560 of some embodiments.

FIG. 6 conceptually illustrates an electronic system with which some embodiments of the invention are implemented.

DETAILED DESCRIPTION

In the following detailed description of the invention, numerous details, examples, and embodiments of the invention are set forth and described. However, it will be clear and apparent to one skilled in the art that the invention is not limited to the embodiments set forth and that the invention may be practiced without some of the specific details and examples discussed.

In a novel tunnel-less SD-WAN, when an ingress node of the SD-WAN (also referred to below as the “first hop”) receives a new packet flow, it identifies the path of the flow through the SD-WAN, and sends an initial prepended set of SD-WAN header values before the first packet for the flow to the next hop along this identified path, rather than encapsulating each packet of the flow with encapsulating tunnel headers that store SD-WAN next hop data for the flow. The prepended set of SD-WAN header values is then used to not only forward the first packet through the SD-WAN, but also to create records at each subsequent hop, which are then used to forward subsequent packets of the flow through the SD-WAN.

In some embodiments, instead of identifying the entire path for the packet flow, the MFN of the first hop in the SD-WAN identifies just the next hop, as each subsequent hop in the SD-WAN has the task of identifying the next hop through the SD-WAN for the packet flow. Also, in some embodiments, each hop creates records for the reverse flow in order to automatically forward reply packets along a reverse route. In some embodiments, the records comprise a TCP splicing record between two TCP connections of the node. In such embodiments, one set of TCP splicing records (per node) may allow both forward and reverse routing. SD-WANs are sometimes referred to herein as “virtual networks.”

Several embodiments will now be described by reference to FIGS. 1-5. In these embodiments, the first hop in the SD-WAN generates the initial prepended set of one or more header values as part of a TCP split optimization operation that its TCP splitter performs. Under this approach, the packet flow is a TCP flow sent from a source machine outside of the SD-WAN (e.g., from a source computing device, or a source gateway, outside of the SD-WAN). The TCP splitter in some embodiments terminates the TCP connection and starts a new TCP connection to the next hop.

From the header of the received flow, the TCP splitter identifies (i.e., reads) the destination address of the first TCP packet. In some embodiments, the TCP splitter then identifies the path for the flow through the SD-WAN to a destination machine outside of the SD-WAN (e.g., to a destination computing device, or a destination gateway, outside of the SD-WAN). The TCP splitter then generates a set of SD-WAN header (SDH) values for the flow, each SDH value specifying the network address for a next hop along the path. In some embodiments, the SDH values are part of a single SDH header, while in other embodiments, the SDH values are in multiple headers (e.g., one header per SDH value, etc.). The TCP splitter then sends the generated set of SDH values to the next hop and then sends the first packet and subsequent packets of the TCP flow to the next hop. The set of SDH values is sent ahead of the first TCP packet in some embodiments, while in other embodiments the SDH values are prepended to the first packet but not the other packets of the flow. In either case, the tunnel-less SD-WAN system is referred to as a “prepended TCP” system or a “prepended TCP flow” system.

In some embodiments, the TCP splitter of the first hop identifies the path through the SD-WAN by using the header values of the first packet (e.g., its destination network addresses (such as layers 2-4 addresses) and in some cases the source network addresses (such as the layers 2-4 addresses)) to identify a path-traversal rule that specifies one or more possible paths for the TCP splitter to select for the flow through the SD-WAN. As mentioned above, the set of SDH values in some embodiments includes the network address for each subsequent hop along the SD-WAN to reach the flow's destination outside of the SD-WAN. In other embodiments, the first hop TCP splitter only includes, in its generated set of SDH values, the network address for the next hop, as each subsequent SD-WAN hop in these embodiments identifies the next hop after receiving the prepended packet from a previous hop.

In some of the embodiments where the first hop's prepended header includes the network addresses for each hop along the SD-WAN, each subsequent hop removes its network address from the prepended header, identifies the network address for the next hop along the SD-WAN, creates a record that stores the next-hop's network address for this flow, and forwards the prepended header (e.g., the first packet with the prepended header or the prepended packet flow) along to the next hop when the next hop is another hop along the SD-WAN.

FIG. 1 conceptually illustrates a process 100 of some embodiments for sending a flow of TCP packets through a tunnel-less SD-WAN. FIG. 1 will be described with reference to FIGS. 2 and 3B. FIG. 2 illustrates a virtual network 200. FIG. 2 includes multiple tenant locations 202a-202f, a tenant location 205 that is a source of a TCP packet flow, a tenant location 225 that is a destination of the TCP packet flow, managed forwarding nodes 204a-204j, network connections 230, 235, 240, and 245, and controllers 250.

Node 204a is a first hop in a tunnel-less SD-WAN route, from tenant location 205 to tenant location 225, through the network 200. Nodes 204b and 204c are subsequent hops in the route. Tenant locations 202a-202f and SD-WAN nodes 204d-204j are included to illustrate that an SD-WAN system generally has multiple network locations and multiple nodes that are not involved in any given TCP flow. The connections within network 200 (e.g., connections 235, 240) represent communicative connections between the nodes that may be selected by the next-hop forwarding rules to define paths through the SD-WAN network. These connections may include their own security protocols, such as IPsec or other such protocols or may use some other data security measure.

The controllers 250 provide forwarding rules and path-selection rules (e.g., next-hop forwarding rules, and in some embodiments other forwarding rules used to determine routes through the network 200) to the managed forwarding nodes 204a-204j. A path selection rule, in some embodiments, has (1) match criteria defined in terms of header values, and (2) one or more paths to the destination. In some embodiments, each path has a path identifier, which is looked up in a table to identify all hops along the path. Alternatively, a path can be defined directly in the path selection rule. The same node may assign more than one path when it is distributing loads for different flows (e.g., multiple flows with different source addresses and/or different destination addresses).
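
As a rough illustration of the path selection rule described above, the following Python sketch pairs match criteria with one or more path identifiers and resolves the chosen identifier through a path table. The names PathSelectionRule, PATH_TABLE, and select_path, the example addresses and hop lists, and the hash-based choice among candidate paths are assumptions made for illustration only; the description does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical table: path identifier -> ordered hops along that path.
PATH_TABLE: Dict[str, List[str]] = {
    "path-1": ["204a", "204b", "204c"],
    "path-2": ["204a", "204d", "204c"],  # illustrative alternate path
}


@dataclass(frozen=True)
class PathSelectionRule:
    """Match criteria in terms of header values, plus candidate paths."""
    dst_ip: Optional[str]       # None acts as a wildcard
    dst_port: Optional[int]     # None acts as a wildcard
    path_ids: Tuple[str, ...]   # one or more path identifiers

    def matches(self, dst_ip: str, dst_port: int) -> bool:
        return (self.dst_ip in (None, dst_ip)
                and self.dst_port in (None, dst_port))


def select_path(rules: Sequence[PathSelectionRule], dst_ip: str,
                dst_port: int, flow_hash: int = 0) -> Optional[List[str]]:
    """Return the hops of the path chosen by the first matching rule."""
    for rule in rules:
        if rule.matches(dst_ip, dst_port):
            # Assumed load-distribution policy: pick a path by flow hash.
            path_id = rule.path_ids[flow_hash % len(rule.path_ids)]
            return PATH_TABLE[path_id]
    return None  # no rule matched this flow
```

In this sketch, two flows that match the same rule but carry different flow hashes can be assigned different paths, which loosely mirrors the load distribution mentioned above.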

The active elements of FIG. 2, tenant locations 205 and 225, managed forwarding nodes 204a-204c, and network connections 230, 235, 240, and 245 are further described with respect to the operations of FIG. 1.

FIG. 3A illustrates a prior art system for sending packets in tunnels, which will be described briefly to contrast such a system with the present invention. FIG. 3A includes tenant location 205, connections 230 and 245, managed forwarding node 300 with encapsulation processor 302, managed forwarding nodes 305 and 310, tunnel 315, packets 320A and 320B, inner encapsulation header 322, and outer encapsulation headers 323 and 324. In the prior art shown, the tenant location 205 sends a data flow comprising multiple packets (here, packets 320A and 320B) through a network of managed forwarding nodes 300, 305, and 310 to tenant location 225.

The packets 320A and 320B are initially sent through connection 230 using IPsec for security. The encapsulation processor 302 of managed forwarding node 300 applies an overlay tunnel (represented by tunnel 315) to the packets 320A and 320B. The overlay tunnels in some prior art systems include encryption of the packets being sent. The encapsulation processor 302 also prepends a pair of headers to every packet of the data flow. These two tunnel headers are (1) an inner header 322 that identifies (e.g., by IP address) the ingress MFN 300 and egress MFN 310 for entering and exiting the virtual network, and (2) an outer header 323 that identifies the next hop MFN 305. The outer header 323 includes a source IP address corresponding to MFN 300 and a destination IP address corresponding to the next hop, MFN 305. The inner tunnel header 322, in some embodiments, also includes a tenant identifier (TID) in order to allow multiple different tenants of the virtual network provider to use a common set of MFNs of the virtual network provider.

When, as in FIG. 3A, the path to the egress MFN 310 includes one or more intermediate MFNs (here, MFN 305), the intermediate MFN(s) replace the outer header with an outer header addressed to the next hop. Here, outer header 323 is replaced with outer header 324. The source IP address in the new outer header 324 is the IP address of MFN 305. The intermediate MFN 305 uses the destination IP address in the inner header 322 to perform a route lookup in its routing table to identify the destination IP address of the next hop MFN (here MFN 310) that is on the path to the destination IP address of the inner header. The replacement outer header 324 includes a destination IP address of next hop MFN 310 (as identified through the route table lookup). The managed forwarding node 310 then terminates the tunnel by removing the inner header 322 and outer header 324 from each packet and decrypting the packets before sending them through the connection 245 using IPsec for security.

Some advantages of the present tunnel-less SD-WAN invention include that the present invention does not require replacing an outer encapsulation header in every single packet of a data flow (which could be millions of packets) at every intermediate node, nor does the present invention require a route lookup from a routing table at each intermediate node for every packet of every flow. FIG. 3B illustrates a path of nodes through a virtual network using a tunnel-less SD-WAN system and data sent through the nodes. In addition to the active elements of FIG. 2, FIG. 3B also includes TCP splitter 330, a first packet 340 of a TCP flow, a second packet 342 of the TCP flow, routing data 345, SDH headers/routing data 350 and 355, and new headers 360 and 365.

In FIG. 1, the process 100 transmits data through a managed forwarding node with a TCP splitter. The process 100 receives (at 102) a TCP packet flow at the MFN 204a of FIG. 2. The MFN 204a is one of several in the virtual network 200. Each MFN 204a-204c in the virtual network 200 has a cloud forwarding element. In some embodiments, multiple or all of the nodes of the virtual network have TCP splitters. Further description of the managed forwarding nodes of some embodiments is provided with respect to FIG. 5, below. Still further description of virtual networks and managed forwarding nodes can be found in U.S. patent application Ser. No. 15/972,083, filed May 4, 2018, now published as U.S. Patent Publication 2019/0103990, which is incorporated herein by reference. In FIG. 3B, first TCP packet 340 goes from tenant location 205 to node 204a, which is an MFN with a TCP splitter 330. In some embodiments, the TCP splitter is implemented as an operation of an optimization engine of the MFN 204a as described with respect to FIG. 5, below. In FIG. 3B, the final destination address of the TCP flow is a machine or device at the tenant location 225.

After receiving at least the first packet 340, the process 100 of FIG. 1 then identifies (at 104) a route comprising a series of hops through intermediate MFNs to send the TCP flow to the destination address. The process 100 identifies the route through the MFNs based on the initial MFN and the destination of the TCP flow, in some embodiments.

The process 100 of FIG. 1 then establishes (at 106) a new TCP connection to the MFN of the second hop, stores a connection tracking record associating the TCP connection on which the first packet was received with the new TCP connection, and sends the SDH values from the first hop (i.e., the MFN with the TCP splitter) to the MFN identified as the second hop. A TCP connection between two machines or devices includes an IP address and port address for each machine/device. The combination of an IP address and port address is sometimes called a “socket”, so a TCP connection has a socket at the source machine and another socket at the destination machine. TCP connection data for each TCP packet is stored in the header of the TCP packet. The set of data identifying the connection used by the packet is referred to as a tuple. Some embodiments identify connections using a 4-tuple (source IP address, source port, destination IP address, and destination port), while other embodiments identify connections using a 5-tuple (the same values as the 4-tuple plus a value identifying a protocol of the packet). Storing the connection tracking record (of operation 106 of FIG. 1) associates the TCP connection from the branch 205 (of FIG. 3B) to MFN 204a with the new connection from MFN 204a to MFN 204b by storing (e.g., in a connection tracking record storage of the MFN 204a) a 5-tuple (or, in some embodiments, a 4-tuple) identifying the incoming connection and a 5-tuple (or 4-tuple) identifying the new connection in a single connection tracking record. One of ordinary skill in the art will understand that in some embodiments, some information of the tracking record may be stored implicitly. For example, some embodiments omit the protocol value from the connection tracking record and/or omit the IP address of the MFN itself (e.g., in cases where the MFN has only one IP address, every incoming packet will have that IP address as its destination and every outgoing packet will have that IP address as its source, though different connections could use different ports of the MFN).
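
The connection tracking record described above can be sketched in Python as a pair of 5-tuples stored together and indexed by the incoming connection. The class names FiveTuple, ConnTrackRecord, and ConnTrackStore, and the dictionary-backed storage, are hypothetical and chosen only for illustration; the description does not prescribe a storage layout.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple, Optional


class FiveTuple(NamedTuple):
    """Identifies one TCP connection as seen in a packet header."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str = "tcp"


@dataclass(frozen=True)
class ConnTrackRecord:
    incoming: FiveTuple   # connection on which the flow arrived
    outgoing: FiveTuple   # new connection toward the next hop


class ConnTrackStore:
    """Hypothetical connection tracking record storage of one MFN."""

    def __init__(self) -> None:
        self._records: Dict[FiveTuple, ConnTrackRecord] = {}

    def add(self, incoming: FiveTuple, outgoing: FiveTuple) -> ConnTrackRecord:
        # A single record associates the incoming and the new connection.
        record = ConnTrackRecord(incoming, outgoing)
        self._records[incoming] = record
        return record

    def lookup(self, incoming: FiveTuple) -> Optional[ConnTrackRecord]:
        # Returns None when no record exists for this connection.
        return self._records.get(incoming)
```

A store that omits the protocol value or the MFN's own IP address, as in the implicit-storage embodiments mentioned above, would simply key on fewer fields.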

After (or in some embodiments, before) storing the connection tracking record, the MFN 204a sends SD-WAN headers to MFN 204b. Unlike the encapsulation headers of the prior art overlay tunnel, the SDH values are not added to every packet in the TCP flow; instead, the SDH values are sent only once for the TCP flow. In some embodiments, the SDH values are sent ahead of the first packet of the TCP flow. In other embodiments, the SDH values are sent prepended to only the first packet of the TCP flow (e.g., prepended to the payload of the first packet or prepended as additional headers of the first packet 340 of FIG. 3B). In either case, the tunnel-less SD-WAN system may be referred to as a “prepended TCP” system or “prepended TCP flow” system because the SDH values are prepended to the flow rather than to the individual packets. As the SDH values are only sent once, the second packet 342 and any subsequent packets of the same flow (not shown) are sent without prepending headers to those packets.
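
A minimal Python sketch of the "sent ahead of the first packet" variant follows: the SDH values are emitted once for the flow, and every packet then follows unchanged. The function name prepended_flow and the ("sdh", ...) / ("packet", ...) tagging are assumptions for illustration and do not reflect an actual wire format.

```python
from typing import Iterable, Iterator, List, Tuple


def prepended_flow(sdh_values: List[str],
                   packets: Iterable[bytes]) -> Iterator[Tuple[str, object]]:
    """Yield the SDH values once, then each packet of the flow as-is."""
    # The SDH values are prepended to the flow, not to individual packets.
    yield ("sdh", list(sdh_values))
    for packet in packets:
        yield ("packet", packet)
```

For a flow of two packets, this sketch produces one SDH item followed by two packet items, whereas a per-packet encapsulation scheme would attach header data to each packet.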

In FIG. 3B, new header 360 and SDH headers 350 and 355 are shown preceding the packet 340 (e.g., prepended to, or sent ahead of, the first packet) out of node 204a. The new header 360 identifies the TCP connection between MFNs 204a and 204b. Specifically, it is a header with a 5-tuple that includes (as the source address) an IP address and port address of MFN 204a, (as the destination address) an IP address and port address of MFN 204b, and a protocol of the packet. SDH 350 identifies node 204c as the next hop after node 204b, and SDH 355 identifies the original destination IP address in tenant location 225 as the next destination after node 204c. In the illustrated embodiment, the SDH values are sent out in the same order as the nodes they identify. However, they may be sent in other orders in other embodiments.

The routing data 345, stored in the node 204a, identifies node 204b as the next hop after node 204a. In some embodiments, the routing data 345 for the TCP connection to the next hop is stored as part of the connection tracking record pairing (e.g., splicing) (a) the incoming TCP connection (of the node 204a) through which the packet 340 was received from a machine or device at tenant location 205 with (b) the TCP connection (of node 204a) to node 204b. In some embodiments, each flow uses a separate TCP connection between each pair of selected MFNs in the planned route. In some embodiments, there is also a separate TCP connection between the branch office 205 and the first hop MFN 204a and/or another separate TCP connection between the final hop MFN 204c and the branch office 225.

Each flow in some embodiments (i.e., each set of packets with the same original source and destination addresses) receives its own set of TCP connections between MFNs. A second flow (either from the same source address to a different destination address, from a different source address to the same destination address, or from a different source and different destination addresses as the first flow) in some embodiments can pass through one, some, or all of the same MFNs as the first flow, but every TCP connection that the second flow uses will be different from any connection that the first flow uses. One of ordinary skill in the art will understand that in some embodiments, different connections may have some values in common, for example, two connections between the same pair of MFNs could use the same IP and port address at the first MFN and still be separate connections so long as each connection's IP and/or port address at the second MFN are different. However, in some embodiments, the SD-WAN may reserve a particular IP address and port address for a particular flow rather than allowing multiple connections of multiple flows to use that particular IP address and port address.

More specifically, splicing two TCP connections of a node together configures the node so that, for any packet coming in with a header identifying a 5-tuple of one TCP connection (which will be called “the first connection” here, while the other TCP connection of the splice will be called “the second connection” for clarity) the header specifying the first connection will be replaced with a header specifying the second connection. Such a replacement may be performed using a match-action rule in some embodiments. In such embodiments, incoming packets whose headers include 5-tuples that match the stored 5-tuple of a connection tracking record trigger an action to replace the header with a header that includes the 5-tuple of the other connection stored in the connection tracking record.

After the old header is replaced with a new header (e.g., header 360 being replaced with header 365 at MFN 204b), the packet is sent on toward the subsequent MFN (e.g., MFN 204c). In some embodiments, TCP splicing also configures the node to receive and then forward reply packets. The reply packets will be received at the second connection and forwarded through the first connection to the “next hop” of the reply packets, which is the same MFN as the “prior hop” for packets in the original direction. In some embodiments that use a match-action rule, the match-action rules apply in both directions, but with match and action reversed for reply packets. That is, for packets of the original packet flow, the match attribute corresponds to the first connection and the action attribute corresponds to the second connection, while for packets of the reply packet flow, the match attribute corresponds to the second connection (with source and destination reversed from the action attribute of the original packet flow) and the action attribute corresponds to the first connection (with the source and destination reversed from the match attribute of the original packet flow).
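
The bidirectional match-action behavior described above can be sketched in Python as a small rule table built from one splice. The names FiveTuple, swapped, build_splice_rules, and rewrite_header are hypothetical; the sketch assumes 5-tuple matching and represents a header only by its 5-tuple.

```python
from typing import Dict, NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str = "tcp"

    def swapped(self) -> "FiveTuple":
        """Same connection with source and destination reversed."""
        return FiveTuple(self.dst_ip, self.dst_port,
                         self.src_ip, self.src_port, self.protocol)


def build_splice_rules(first: FiveTuple,
                       second: FiveTuple) -> Dict[FiveTuple, FiveTuple]:
    """Map each match 5-tuple to the 5-tuple of the replacement header."""
    return {
        first: second,                      # original direction
        second.swapped(): first.swapped(),  # reply direction
    }


def rewrite_header(rules: Dict[FiveTuple, FiveTuple],
                   header: FiveTuple) -> Optional[FiveTuple]:
    """Return the replacement 5-tuple, or None if no rule matches."""
    return rules.get(header)
```

Here a packet arriving on the first connection leaves with the second connection's 5-tuple, and a reply arriving on the second connection leaves with the first connection's 5-tuple, with source and destination reversed in each case.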

Although the embodiments of the above description implement forwarding using connection tracking records and TCP socket splicing, in other embodiments, the routing data 345 is stored in some other format that identifies node 204b as the next hop for the TCP flow. Details about how the nodes splice the TCP connections and the contents of the SDH headers 350 and 355 and the first packet 340 for some embodiments are described with respect to FIGS. 4A and 4B, below.

The process 100 of FIG. 1 then sends (at 108) the second and subsequent packets of the TCP flow from the MFN of the first hop to the MFN identified as the second hop. The second and subsequent packets also have their headers replaced at the MFN of each hop. An example of this is shown in FIG. 3B, in which second packet 342 receives the same new header 360 at MFN 204a as the first packet 340, although not the SDH headers 350 and 355.

Before receiving the second packet 342, the MFN of the second hop 204b receives and processes the first packet 340 and its SDH headers 350 and 355 previously sent from the MFN 204a of the first hop. As shown in FIG. 1, the process 100 receives (at 110) the SDH values at the MFN of the next hop. The process 100 then establishes (at 112) a new TCP connection to the MFN identified as the MFN of the next hop by the SDH values and stores a connection tracking record that associates the connection of the incoming packets with the new connection. In some embodiments, the SDH values identify an IP address of the MFN of the next hop. In other embodiments, the SDH values provide a node identifier value that the MFN (e.g., the TCP connector of the MFN) uses to determine an IP address of the next hop MFN. In FIG. 3B, node 204b stores routing data (e.g., a 4-tuple or 5-tuple for the connection to the MFN of the next hop) corresponding to SDH 350, which identifies node 204c as the next hop for the TCP flow. In some embodiments, this routing data is stored as part of the connection tracking record in a connection tracking record storage of the MFN. In some embodiments, the connection tracking record also includes data identifying the incoming connection from which the packet 340 and its SDH headers 350 and 355 were received. To clarify that the routing data 350 stored at node 204b includes the connection identified in the SDH 350, they are both labeled with the same item number. However, one of ordinary skill in the art will understand that the format in which the routing data 350 is stored may be different in some embodiments than the format of the SDH 350. In some embodiments, as mentioned, the routing data 350 for the next hop is stored in a connection tracking record. In other embodiments, the routing data 350 is stored in some other format (e.g., a set of rules in some format) that identifies node 204c as the next hop for the TCP flow.
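
The two ways an SDH value can identify the next hop (directly by IP address, or by a node identifier that the MFN resolves) can be sketched in Python as follows. The function resolve_next_hop, the NODE_ADDRESSES table, and the example addresses are hypothetical; only the standard-library ipaddress module is a real API.

```python
import ipaddress
from typing import Dict

# Hypothetical mapping from node identifiers to MFN IP addresses.
NODE_ADDRESSES: Dict[str, str] = {
    "204b": "198.51.100.2",
    "204c": "198.51.100.3",
}


def resolve_next_hop(sdh_value: str, node_addresses: Dict[str, str]) -> str:
    """Return the next hop's IP address for a single SDH value."""
    try:
        # Case 1: the SDH value already is an IP address.
        return str(ipaddress.ip_address(sdh_value))
    except ValueError:
        # Case 2: the SDH value is a node identifier to look up.
        # An unknown identifier raises KeyError in this sketch.
        return node_addresses[sdh_value]
```

The resolved address would then be used to open the new TCP connection toward the next hop before the connection tracking record is stored.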

In the illustrated embodiment of FIG. 1, the MFN of the first hop identifies the specific MFNs of the route, but does not specify what port addresses each MFN should use to connect to the subsequent MFNs. Furthermore, in some such embodiments, where an MFN may have more than one IP address, the MFN of the first hop may specify the MFNs of the route without determining what IP address each MFN should use to connect to the MFN of the next hop. In other such embodiments, the first hop MFN may specify IP addresses for each subsequent hop, but still leave the port address determination to the subsequent MFNs. However, in other embodiments, rather than the initial MFN planning the entire route and sending out headers for each MFN along with a flow identifier (e.g., the original source and destination addresses of the packet flow), the MFN of the initial hop sends out just the flow identifier and each MFN identifies the next MFN on the route (or, for the last MFN of the route, determines that the MFN should connect to the final destination).

The process 100 of FIG. 1 then sends (at 114) the SDH values from the present MFN to the MFN at the next hop of the SD-WAN path, after removing the SDH values that identify the present node. In FIG. 3B, node 204b sends the packet 340 and SDH 355 to node 204c after removing SDH 350 and replacing header 360 with new header 365. In some embodiments, rather than reading and removing a leading SDH and sending the remaining SDHs on, each MFN sends all the SDHs and the SDHs include a pointer value that identifies the SDH values for the MFN receiving the SDHs to use. The receiving MFN then uses the SDH values identified by the pointer and updates the pointer value to point at the SDH values for the subsequent MFN before sending the entire set of SDHs on.
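The two alternatives described above (removing the leading SDH versus keeping every SDH and advancing a pointer) can be contrasted in a short sketch. The function names and the representation of each SDH as a string are assumptions made purely for illustration.

```python
from typing import List, Tuple


def pop_leading(sdhs: List[str]) -> Tuple[str, List[str]]:
    """Variant 1: read and remove the leading SDH, then forward the rest."""
    return sdhs[0], sdhs[1:]


def advance_pointer(sdhs: List[str], pointer: int) -> Tuple[str, List[str], int]:
    """Variant 2: keep all SDHs, use the one at `pointer`, then advance it."""
    return sdhs[pointer], sdhs, pointer + 1
```

In the first variant the set of SDHs shrinks at every hop; in the second the set is forwarded unchanged and only the pointer value differs from hop to hop.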

The process 100 of FIG. 1 then receives (at 116) the subsequent packets of the TCP flow and sends (at 118) the TCP flow to the next hop. In FIG. 3B, MFN 204b receives second packet 342 and sends it to MFN 204c after replacing header 360 with header 365.

The process 100 repeats operations 110-118 at each node of the path until the SDH values and TCP packets reach the last node of the SD-WAN path before the final destination of the TCP flow. In FIG. 3B, the last node of the SD-WAN path is node 204c, which stores (at 112) routing data 355 corresponding to SDH 355 in the same manner as node 204b stores routing data 350. Since the “next hop” of the last node 204c is the destination IP at tenant location 225, there are no more SD-WAN nodes in the path. Therefore, node 204c skips operation 114 (of FIG. 1) and does not send out an SDH, but does send TCP packets 340, 342, and others in the flow (not shown) to the destination tenant location 225. The destination IP address receives (at 116) the TCP packets.

In some embodiments, the MFN 204c of the last hop restores the original header of the packets so that any firewalls and/or other analysis applications will identify the flow as originating from tenant location 205. In some embodiments, the MFN 204c sends the TCP packets of the flow to the edge gateway of the destination tenant location 225 through an IPsec connection. In some embodiments, the edge gateway creates a connection tracking record that maps the 5-tuple (or 4-tuple) of the received flow to the IPsec connection with the MFN 204c that forwarded the flow to the edge gateway. The edge gateway then uses the connection tracking record, when sending a reverse flow from the destination machine of the original flow to the source machine of the original flow, in order to forward the reverse flow to the correct MFN 204c, which now acts as the ingress node to the virtual network for the reverse flow. The MFN 204c then uses its connection tracking record to select the connection with the MFN 204b to forward the reverse flow to the MFN 204b, which then uses its connection tracking record to forward the reverse flow to the MFN 204a. The MFN 204a then restores the original header of the reverse flow (i.e., a 4-tuple or 5-tuple corresponding to the original header of the original flow, but with the source and destination addresses swapped) and forwards the reverse flow packets to the edge gateway of the tenant location 205 for forwarding to the original source machine. The edge gateway of the tenant location 205, in some embodiments, may also maintain a connection tracking record that associates the IPsec connection initially used to send the original packet flow to MFN 204a with the original packet flow header (5-tuple or 4-tuple) in order to consistently send packets of that flow to the same ingress MFN 204a.
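A minimal sketch of how an edge gateway could use such a connection tracking record for the reverse flow is given below. The FiveTuple and EdgeGatewayTracker names, the choice to index the record under both the forward and the swapped tuple, and the string connection identifiers are all assumptions for illustration; the specification does not prescribe a storage layout.

```python
from typing import Dict, NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str

    def swapped(self) -> "FiveTuple":
        """Same flow as seen in the reverse direction (addresses swapped)."""
        return FiveTuple(self.dst_ip, self.dst_port,
                         self.src_ip, self.src_port, self.protocol)


class EdgeGatewayTracker:
    """Maps a flow, in either direction, to the IPsec connection it used."""

    def __init__(self) -> None:
        self._by_flow: Dict[FiveTuple, str] = {}

    def record_received_flow(self, flow: FiveTuple, ipsec_conn: str) -> None:
        # Store the record under both directions so that reply packets,
        # whose source and destination are swapped, find the same entry.
        self._by_flow[flow] = ipsec_conn
        self._by_flow[flow.swapped()] = ipsec_conn

    def connection_for(self, flow: FiveTuple) -> str:
        return self._by_flow[flow]
```

With a record created when the forward flow arrives, a reply packet from the destination machine looks up its own (swapped) tuple and is sent over the IPsec connection to the same MFN that delivered the forward flow.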

The connection tracking record of the last hop 204c may be different from the connection tracking records of the intermediate MFNs (e.g., MFN 204b) in some embodiments. In such embodiments, the final hop MFN 204c replaces the header 365 of each packet with the original header, rather than a header representing a connection between the MFN 204c and the edge gateway of tenant location 225. The connection tracking record of the egress MFN 204c may also include additional data identifying the IPsec connection to the edge gateway of tenant location 225 in some embodiments. Similarly, in some embodiments, the connection tracking record of the ingress MFN 204a may include additional data identifying the IPsec connection between the edge gateway of tenant location 205 and the ingress MFN 204a in order to send reply packets through the correct IPsec connection.

As mentioned above, in the embodiment of FIG. 1, the MFN of the first hop identifies the route through the virtual network 200 and sends SDHs that directly identify the subsequent hops to each hop of the identified route with a subsequent hop (and the final destination to the final hop of the route). However, in other embodiments, at each hop, the MFN identifies the subsequent hop, e.g., based on data in the configuration packet that does not directly identify the subsequent hop for each MFN.

FIG. 4A illustrates data structures for SDH values and TCP packets of some embodiments in which each hop identifies the next hop. FIG. 4A shows a first packet 400 of a TCP flow in the format it is initially sent from a device outside the SD-WAN, a prepended configuring packet 402 with edited payload 404, and a second packet 406 in the format of the second and subsequent packets as they pass through the SD-WAN.

The first packet 400 as sent from the source (e.g., from a device or machine at a tenant location through an edge gateway, sometimes called an “edge node” or “edge forwarding node,” of the tenant location) is formatted as an ordinary TCP packet sent from one device/machine to another. It includes an original header 405, with source and destination addresses corresponding to the original source and destination machines/devices. However, one of ordinary skill in the art will understand that when the packet is sent from the tenant location, the source and destination addresses may have been translated from internal addresses of machines/devices at the client network to external addresses by passing through an edge gateway of the tenant locations with a network address translation (NAT) system.

When the packet 400 is received at a first hop, the node at the first hop reformats the first packet 400 as a prepended configuring packet 402. As mentioned above with respect to FIGS. 3A and 3B, the node of the first hop creates a TCP connection to the node of the next hop. The node of the first hop then generates the prepended configuring packet 402 by replacing the original header 405 with a new header 415 identifying the first hop as the source and the next hop as the destination. The new header 415 allows packets to be sent between the first hop and the next hop. The node of the first hop then prepends the original header 405 (or in some embodiments a subset of the values of the original header 405 or another flow identifier that identifies the flow) as part of the data payload 404 for the configuring packet. In some embodiments, the header values are not prepended to the payload of the packet, but are prepended elsewhere, for example, as additional headers or metadata of an existing TCP header, etc. In some such embodiments, the original header 405 data comprises a fixed number of bytes (e.g., 12, 16, 32, 40, 64, etc.).

In the embodiments illustrated in FIG. 4A, at each subsequent hop, the node of that hop reads the original header 405 from the data payload 404. Based on the original header 405 data, the subsequent hop identifies a next subsequent hop through which to route a TCP flow between the original source and destination. The node at the subsequent hop sets up a TCP connection between that node and the node of the next subsequent hop. The node replaces the new header 415 with another new header 415 with the subsequent hop as the source and the next subsequent hop as the destination. The node then sends the packet 402 to the next subsequent hop. This continues until the packet 402 reaches the last node in its route through the SD-WAN. The last node removes the original header 405 data from the payload 404, recreating the payload 410. In some embodiments, the last node sets the original destination address as the destination address of the packet. In some embodiments, the last node sets the original source address as the source address of the packet, completing the recreation of the first packet 400 as sent from the source (or in some embodiments, as sent from the edge gateway of the original tenant location).
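The handling of the flow identifier in the payload of the configuring packet can be sketched as follows. The 12-byte layout used here (IPv4 source address, IPv4 destination address, source port, destination port) is only one assumed encoding that happens to fit one of the example sizes mentioned; the function names are likewise hypothetical.

```python
import socket
import struct
from typing import Tuple

# Assumed fixed-size layout: 4-byte source IPv4 address, 4-byte destination
# IPv4 address, 2-byte source port, 2-byte destination port (12 bytes total).
FLOW_ID = struct.Struct("!4s4sHH")


def prepend_flow_id(payload: bytes, src_ip: str, src_port: int,
                    dst_ip: str, dst_port: int) -> bytes:
    """First hop: place the original header values in front of the payload."""
    flow_id = FLOW_ID.pack(socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
                           src_port, dst_port)
    return flow_id + payload


def strip_flow_id(edited_payload: bytes) -> Tuple[Tuple[str, int, str, int], bytes]:
    """Read the flow identifier and return it with the original payload."""
    src, dst, src_port, dst_port = FLOW_ID.unpack_from(edited_payload)
    flow = (socket.inet_ntoa(src), src_port, socket.inet_ntoa(dst), dst_port)
    return flow, edited_payload[FLOW_ID.size:]
```

An intermediate hop in this sketch would call strip_flow_id only to read the flow identifier and would forward the edited payload unchanged, while the last hop would forward the recovered original payload with the original addresses restored.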

Recreating the original packet 400 entirely has advantages. For example, because the original source address is used, firewalls of the destination tenant location can identify the packets as originating from an allowed address. However, in alternate embodiments, there may be some differences between the original packet 400 when it is sent from the first tenant location and when it is sent from the node at the last hop in the SD-WAN path. For example, in some embodiments the node may edit the packet to use the last hop as the source address.

Once the prepended configuring packet is sent, the second packet 406 (and subsequent packets) receive new headers 415 at each hop that are the same as the new headers 415 received by the prepended configuring packet 402. However, as the TCP connections between the nodes at the hops along the route had already been set up in response to the prepended configuring packet 402, the second packet 406 (and subsequent packets) are sent along at each hop with the same payload 420 as they were originally sent with from the original source.
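The division of labor between the configuring packet and the later packets can be illustrated with a small simulation. Everything here is a simplifying assumption: the ROUTES table stands in for whatever route computation each node performs, connections are represented by strings such as "204a->204b", and Node and traverse are hypothetical names.

```python
from typing import Dict, List, Optional

# Hypothetical per-node routing decisions: destination site -> next node,
# with None meaning that this node is the last hop before the destination.
ROUTES: Dict[str, Dict[str, Optional[str]]] = {
    "204a": {"site-225": "204b"},
    "204b": {"site-225": "204c"},
    "204c": {"site-225": None},
}


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        # Record created by the configuring packet:
        # incoming connection id -> next node (None at the last hop).
        self.records: Dict[str, Optional[str]] = {}

    def on_configuring_packet(self, conn_id: str, dest: str) -> Optional[str]:
        next_node = ROUTES[self.name][dest]
        self.records[conn_id] = next_node
        return next_node

    def on_data_packet(self, conn_id: str) -> Optional[str]:
        # Later packets carry no flow identifier; the stored record suffices.
        return self.records[conn_id]


def traverse(nodes: Dict[str, Node], ingress: str,
             dest: Optional[str]) -> List[str]:
    """Walk a packet hop by hop; pass `dest` only for the configuring packet."""
    path: List[str] = []
    prev, cur = "edge", ingress
    while cur is not None:
        path.append(cur)
        conn_id = f"{prev}->{cur}"
        node = nodes[cur]
        if dest is not None:
            nxt = node.on_configuring_packet(conn_id, dest)
        else:
            nxt = node.on_data_packet(conn_id)
        prev, cur = cur, nxt
    return path
```

In this sketch the second traversal succeeds without any destination information because each node already holds a record keyed by the connection on which the packet arrives.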

FIG. 4B illustrates a data structure for prepended configuring packets in an alternate embodiment in which the entire path through the SD-WAN is determined by the node of the first hop. FIG. 4B shows a prepended configuring packet 430 with an edited payload 434. In this embodiment, the first hop prepends to the payload 410, in addition to the original header 405 data (or a subset thereof, or other flow identifier), a set of one or more hop identifiers (IDs) 440. Then, at each subsequent hop, the node of that hop uses the set of hop IDs 440 to generate a TCP connection to the next subsequent hop, and then removes the hop ID for itself from the set of hop IDs 440 before sending the packet on to the next subsequent hop. As described with respect to FIG. 4A, each node provides new headers 415 to replace the previous header of the packet 430 with source and destination addresses corresponding to the hop that the packet is being sent on. Similarly, the second packet 406 (and subsequent packets) do not need path configuring data in this embodiment as the nodes have set up the TCP connections based on the prepended configuring packet.

Various embodiments may provide the hop IDs 440 (of FIG. 4B) in various different formats. Some embodiments provide each identifier as an IP address and port address of the next subsequent hop. Other embodiments provide an identifier that specifies the next hop as being a particular node in the network, with the current node determining IP and port addresses based on a lookup table for nodes in the network. As previously mentioned, in some embodiments, rather than reformatting an existing first packet of a TCP flow, the node at the first hop generates a separate configuring packet that identifies the flow and includes identifiers of the subsequent hops. The node then sends this configuring packet out before sending the first packet out without prepending anything to its payload in a similar manner to the second packet 406 of FIG. 4A.

In the multi-tenant networks of some embodiments, routing depends on a tenant ID. In such networks, metadata identifying the tenant (and in some cases additional data) are included in the configuring packet 430, either as metadata of the new header 415, as part of the data prepended in the payload 434 for the configuring packet 430, or elsewhere in the configuring packet 430. For example, in some embodiments, each header has a TLV (type, length, value) structure. This allows any number of flexible fields to be added. For example, in some embodiments, the header data includes fields with type “tenant ID” with a specific length and a value that identifies the particular tenant from which the data flow originates, in addition to fields that identify next hop or other values described above. In some embodiments, the TCP connections between each two consecutive hops result in the metadata (identifying a particular tenant) being implicitly part of the TCP stream defined by the packets' source and destination address tuples.
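A TLV structure of the kind mentioned above can be sketched as follows. The numeric type codes and the field widths (a one-byte type and a two-byte length) are assumptions chosen for the example; the specification does not assign them.

```python
import struct
from typing import Iterator, Tuple

# Hypothetical type codes; the text does not assign numeric values.
TYPE_TENANT_ID = 1
TYPE_NEXT_HOP = 2

# Assumed widths: 1-byte type followed by a 2-byte length, network byte order.
_TYPE_LENGTH = struct.Struct("!BH")


def encode_tlv(field_type: int, value: bytes) -> bytes:
    """Encode one field as type, length, value."""
    return _TYPE_LENGTH.pack(field_type, len(value)) + value


def decode_tlvs(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) pairs from a concatenation of TLV fields."""
    offset = 0
    while offset < len(data):
        field_type, length = _TYPE_LENGTH.unpack_from(data, offset)
        offset += _TYPE_LENGTH.size
        yield field_type, data[offset:offset + length]
        offset += length
```

Because each field announces its own length, a node in this sketch can skip field types it does not recognize, which is what makes it possible to add new fields without changing the rest of the format.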

The virtual network 200 described with respect to FIGS. 2 and 3B includes managed forwarding node 204a with a TCP splitter and other managed forwarding nodes. In some embodiments, multiple nodes may implement TCP splitters. In some such embodiments, the nodes include elements such as an optimization engine that performs the TCP splitting. Furthermore, in some embodiments all nodes include an optimization engine or some other element that performs TCP splitting. Additionally, in some embodiments, machines or devices of the tenant locations may include elements that perform TCP splitting.

FIG. 5 illustrates an example of a managed forwarding node 500 and a controller cluster 560 of some embodiments. In some embodiments, each managed forwarding node 500 is a machine (e.g., a VM or container) that executes on a host computer in a public cloud datacenter. In other embodiments, each managed forwarding node 500 is implemented by multiple machines (e.g., multiple VMs or containers) that execute on the same host computer in one public cloud datacenter. In still other embodiments, two or more components of one MFN can be implemented by two or more machines executing on two or more host computers in one or more public cloud datacenters.

In some embodiments, a logically centralized controller cluster 560 (e.g., a set of one or more controller servers) operates inside or outside of one or more public clouds, and configures the public-cloud components of the managed forwarding nodes 500 to implement the virtual network 200 (and in some embodiments, other virtual networks for other tenants) over the public clouds. In some embodiments, the controllers in this cluster are at various different locations (e.g., are in different public cloud datacenters) in order to improve redundancy and high availability. The controller cluster in some embodiments scales up or down the number of public cloud components that are used to establish the virtual network 200, or the compute or network resources allocated to these components.

As shown, the managed forwarding node 500 includes one or more optimization engines 520, edge gateways including branch gateway 525 and remote device gateway 532, and a cloud forwarding element 535 (e.g., a cloud router). In some embodiments, each of these components 520-535 can be implemented as a cluster of two or more components. The optimization engines 520 receive data from and send data to the Internet 502, the cloud forwarding element 535, branch gateway 525 and remote device gateway 532.

The controller cluster 560 in some embodiments can dynamically scale up or down each component cluster (1) to add or remove machines (e.g., VMs or containers) to implement each component's functionality and/or (2) to add or remove compute and/or network resources to the previously deployed machines that implement that cluster's components. As such, each deployed MFN 500 in a public cloud datacenter can be viewed as a cluster of MFNs, or it can be viewed as a node that includes multiple different component clusters that perform different operations of the MFN.

Also, in some embodiments, the controller cluster deploys different sets of MFNs in the public cloud datacenters for different tenants for which the controller cluster defines virtual networks over the public cloud datacenters. In this approach, the virtual networks of any two tenants do not share any MFN. However, in the embodiments described below, each MFN can be used to implement different virtual networks for different tenants. One of ordinary skill will realize that in other embodiments the controller cluster 560 can implement the virtual network of each tenant of a first set of tenants with its own dedicated set of deployed MFNs, while implementing the virtual network of each tenant of a second set of tenants with a shared set of deployed MFNs.

In some embodiments, the branch gateway 525 and remote device gateway 532 establish secure VPN connections respectively with one or more branch offices, such as branch office 205, and remote devices (e.g., mobile devices 202c) that connect to the MFN 500, as shown in FIG. 5. The connection from the branch gateway 525 to the branch office 205, in some embodiments, goes through an edge gateway 570 of the branch office 205. The edge gateway 570 passes the data to and from host machines 575 of the branch office 205 and, through the host machines 575, to virtual machines 580 of the host machines 575.

One example of such VPN connections is IPsec connections as mentioned with respect to FIGS. 3A and 3B. However, one of ordinary skill will realize that in other embodiments, such gateways 525 and/or 532 establish different types of VPN connections.

In the example illustrated in FIG. 5, the MFN 500 is shown to include one or more L4-L7 optimization engines 520. One of ordinary skill will realize that in other embodiments, the MFN 500 includes other middlebox engines for performing other middlebox operations.

The optimization engine 520 executes novel processes that optimize the forwarding of the entity's data messages to their destinations for best end-to-end performance and reliability. Some of these processes implement proprietary high-performance networking protocols, free from the current network protocol ossification. For example, in some embodiments, the optimization engine 520 optimizes end-to-end TCP rates through intermediate TCP splitting and/or termination. In some embodiments, an optimization engine 520, some other component of the node 500, and/or some component of the VNP central control determines an identified routing path for each pair of data message endpoints. This may be a routing path that is deemed optimal based on a set of optimization criteria, e.g., it is the fastest routing path, the shortest routing path, or the path that least uses the Internet.

The cloud forwarding element 535 is the MFN engine that is responsible for forwarding a data message flow to the next hop MFN's cloud forwarding element (CFE) when the data message flow has to traverse to another public cloud to reach its destination, or to an egress router in the same public cloud when the data message flow can reach its destination through the same public cloud. In some embodiments, the CFE 535 of the MFN 500 is a software router.

To forward the data messages, the CFE encapsulates the messages with tunnel headers. Different embodiments use different approaches to encapsulate the data messages with tunnel headers. Some embodiments described below use one tunnel header to identify network ingress/egress addresses for entering and exiting the virtual network, and use another tunnel header to identify next hop MFNs when a data message has to traverse one or more intermediate MFNs to reach the egress MFN.

As mentioned with respect to FIG. 3A, in some prior art virtual networks, the managed forwarding nodes send data packets encapsulated with tunnel headers. In some such prior art virtual networks, the CFE sends each packet of the data message with two tunnel headers: (1) an inner header that identifies an ingress CFE and egress CFE for entering and exiting the virtual network, and (2) an outer header that identifies the next hop CFE. The inner tunnel header in some prior art systems also includes a tenant identifier (TID) in order to allow multiple different tenants of the virtual network provider to use a common set of MFN CFEs of the virtual network provider. However, in some embodiments of the present invention, rather than sending tunnel headers with each packet of a data message, a TCP splitter of an initial MFN provides a single set of SD-WAN header values for an entire flow, as described with respect to FIGS. 1-4.

Different embodiments define neighboring nodes differently. For a particular MFN in one public cloud datacenter of a particular public cloud provider, a neighboring node in some embodiments includes (1) any other MFN that operates in any public cloud datacenter of the particular public cloud provider, and (2) any other MFN that operates in another public cloud provider's datacenter that is within the same “region” as the particular MFN.

Although the above figures were described with respect to TCP packets, TCP splitters, TCP flows, TCP connections, etc., one of ordinary skill in the art will understand that in other embodiments, other packet protocols (e.g., UDP, ICMP, etc.) may be used. In such embodiments, machines or devices that provide the equivalent operations as a TCP splitter for the respective protocols would be used in place of a TCP splitter and any processes and devices would be adapted to the appropriate protocol.

In the above described embodiments, the ingress MFN replaced the original header of each packet with a header for a TCP connection to the next hop, each intermediate MFN replaced the header of each packet with a header for a TCP connection to the next hop and the egress MFN replaced the header of each packet with the original header of the packet flow. However, in other embodiments, the original header of each packet is left intact at the ingress MFN, with headers representing the TCP connection to the next hop being prepended to each packet and the original header becoming part of the payload of the packet as it is sent through the SD-WAN. The prepended header is then replaced at each intermediate MFN and removed at the egress MFN, leaving the original header as the header of the packet, before the packet is sent to the edge gateway of the destination location.
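The alternative just described, in which the original header is left intact and only a per-hop header is added, replaced, and finally removed, can be sketched as below. The Packet type and the string header labels are hypothetical and serve only to show the order of operations.

```python
from typing import NamedTuple, Tuple


class Packet(NamedTuple):
    headers: Tuple[str, ...]  # outermost header first
    payload: bytes


def ingress_prepend(packet: Packet, hop_header: str) -> Packet:
    """Ingress MFN: keep the original header and prepend a per-hop header."""
    return Packet((hop_header,) + packet.headers, packet.payload)


def intermediate_replace(packet: Packet, hop_header: str) -> Packet:
    """Intermediate MFN: replace only the outermost (per-hop) header."""
    return Packet((hop_header,) + packet.headers[1:], packet.payload)


def egress_remove(packet: Packet) -> Packet:
    """Egress MFN: remove the per-hop header, exposing the original header."""
    return Packet(packet.headers[1:], packet.payload)
```

Applying the three functions in order to a packet returns a packet equal to the one that entered, which mirrors the statement that the original header is again the header of the packet when it leaves the egress MFN.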

FIG. 6 conceptually illustrates an electronic system 600 with which some embodiments of the invention are implemented. The electronic system 600 can be used to execute any of the control, virtualization, or operating system applications described above. The electronic system 600 may be a computer (e.g., a desktop computer, personal computer, tablet computer, server computer, mainframe, a blade computer etc.), phone, PDA, or any other sort of electronic device. Such an electronic system includes various types of computer readable media and interfaces for various other types of computer readable media. Electronic system 600 includes a bus 605, processing unit(s) 610, a system memory 625, a read-only memory 630, a permanent storage device 635, input devices 640, and output devices 645.

The bus 605 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 600. For instance, the bus 605 communicatively connects the processing unit(s) 610 with the read-only memory 630, the system memory 625, and the permanent storage device 635.

From these various memory units, the processing unit(s) 610 retrieve instructions to execute and data to process in order to execute the processes of the invention. The processing unit(s) may be a single processor or a multi-core processor in different embodiments.

The read-only-memory (ROM) 630 stores static data and instructions that are needed by the processing unit(s) 610 and other modules of the electronic system. The permanent storage device 635, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 600 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 635.

Other embodiments use a removable storage device (such as a floppy disk, flash drive, etc.) as the permanent storage device. Like the permanent storage device 635, the system memory 625 is a read-and-write memory device. However, unlike storage device 635, the system memory is a volatile read-and-write memory, such as a random access memory. The system memory 625 stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 625, the permanent storage device 635, and/or the read-only memory 630. From these various memory units, the processing unit(s) 610 retrieve instructions to execute and data to process in order to execute the processes of some embodiments.

The bus 605 also connects to the input and output devices 640 and 645. The input devices 640 enable the user to communicate information and select commands to the electronic system. The input devices 640 include alphanumeric keyboards and pointing devices (also called “cursor control devices”). The output devices 645 display images generated by the electronic system 600. The output devices 645 include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD). Some embodiments include devices such as a touchscreen that function as both input and output devices.

Finally, as shown in FIG. 6, bus 605 also couples electronic system 600 to a network 665 through a network adapter (not shown). In this manner, the computer can be a part of a network of computers (such as a local area network (“LAN”), a wide area network (“WAN”), or an Intranet), or a network of networks, such as the Internet. Any or all components of electronic system 600 may be used in conjunction with the invention.

Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra-density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.

While the above discussion primarily refers to microprocessor or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.

As used in this specification, the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms “display” and “displaying” mean displaying on an electronic device. As used in this specification, the terms “computer readable medium,” “computer readable media,” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.

This specification refers throughout to computational and network environments that include virtual machines (VMs). However, virtual machines are merely one example of data compute nodes (DCNs) or data compute end nodes, also referred to as addressable nodes. DCNs may include non-virtualized physical hosts, virtual machines, containers that run on top of a host operating system without the need for a hypervisor or separate operating system, and hypervisor kernel network interface modules.

VMs, in some embodiments, operate with their own guest operating systems on a host using resources of the host virtualized by virtualization software (e.g., a hypervisor, virtual machine monitor, etc.). The tenant (i.e., the owner of the VM) can choose which applications to operate on top of the guest operating system. Some containers, on the other hand, are constructs that run on top of a host operating system without the need for a hypervisor or separate guest operating system. In some embodiments, the host operating system uses name spaces to isolate the containers from each other and therefore provides operating-system level segregation of the different groups of applications that operate within different containers. This segregation is akin to the VM segregation that is offered in hypervisor-virtualized environments that virtualize system hardware, and thus can be viewed as a form of virtualization that isolates different groups of applications that operate in different containers. Such containers are more lightweight than VMs.

Hypervisor kernel network interface modules, in some embodiments, are non-VM DCNs that include a network stack with a hypervisor kernel network interface and receive/transmit threads. One example of a hypervisor kernel network interface module is the vmknic module that is part of the ESXi™ hypervisor of VMware, Inc.

It should be understood that while the specification refers to VMs, the examples given could be any type of DCNs, including physical hosts, VMs, non-VM containers, and hypervisor kernel network interface modules. In fact, the example networks could include combinations of different types of DCNs in some embodiments.

While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. In addition, a number of the figures conceptually illustrate processes. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, the process could be implemented using several sub-processes, or as part of a larger macro process. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.

Claims

1-20. (canceled)

21. A method of forwarding packets through a software-defined wide area network (SD-WAN), the method comprising:

at an ingress forwarding node of the SD-WAN: terminating a TCP (Transmission Control Protocol) connection for a flow from a first site connected to the SD-WAN to a second site connected to the SD-WAN; identifying a set of forwarding nodes in the SD-WAN that the flow should take to reach the second site; starting a new TCP connection to a next SD-WAN forwarding node in the identified set, and sending data regarding the identified set of forwarding nodes to the next SD-WAN forwarding node; and sending the flow to the next forwarding node in the SD-WAN.

22. The method of claim 21, wherein sending data regarding the identified set of forwarding nodes comprises sending, to the next forwarding node, one or more identifiers for one or more forwarding nodes in the identified set that are after the next forwarding node.

23. The method of claim 22, wherein the one or more identifiers comprise one or more network addresses of one or more forwarding nodes in the identified set that are after the next forwarding node.

24. The method of claim 21, wherein the terminating and starting are part of a TCP split operation performed by the ingress forwarding node.

25. The method of claim 21, wherein identifying the set of forwarding elements comprises identifying a path through the SD-WAN based on header values of a first packet of the flow.

26. The method of claim 25, wherein the sent data comprises one or more identifiers of one or more forwarding nodes along the path, the method further comprising:

at each forwarding node along the path after the ingress forwarding node, identifying a subsequent forwarding node from the sent data, removing an identity of the particular forwarding node from the sent data, and forwarding the remaining data to a subsequent forwarding node when there is a next subsequent forwarding node.
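As an illustration only, the per-hop handling of the sent data recited above can be sketched as follows. This is a minimal sketch under one reading of the claim language: the data received by a node is assumed to be an ordered list of identifiers for the nodes after it, and the node removes the identifier of the node it forwards to before passing on the remainder. The names `process_at_node` and `walk_path` and the string node identifiers are hypothetical and do not appear in the specification.

```python
from typing import List, Optional, Tuple


def process_at_node(sent_data: List[str]) -> Tuple[Optional[str], List[str]]:
    """Handle the sent data at one forwarding node after the ingress node.

    Assumption: sent_data lists, in order, the identifiers of the nodes
    that come after the node currently holding the data.
    Returns (subsequent node or None, remaining data to forward).
    """
    if not sent_data:
        # No subsequent forwarding node: this node is the last hop and
        # would deliver the flow to the second site.
        return None, []
    # Identify the subsequent node and strip its identifier from the data.
    return sent_data[0], sent_data[1:]


def walk_path(path: List[str]) -> List[Tuple[str, Optional[str], List[str]]]:
    """Trace (node, next hop, data sent to next hop) along a full path.

    path[0] is the ingress node; the ingress node sends to path[1] the
    identifiers of the nodes that are after path[1].
    """
    ingress, rest = path[0], path[1:]
    if not rest:
        return [(ingress, None, [])]
    current, data = rest[0], rest[1:]
    trace = [(ingress, current, list(data))]
    while True:
        next_hop, data = process_at_node(data)
        trace.append((current, next_hop, list(data)))
        if next_hop is None:
            break
        current = next_hop
    return trace


if __name__ == "__main__":
    for node, next_hop, data in walk_path(["A", "B", "C", "D"]):
        print(node, "->", next_hop, "carrying", data)
```

In this sketch the list shrinks by one identifier at each hop, so the last forwarding node receives an empty list and treats that as the signal to forward the flow to the second site rather than to another forwarding node.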

27. The method of claim 26, wherein when there is not a next subsequent forwarding node, forwarding the flow from the subsequent forwarding node to the second site connected to the SD-WAN.

28. The method of claim 26 further comprising:

at each SD-WAN forwarding node traversed by the flow from the first site to the second site: performing a TCP split operation to terminate an incoming TCP connection and to start a new outgoing TCP connection; storing a record for the flow that associates the two TCP connections; and using the record to forward the packets of the flow along the path.

29. The method of claim 28, wherein the flow is a first flow, the method further comprising using the record to forward a reply flow sent from the second site to the first site in response to the first flow.
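By way of example only, the record that associates the two TCP connections of a split, and its use for both the forward and the reply direction, might be modeled as below. This is a hedged sketch, not the described implementation: connections are represented by hypothetical 4-tuples instead of real sockets, and the names `Conn` and `FlowTable` are assumptions introduced for illustration.

```python
from typing import Dict, NamedTuple, Optional


class Conn(NamedTuple):
    """Hypothetical identifier for one TCP connection at a forwarding node."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class FlowTable:
    """Stores, per flow, the pairing of incoming and outgoing connections."""

    def __init__(self) -> None:
        self._peer: Dict[Conn, Conn] = {}

    def add_split(self, incoming: Conn, outgoing: Conn) -> None:
        # One record, indexed from both sides, so the same association
        # serves forward packets and reply packets.
        self._peer[incoming] = outgoing
        self._peer[outgoing] = incoming

    def lookup(self, arrived_on: Conn) -> Optional[Conn]:
        # Data arriving on one connection is relayed on its paired connection.
        return self._peer.get(arrived_on)


if __name__ == "__main__":
    table = FlowTable()
    incoming = Conn("10.0.0.5", 40000, "192.0.2.1", 443)   # from previous hop
    outgoing = Conn("192.0.2.1", 51000, "192.0.2.2", 443)  # to next hop
    table.add_split(incoming, outgoing)
    print(table.lookup(incoming))  # forward direction: relay on outgoing
    print(table.lookup(outgoing))  # reply direction: relay on incoming
```

Indexing the record from both connections is one simple way to let reply data that arrives on the outgoing connection be relayed back on the incoming connection without any further path computation.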

30. The method of claim 21, wherein:

the packet flow comprises a first packet and a plurality of subsequent packets, and

sending the data to a next forwarding node in the SD-WAN comprises sending the data before or with the first packet and not sending any additional SDH values before or with any subsequent packets of the flow.

31. A non-transitory machine readable medium storing a program which when executed by at least one processing unit forwards packets through a software-defined wide area network (SD-WAN), the program for execution at an ingress forwarding node of the SD-WAN, the program comprising sets of instructions for:

terminating a TCP (Transmission Control Protocol) connection for a flow from a first site connected to the SD-WAN to a second site connected to the SD-WAN;

identifying a set of forwarding nodes in the SD-WAN that the flow should take to reach the second site;

starting a new TCP connection to a next SD-WAN forwarding node in the identified set, and sending data regarding the identified set of forwarding nodes to the next SD-WAN forwarding node; and

sending the flow to the next forwarding node in the SD-WAN.

32. The non-transitory machine readable medium of claim 31, wherein the set of instructions for sending data regarding the identified set of forwarding nodes comprises a set of instructions for sending, to the next forwarding node, one or more identifiers for one or more forwarding nodes in the identified set that are after the next forwarding node.

33. The non-transitory machine readable medium of claim 32, wherein the one or more identifiers comprise one or more network addresses of one or more forwarding nodes in the identified set that are after the next forwarding node.

34. The non-transitory machine readable medium of claim 31, wherein the sets of instructions for terminating and starting are part of a TCP split operation performed by the ingress forwarding node.

35. The non-transitory machine readable medium of claim 31, wherein the set of instructions for identifying the set of forwarding elements comprises a set of instructions for identifying a path through the SD-WAN based on header values of a first packet of the flow.

36. The non-transitory machine readable medium of claim 35, wherein the sent data comprises one or more identifiers of one or more forwarding nodes along the path, the program further comprising a set of instructions for:

at each forwarding node along the path after the ingress forwarding node, identifying a subsequent forwarding node from the sent data, removing an identity of the particular forwarding node from the sent data, and forwarding the remaining data to a subsequent forwarding node when there is a next subsequent forwarding node.

37. The non-transitory machine readable medium of claim 36, wherein when there is not a next subsequent forwarding node, forwarding the flow from the subsequent forwarding node to the second site connected to the SD-WAN.

38. The non-transitory machine readable medium of claim 36, wherein the program further comprises sets of instructions for:

at each SD-WAN forwarding node traversed by the flow from the first site to the second site: performing a TCP split operation to terminate an incoming TCP connection and to start a new outgoing TCP connection; storing a record for the flow that associates the two TCP connections; and using the record to forward the packets of the flow along the path.

39. The non-transitory machine readable medium of claim 38, wherein the flow is a first flow, the program further comprising a set of instructions for using the record to forward a reply flow sent from the second site to the first site in response to the first flow.

40. The non-transitory machine readable medium of claim 31, wherein:

the packet flow comprises a first packet and a plurality of subsequent packets, and

the set of instructions for sending the data to a next forwarding node in the SD-WAN comprises a set of instructions for sending the data before or with the first packet and not sending any additional SDH values before or with any subsequent packets of the flow.