Layer 4 switching for persistent connections

- Akamai Technologies, Inc.

This disclosure provides for a Layer 4 switching approach wherein a set of L4 switches is organized into a cluster so as to act as a single (or “big”) Layer 4 switch. Connections between the L4 switches are carried out, e.g., using Layer 2 switches. To this end, an intra-cluster routing entity of the switch maintains mapping information (e.g., in a database, or set of data structures) about connections that have been established by the individual switches within the cluster. In this approach, each host (itself a switch) preferably acts like a group of ports of the larger (big) switch. This obviates the need for each member host to maintain connections to many possible destinations. Rather, the intra-cluster routing entity maintains the information about which hosts (and their ports) are connected to which destinations, and the connections are re-used as necessary, even if connections on one side of the “big” switch have ceased being used.

Description
BACKGROUND

Technical Field

This application relates generally to data packet switching.

Brief Description of the Related Art

Transport layer switches splice two OSI Layer 4 (L4) connections. Given two connection legs, Layer 4 switches typically terminate the data flow of one leg first, and then forward in-sequence packets to the other leg. Isolating packet loss in one leg from the other is an important factor in improving overall end-to-end delivery performance, because recovery within one leg is usually quicker than recovery over one longer end-to-end connection. A multiplexed connection carries multiple streams within a single connection. Layer 4 switches, without knowing that the streams are multiplexed, perform the same switching functions; they forward only in-sequence packets to the other leg. While it is believed that a single multiplexed connection generally shows improved performance over multiple non-multiplexed connections, one reported drawback of such multiplexed connections is higher sensitivity to packet loss. This is a form of head-of-line (HOL) blocking at the connection level, where the problematic data unit blocks all other data units behind it.

BRIEF SUMMARY

To address this and other problems associated with the prior art, this disclosure provides for a Layer 4 switching approach wherein a set of L4 switches is organized into a cluster so as to act as a single (or “big”) Layer 4 switch. Connections between the L4 switches are carried out, e.g., using Layer 2 switches. To this end, an intra-cluster routing entity of the switch maintains mapping information (e.g., in a database, or set of data structures) about connections that have been established by the individual switches within the cluster. In this approach, each host (itself a switch) preferably acts like a group of ports of the larger (big) switch. This obviates the need for each member host to maintain connections to many possible destinations. Rather, the intra-cluster routing entity maintains the information about which hosts (and their ports) are connected to which destinations, and the connections are re-used as necessary, even if connections on one side of the “big” switch have ceased being used.

The foregoing has outlined some of the more pertinent features of the subject matter. These features should be construed to be merely illustrative. Many other beneficial results can be attained by applying the disclosed subject matter in a different manner or by modifying the subject matter as will be described.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the subject matter and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram of a known transport layer switch;

FIG. 2 depicts the transport layer switch in additional detail;

FIG. 3 illustrates a Layer 4 switch cluster for handling a large number of connections;

FIG. 4 depicts a situation wherein there are a large group of clients, a large group of servers, and a cluster of switches in-between, showing a relationship of the groups in light of persistent connections;

FIG. 5 depicts a model of a switch cluster in a web access environment wherein different groups are defined;

FIG. 6 depicts the technique of this disclosure wherein a collection of switches acts as a large scale Layer 4 switch;

FIG. 7 depicts an endpoint for support of the Layer 4 switching of this disclosure; and

FIG. 8 depicts how L4 switches as implemented herein may be used in an operating environment wherein a cellular wireless link is coupled to an Internet-based distributed delivery network, such as a content delivery network (CDN).

DETAILED DESCRIPTION

FIG. 1 depicts a model of the common implementation of a transport layer switch operating environment. In this drawing, Host C 100 is the switch between Host A 102 and Host B 104. More specifically, a splicing function in Host C is the switch between the two connections, one connection 106 between Host A and C, and another connection 108 between Host C and B. The splicing functionality 105 acts with respect to TCP end points 110 and 112 at Host C to seamlessly connect (splice) TCP(a,c) and TCP(c,b). The splicing functionality 105 transfers packets from one end point to another in the same host. Each packet flow, sometimes referred to as a connection segment, is terminated before being transferred to some other flow (connection). Host C is not necessarily just a single computing machine; indeed, the basic switching notion shown in FIG. 1 may be generalized as a distributed environment wherein an application layer overlay (comprising a large number of separate machines) on top of the Internet handles a massive number of TCP pairs. Thus, and with reference to FIG. 2, Host C in FIG. 1 is depicted as a large-scale transport layer switch 200, effectively as an N×N switch, where N is the number of end points currently served. Here, the task of the switch is to transfer packets from one end point to another in the same single TCP pair as efficiently as possible. For Layer 3 routers and Layer 2 switches, the N×N switch typically is implemented in custom hardware in the form of an ASIC (application-specific integrated circuit). In the L4 transport layer switch, however, the processing unit 202 acts like a virtual switch fabric using an application layer buffer 204 as the switching device(s). In L4 switching, the packet switching is performed only between two end points (PORT in the figure) belonging to the same pair of TCP segments.
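By way of illustration only, the splicing step described above may be sketched in a few lines of Python. This is a minimal sketch and not the disclosed implementation: the function name splice_once is hypothetical, and local socket pairs stand in for the two TCP connections TCP(a,c) and TCP(c,b).

```python
import socket


def splice_once(src: socket.socket, dst: socket.socket, bufsize: int = 4096) -> int:
    """Move one chunk from src to dst through an application-layer buffer."""
    buf = src.recv(bufsize)   # terminate the flow on the inbound end point
    if buf:
        dst.sendall(buf)      # forward the same bytes on the outbound end point
    return len(buf)


# Assumption: local socket pairs stand in for TCP(a,c) and TCP(c,b).
host_a, c_left = socket.socketpair()
c_right, host_b = socket.socketpair()

host_a.sendall(b"GET /")
moved = splice_once(c_left, c_right)   # Host C splices the two legs
received = host_b.recv(4096)

for s in (host_a, c_left, c_right, host_b):
    s.close()
```

A production relay would loop in both directions and handle partial writes and connection teardown; the sketch shows only the single buffer hand-off between the two end points in the same host.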

Regarding the common question on the motivation of having a switch between two connections, research has found that an intermediate node as a relay between the two connection segments, as illustrated as Host C in FIG. 1, actually can help achieve higher end-to-end performance. Although it might appear counter-intuitive with an additional host in between, the results are based on the high recovery cost of packet loss inherent to the reliable data transfer algorithms of TCP. In particular, the recovery cost on either one of segments (Connection 106 and Connection 108) is found lower than that on the end-to-end connection.

Note that the splicing functionality 105 (FIG. 1) transfers packets from one end point to another in the same host. Each packet flow, a connection segment, is terminated before being transferred to the other flow (connection).

By way of additional background, distributed Internet applications can be better optimized by combining persistent connections and Layer 4 switching. For example, in FIG. 1, the communication between Host A and B is optimized first by having the Layer 4 switch in between, and second by having persistent connections, one between Host A and C, and another between Host C and B. Repeated data transfer between Host A and B preferably fully utilizes the persistent connections and Layer 4 switching, thereby avoiding the overhead of destroying and constructing connections and minimizing the overhead of the delivery guarantee.

One good example of the transport layer switch can be found in content distribution networks (CDN) on a global scale. The fundamental approach for building the largest such networks is to use an overlay on top of the Internet. Technically, this means that the transport layer switch (e.g., Host C in FIG. 1) is required to handle a massive number of TCP pairs. This requirement creates an interesting engineering challenge as modeled in FIG. 2. The figure portrays Host C in FIG. 1 as a large scale transport layer switch, effectively an N×N switch, where N is the number of end points currently served. In this case, the task of the switch is to transfer packets from one end point to another in the same single TCP pair as efficiently as possible. For Layer 3 routers and Layer 2 switches, the N×N switch fabric is normally implemented in custom hardware in the form of an ASIC (application specific integrated circuit). In the transport layer switch, however, and as noted above, the CPU acts like a virtual switch fabric using the application layer buffer as the switching devices.

In the case where N is not large enough to accommodate the desired number of connections, and where N is the maximum number of connections a host can support, there are generally two architectural choices to handle the situation. One is to use a larger capacity host, which would quickly become expensive, as typically that approach could not be implemented in a commodity-based manner. The other is to use multiple hosts, which is much more practical. To that end, FIG. 3 illustrates the notion of a Layer 4 switch cluster 300 as a practical approach for handling a massive number of connections. For simplicity, the switching capacity of each host is assumed to be N. As can be seen, K machines of capacity N collectively provide a cluster switching capacity of K×N.

FIG. 4 depicts a switch cluster implementation where there are a large group of clients 402, a large group of servers 404, and a cluster of switches 400 in between. While FIG. 3 shows just the switch cluster, FIG. 4 illustrates the relationship of the groups in light of the connections, which preferably are persistent connections. The model of the switch cluster in FIG. 4 herein is sometimes referred to as Coexisting. This Coexisting model is not necessarily efficient in terms of persistent connection availability. In this model, the connection on the right-hand side (between the switch cluster and server group) is tied to the individual switch where the connection is terminated. This binding does not allow other connections on the left-hand side (between the client group and switch cluster) to utilize the right-hand side connection unless both connections are terminated in the same individual switch. An example follows.

Assume a client C1 connects with a server S1 through a switch A. After a short while, the client C1 goes offline. The client status change destroys the connection between the client and the switch (C1 and A). The connection between the switch and the server (A and S1), however, still remains alive with the expectation that this connection will be used for other data transactions sooner or later. Sometime later, the client C1 comes back online to reach the server S1. This time, however, the connection from the client to the cluster happens to land on another switch B. To make matters worse, assume that, for the moment, the switch B does not have a connection to the server S1. The consequence is that, even though there is an existing connection to the server S1 from the switch A of the same cluster, a new connection has to be established between the switch B and server S1. The problem here is that, in this example, the client C1 gets no benefit from the existing persistent connections. This problem arises because each connection is bound to an individual switch. While the cluster switching capacity grows linearly with more individual switches, the persistent connection availability is still tied to one single individual switch.
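The example above can be sketched as follows. This is an illustrative model only; the class name CoexistingSwitch and its serve method are hypothetical and merely track which servers each individual switch already holds a persistent connection to.

```python
class CoexistingSwitch:
    """A switch that can reuse only its own server-side connections."""

    def __init__(self, name):
        self.name = name
        self.server_conns = set()   # servers this switch holds a persistent connection to

    def serve(self, server):
        """Return True if an existing connection was reused, False if a new one was opened."""
        if server in self.server_conns:
            return True
        self.server_conns.add(server)   # connection establishment overhead is paid here
        return False


switch_a = CoexistingSwitch("A")
switch_b = CoexistingSwitch("B")

first = switch_a.serve("S1")    # C1 arrives at A: a new connection A-S1 is opened
second = switch_b.serve("S1")   # C1 returns via B: A-S1 exists, but B cannot use it
third = switch_a.serve("S1")    # reuse happens only if C1 lands on A again
```

Here the second request pays the full connection establishment cost even though the cluster as a whole already holds a live connection to S1.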

FIG. 5 shows an organic model of a switch cluster in a web access scenario where three different groups are defined: the client group 502 of size X, the server group 504 of size Z, and the switch group 500 of size Y. There are two types of connections: one between a member of X and a member of Y, C(Xi,Yj), and another between a member of Y and a member of Z, C(Yj,Zk), where i, j, and k each identify a unique member of the respective group. The lifetime of C(Xi,Yj) is subject to the user agent's policy at the client side. The switch does not initiate the termination of C(Xi,Yj). The maximum lifetime of C(Xi,Yj) is from the time the client comes online to the time of the client's explicit action for termination, including going offline, turning off the browser, etc. The lifetime of C(Yj,Zk) is subject to the server policy at the server side. The switch keeps C(Yj,Zk) alive as long as the server policy allows. The maximum lifetime of C(Yj,Zk) is the maximum time the server allows. The effective minimum time, Tmin, that the client has to spend to establish a connection to the intended server is therefore the time needed to establish a C(Xi,Yj) when the client first comes online. The goal of the organic model is to guarantee Tmin for all online activities with some server by each client Xi. The maximum number of connections that the client group could create as a whole to the switch cluster is defined as Cmax(X,Y). Likewise, the number of connections that the server group would need to serve the online activities created from Cmax(X,Y) is defined as Cmax(Y,Z). In this organic model, the number of connections the switch cluster maintains from itself to the server group is limited only to:
Cmax(Y,Z)

Note, for comparison, that in the coexisting model shown in FIG. 4, this number is upper bounded by:
Cmax(Y,Z)×Y
to guarantee Tmin for all online activities by each and every client Xi. Most importantly, the organic model allows that the number of persistent connections supporting Tmin is proportional to the size of the switch cluster Y. In contrast, in the coexisting model (FIG. 4), the scope of such persistent connections is bound to one single switch capacity regardless of the cluster size.
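As a worked illustration of the comparison above, the following sketch evaluates both bounds for assumed values. It reads the server-side requirement as Cmax(Y,Z), per the definition given above; the function names and the sample figures (1,000 connections, 8 switches) are hypothetical and are not taken from the disclosure.

```python
def organic_bound(cmax_yz: int) -> int:
    """Server-side connections the whole cluster keeps when any switch may use any of them."""
    return cmax_yz


def coexisting_upper_bound(cmax_yz: int, y: int) -> int:
    """Upper bound when every one of the Y switches must hold its own connections."""
    return cmax_yz * y


cmax_yz = 1000   # assumed value of Cmax(Y,Z)
y = 8            # assumed switch group size Y

organic = organic_bound(cmax_yz)
coexisting = coexisting_upper_bound(cmax_yz, y)
ratio = coexisting // organic   # equals Y: the saving grows with the cluster size
```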
Layer 4 Switch for Persistent Connections

FIG. 6 shows the concept of a big Layer 4 switch (“BigL4”) according to this disclosure. The switch 600 comprises a collection of switches 602a-n that act as one big Layer 4 switch. The operation of the switch 600 is straightforward. Assume a C(Xi,Yj) is created at switch A with some intended server. By consulting an intra-cluster routing entity 604, switch A learns that a C(Yj,Zk) is available at switch C. The routing entity can find the available connection by using information about the server that the client intends to reach. Note that the switch A does not itself need to have a connection to the server intended by the client. The existing connection maintained by switch C is utilized by switch A. After a while, the C(Xi,Yj) is destroyed by the client. However, the connection C(Yj,Zk) still remains alive as long as the server policy allows. Another connection C(Xi,Yj) is now created at switch B. The switch B again repeats the same intra-cluster routing process. Thus, the goal of Tmin is achieved again for the new C(Xi,Yj).
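The lookup flow just described may be sketched as follows. This is a simplified, single-process sketch under stated assumptions: the class IntraClusterRouter and its register, acquire, and release methods are hypothetical names, and a real routing entity would also have to handle concurrency and connection expiry under the server policy.

```python
class IntraClusterRouter:
    """Hypothetical registry of idle server-side connections across the cluster."""

    def __init__(self):
        self._idle = {}   # server -> list of switches holding an idle connection to it

    def register(self, switch, server):
        """Record that a switch holds an idle connection to a server."""
        self._idle.setdefault(server, []).append(switch)

    def acquire(self, server):
        """Return a switch holding an idle connection to the server, or None."""
        holders = self._idle.get(server)
        return holders.pop(0) if holders else None

    def release(self, switch, server):
        """Mark the connection idle again; it stays alive for later reuse."""
        self.register(switch, server)


router = IntraClusterRouter()
router.register("C", "S1")          # switch C holds C(Yj,Zk) to server S1

via_first = router.acquire("S1")    # client arrives at switch A and is routed via C
router.release(via_first, "S1")     # client leaves; C(Yj,Zk) remains alive
via_second = router.acquire("S1")   # client returns at switch B and is again routed via C
miss = router.acquire("S1")         # no idle connection left: a new one would be needed
```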

The intra-cluster routing entity is a new functional entity used in the switch 600. It maintains mapping information between each switch and the available connections from that switch to servers. Other attributes of the connection can include current status, i.e., actively used or idle, values of TCP parameters, current buffer size, etc. The indexing of the information record can be done in many different ways. One straightforward approach is to use the IP address of the server. Another example is to use the URL (uniform resource locator) of the HTTP request/response message for web access applications. The intra-cluster routing entity can be centralized at one physical device, switch or a separate (virtual) appliance, or it may be distributed to the switches in the cluster.
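One possible shape for such a record and its index is sketched below. The field names, the helper build_index, and the sample addresses (taken from the documentation address range 192.0.2.0/24) are assumptions made for illustration, not part of the disclosure.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class ConnectionRecord:
    switch: str             # switch that terminates the server-side connection
    server_ip: str          # destination server
    in_use: bool = False    # current status: actively used or idle
    buffer_bytes: int = 0   # current buffer size


def build_index(records, key):
    """Group records under the value returned by key(record)."""
    index = {}
    for rec in records:
        index.setdefault(key(rec), []).append(rec)
    return index


records = [
    ConnectionRecord("A", "192.0.2.10"),
    ConnectionRecord("C", "192.0.2.10", in_use=True),
    ConnectionRecord("B", "192.0.2.20"),
]

# Indexing by server IP address.
by_ip = build_index(records, key=lambda r: r.server_ip)
idle_to_10 = [r.switch for r in by_ip["192.0.2.10"] if not r.in_use]

# For web access, a key could instead be derived from the request URL.
url_key = urlsplit("http://www.example.com/index.html").hostname
```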

The allocation of connections between the switch and server groups may be implemented in one of several ways. If the system runs on a first-come, first-served (FIFO) basis, it is likely that some switches are highly loaded while others are idle. Allocating the same number of connections to each switch would not solve this load balancing problem because different connections will handle different end user behaviors and web services. One generic solution is to supply the intra-cluster routing entity with traffic load information dynamically so that the routing entity can choose a switch that not only has an idle connection to the server but also has more CPU cycles available to take more traffic.
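A load-aware choice of this kind might look like the following sketch. The function choose_switch and the tuple layout (switch name, whether it has an idle connection, CPU load as a fraction) are hypothetical; the disclosure does not prescribe a particular selection rule.

```python
def choose_switch(candidates):
    """Pick the least-loaded switch among those holding an idle connection.

    candidates: iterable of (switch_name, has_idle_connection, cpu_load),
    where cpu_load is a fraction between 0.0 and 1.0.
    Returns the chosen switch name, or None if no switch has an idle connection.
    """
    eligible = [(load, name) for name, has_idle, load in candidates if has_idle]
    if not eligible:
        return None
    return min(eligible)[1]   # lowest load wins; ties fall back to name order


cluster = [("A", True, 0.9), ("B", False, 0.1), ("C", True, 0.3)]
chosen = choose_switch(cluster)                     # B is idle but has no connection
none_available = choose_switch([("B", False, 0.1)])
```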

The intra-cluster switching from one Layer 4 switch to another, say, from Switch A to Switch C in the switch 600 in FIG. 6, for example, may be implemented by a commercial Layer 2 switch. If one Layer 2 switch's capacity is not big enough for the cluster size, multiple Layer 2 switches can be used to build the big Layer 4 switch of this disclosure. In the rare case of an enormous number of persistent connections, where a large number of Layer 2 switches are required to support a very large Layer 4 switch, advanced Layer 2 networking technologies can be introduced.

The approach is advantageous as the capacity grows linearly even for a massive number of persistent connections. The capacity typically is proportional to the number of individual Layer 4 switches in the cluster. The switch maintains a minimum number of persistent connections to a destination, which then maximizes the utilization of existing persistent connections. The performance gain is obtained by leveraging the notion that, with respect to a given destination, the switch preferably uses an existing persistent connection from some switch in the cluster to that destination. Practically, however, having only one connection to a destination may risk creating the head-of-line (HOL) blocking problem described above, a classical issue in networking. This problem can happen in a general situation where multiple traffic flows are heading for one single destination. The net effect is that only one flow or packet at a time can reach the destination while the rest wait for a turn. In the switch of this disclosure, the problem arises because any connection from the left side of FIG. 6 can try to use any connection from the right side. To mitigate the HOL problem, preferably multiple connections to a given destination should exist in the cluster.
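One way to keep several connections per destination is sketched below. The constant MIN_CONNS_PER_DEST, the class DestinationPool, and the string connection identifiers are hypothetical; the appropriate number of connections per destination is a tuning decision that the disclosure leaves open.

```python
import itertools

MIN_CONNS_PER_DEST = 2   # assumed tuning value, not specified by the disclosure


class DestinationPool:
    """Holds several persistent connections to one destination."""

    def __init__(self, dest, opener, min_conns=MIN_CONNS_PER_DEST):
        self.dest = dest
        self.idle = [opener(dest) for _ in range(min_conns)]
        self.busy = []

    def acquire(self):
        """Return an idle connection, or None if every connection is in use."""
        if not self.idle:
            return None
        conn = self.idle.pop()
        self.busy.append(conn)
        return conn

    def release(self, conn):
        """Return a connection to the idle set so another flow can use it."""
        self.busy.remove(conn)
        self.idle.append(conn)


counter = itertools.count()
pool = DestinationPool("S1", opener=lambda dest: f"{dest}#{next(counter)}")

first = pool.acquire()    # first flow proceeds
second = pool.acquire()   # second flow proceeds in parallel rather than queueing
third = pool.acquire()    # pool exhausted: this flow would have to wait
pool.release(first)
fourth = pool.acquire()   # the released connection is reused
```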

There can be various ways to implement the endpoint in support of the switch. FIG. 7 shows one approach that requires no modification to the currently popular scheme of endpoint management. In this approach, the intra-cluster routing entity 704 is the only addition to the Coexisting model. The overhead of creation and destruction of exclusively reserved endpoints in the straightforward approach can be mitigated by using an endpoint pooling scheme. An advanced method of endpoint management with statistical multiplexing can also support the switch more efficiently in terms of space utilization.
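A minimal sketch of an endpoint pooling scheme follows. The classes Endpoint and EndpointPool are hypothetical; the sketch shows only that a returned endpoint object is handed out again instead of being destroyed and re-created.

```python
class Endpoint:
    """Stand-in for an endpoint whose creation is assumed to be costly."""

    created = 0   # counts how many endpoints were actually constructed

    def __init__(self):
        Endpoint.created += 1
        self.peer = None


class EndpointPool:
    """Reuses released endpoints instead of destroying them."""

    def __init__(self):
        self._free = []

    def get(self, peer):
        endpoint = self._free.pop() if self._free else Endpoint()
        endpoint.peer = peer
        return endpoint

    def put(self, endpoint):
        endpoint.peer = None          # clear per-connection state before reuse
        self._free.append(endpoint)


pool = EndpointPool()
e1 = pool.get("C1")
pool.put(e1)            # C1 disconnects; the endpoint returns to the pool
e2 = pool.get("C2")     # a new client is served by the same endpoint object
```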

The intra-cluster routing entity can have or access more information on the load conditions of the individual switch including, for example, current CPU load, current memory load, current disk load, etc., in addition to the set of information about connections. Then the final decision on connection allocation will be based on not only the availability of the connection but also the general computation load of the individual switch.

Content delivery networks (CDNs) typically have a large number of overlay nodes. Many nodes act like a Layer 4 switch in that they are neither an originator nor a terminator of Layer 4 connections. On this platform, a large number of persistent connections may be carefully maintained in an effort to avoid the time overhead required to establish a new connection for each new web access.

FIG. 8 shows networks with end user clients 800 and web servers 802, where both cellular wireless-based and LAN-based clients access the web servers via the CDN infrastructure, which includes edge (child) nodes 804, and parent nodes 806. The CDN may also include an “extender” cluster of machines or processes 808 that are positioned within a provider's packet core network. The extender is an IP-addressable compute cluster inside provider access networks, where it mostly acts like a transparent proxy. As seen in the drawing, each network interface is identified as a location where the technique of this disclosure may be used to improve persistent connection availability and utilization. First, the interface 810 between the access and CDN child edge networks can be designed with the switch herein between the extender cluster and the CDN child edge clusters. Second, the interface 812 between CDN edge and parent networks, although it is logical, can also adopt the switching approach herein as a new way for persistent connection management. Finally, the interface 814 between the CDN parent network and web server (or, more generally, an origin server tier) can be extended with the switching approach herein so that any parent cluster can maintain connections to all web servers. With the intra-cluster routing built into one or more of these interfaces, the unavailability of persistent connections between the extender and origin server will be close to non-existent. In the case of cellular wireless clients, this means that the connection establishment overhead will only be required between the end user device and extender. The rest will effectively be always on. The approach is straightforward to implement in CDNs that use Layer 2 commercial switches (e.g., for internal back-end networks or otherwise to support machine-to-machine communications).

More generally, the techniques described herein are provided using a set of one or more computing-related entities (systems, machines, processes, programs, libraries, functions, or the like) that together facilitate or provide the functionality described above. In a typical implementation, a representative machine on which the software executes comprises commodity hardware, an operating system, an application runtime environment, and a set of applications or processes and associated data, that provide the functionality of a given system or subsystem. As described, the functionality may be implemented in a standalone machine, or across a distributed set of machines.

While the above describes a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary, as alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, or the like. References in the specification to a given embodiment indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic.

While the disclosed subject matter has been described in the context of a method or process, the subject disclosure also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including an optical disk, a CD-ROM, and a magnetic-optical disk, a read-only memory (ROM), a random access memory (RAM), a magnetic or optical card, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.

While given components of the system have been described separately, one of ordinary skill will appreciate that some of the functions may be combined or shared in given instructions, program sequences, code portions, and the like.

Preferably, the functionality is implemented in an operating system and/or application layer solution, although this is not a limitation.

There is no limitation on the type of computing entity that may implement the connection control functionality of this disclosure. Any computing entity (system, machine, device, program, process, utility, or the like) may provide this operation.

As noted above, the techniques herein may be implemented to facilitate content delivery over a mobile network.

Claims

1. Apparatus associated with an overlay network, the overlay network comprising a plurality of overlay nodes organized as edge nodes, parent nodes and other Internet Protocol (IP)-addressable nodes, the overlay network nodes being positioned between requesting client devices and content provider origin servers that utilize the overlay network nodes to thereby provide content and application delivery to the requesting client devices, comprising:

a set of switches organized into an interface, wherein each switch in the set of switches provides a group of ports that are dedicated to providing out-bound connections to given destinations, wherein the interface is positioned between one of: (IP)-addressable nodes and edge nodes, the edge nodes and the parent nodes, and the parent nodes and the content provider origin servers; and
a controller to control routing across the interface such that, as requesting client devices interact with content provider origin servers, a given connection to a destination in a particular switch is used by first and second in-bound connections;
wherein providing the interface with out-bound connections improves overlay network performance by reducing connection establishment overhead with respect to communications between the requesting client devices and content provider origin servers that traverse the overlay network.

2. The apparatus as described in claim 1 wherein first and second of the set of switches in the interface are interconnected with a Layer 2 switch.

3. The apparatus as described in claim 1 wherein IP-addressable nodes comprise a node cluster positioned within a wired core network.

4. The apparatus as described in claim 1 wherein the interface has a capacity that is proportional to a number of individual switches in the interface.

5. Apparatus associated with an overlay network using transport layer (Layer 4) switching, the overlay network comprising a plurality of overlay nodes organized as edge nodes and parent nodes, the overlay network nodes being positioned between requesting client devices and content provider origin servers that utilize the overlay network nodes to thereby provide content and application delivery to the requesting client devices, comprising:

a set of switches organized into a first interface and a second interface, wherein each switch in the set of switches provides a group of ports that are dedicated to providing out-bound connections to given destinations, wherein (a) the first interface is positioned between the IP-addressable nodes and the edge nodes, and (b) the second interface is positioned between the edge nodes and the parent nodes; and
a controller to control routing across the first and second interfaces such that, as requesting client devices interact with content provider origin servers, a given connection to a destination in a particular switch is used by first and second in-bound connections;
wherein providing the first and second interfaces each of which having out-bound connections improves overlay network performance by reducing connection establishment overhead with respect to communications between the requesting client devices and content provider origin servers that traverse the overlay network.

6. The apparatus as described in claim 5 further including a Layer 2 switch interconnecting first and second switches in at least one of the first and second interfaces.

7. The apparatus as described in claim 5 wherein at least one of the first and second interfaces has a capacity that is proportional to a number of individual switches therein.

8. The apparatus as described in claim 5 further including a third interface between requesting client devices and the edge nodes of the overlay network.

9. The apparatus as described in claim 8 wherein the third interface is positioned within a wired core network.

10. The apparatus as described in claim 5 wherein the client device is a mobile device.

11. The apparatus as described in claim 5 wherein the out-bound connections are persistent.

12. The apparatus as described in claim 1 wherein the out-bound connections are persistent.

Patent History
Patent number: 10476771
Type: Grant
Filed: Jan 28, 2019
Date of Patent: Nov 12, 2019
Patent Publication Number: 20190158375
Assignee: Akamai Technologies, Inc. (Cambridge, MA)
Inventor: Byung K. Choi (Boston, MA)
Primary Examiner: Frantz B Jean
Application Number: 16/259,476
Classifications
Current U.S. Class: Bridge Or Gateway Between Networks (370/401)
International Classification: G06F 15/173 (20060101); H04L 12/26 (20060101); H04L 29/08 (20060101); H04L 29/12 (20060101);