RECONFIGURATION OF NODE OF FAT TREE NETWORK FOR DIFFERENTIATED SERVICES
A method is provided that is performed by a first node of a fat tree network for management of a plurality of traffic flows having differentiated service requirements. The method includes receiving, from a simulation model or a testbed environment, joint observations for the traffic flows. The traffic flows correspond to routing paths comprising different combinations of at least one leaf node, at least one spine node, and at least one super spine node of the fat tree network per routing path. The method further includes identifying, with a first reinforcement learning model, a first action to take to reduce or prevent congestion at the first node. The first action includes a reconfiguration of the first node for an identified routing path. The method further includes outputting, to a controller node, the reconfiguration of the first node.
The present disclosure relates generally to methods for management of a plurality of traffic flows in a fat tree network having differentiated service requirements, and related methods and apparatuses.
BACKGROUND
Differentiated service requirements envisioned in fifth generation (5G) communications, e.g., enhanced Mobile Broadband (eMBB), Ultra Reliable Low Latency Communications (URLLC), and massive Machine Type Communications (mMTC), highlight a need for efficient traffic management. For example, to provide robust end-to-end guarantees and coexistence on shared network resources, network slicing has been proposed. For network slicing to perform effectively, however, packet processing and traffic shaping at a datacenter network core may need to be optimally configured.
In datacenter networks, three-tier architectures (e.g., edge, aggregation, and core tiers) have been shown to be sub-optimal for large scale datacenter routing, with inefficient endpoint connections, low resilience to failure, and multiple congestion possibilities. To relieve these effects, fat tree topologies have been proposed that interconnect various nodes (e.g., routers, switches, and end-point servers) within datacenters. In some approaches, in order to provide load balancing and traffic shaping, an Equal Cost Multi-path Routing (ECMP) protocol alone is used, where next-hop packet forwarding can occur over multiple "best paths" that tie in routing metric calculations.
A virtualized datacenter is a highly multifarious environment, shared among many co-located tenants (e.g., hundreds) hosting heterogeneous applications. A high degree of virtual machine consolidation may lead to diverse traffic dynamics with uneven traffic demands. For example, tenants' virtual machines can generate a mix of elephant and mouse flows that traverse the underlay fabric in aggregate, e.g., encapsulated in tunnelling protocols such as the Virtual Extensible LAN (VXLAN) protocol, the network virtualization using generic routing encapsulation (NVGRE) protocol, and the stateless transport tunnelling (STT) protocol.
SUMMARY
Certain aspects of the disclosure and their embodiments may provide solutions to these or other challenges.
Various embodiments of the present disclosure provide a method performed by a first node of a fat tree network for management of a plurality of traffic flows in the fat tree network having differentiated service requirements. The method includes receiving, from a simulation model or a testbed environment, a plurality of joint observations for the plurality of traffic flows. The plurality of traffic flows correspond to a plurality of routing paths comprising different combinations of at least one leaf node, at least one spine node, and at least one super spine node of the fat tree network per routing path. The method further includes identifying, with a first reinforcement learning model for the first node, a first action to take to reduce or prevent congestion at the first node of a traffic flow based on a policy generated from at least the plurality of joint observations. The first action comprises a reconfiguration of the first node for an identified routing path. The method further includes outputting, to a controller node, the reconfiguration of the first node.
In some embodiments, the method further includes receiving, from the simulation model or the testbed environment, a plurality of global reward values. A global reward value indicates a measure of a joint state of the nodes in the fat tree network in a routing path comprising a combination of the at least one leaf node, the at least one spine node, and the at least one super spine node. The joint state results from an action of at least one reinforcement learning agent in the fat tree network for the routing path.
In other embodiments, a first node of a fat tree network is provided for management of a plurality of traffic flows in the fat tree network having differentiated service requirements. The first node includes at least one processor; and at least one memory connected to the at least one processor and storing program code that is executed by the at least one processor to perform operations. The operations include receiving, from a simulation model or a testbed environment, a plurality of joint observations for the plurality of traffic flows. The plurality of traffic flows correspond to a plurality of routing paths comprising different combinations of at least one leaf node, at least one spine node, and at least one super spine node of the fat tree network per routing path. The operations further include identifying, with a first reinforcement learning model for the first node, a first action to take to reduce or prevent congestion at the first node of a traffic flow based on a policy generated from at least the plurality of joint observations. The first action comprises a reconfiguration of the first node for an identified routing path. The operations further include outputting, to a controller node, the reconfiguration of the first node.
In other embodiments, a first node of a fat tree network is provided for management of a plurality of traffic flows in the fat tree network having differentiated service requirements, the first node adapted to perform operations. The operations include receiving, from a simulation model or a testbed environment, a plurality of joint observations for the plurality of traffic flows. The plurality of traffic flows correspond to a plurality of routing paths comprising different combinations of at least one leaf node, at least one spine node, and at least one super spine node of the fat tree network per routing path. The operations further include identifying, with a first reinforcement learning model for the first node, a first action to take to reduce or prevent congestion at the first node of a traffic flow based on a policy generated from at least the plurality of joint observations. The first action comprises a reconfiguration of the first node for an identified routing path. The operations further include outputting, to a controller node, the reconfiguration of the first node.
In other embodiments, a computer program comprising program code to be executed by processing circuitry of a first node of a fat tree network is provided for management of a plurality of traffic flows in the fat tree network having differentiated service requirements. Execution of the program code causes the first node to perform operations. The operations include receiving, from a simulation model or a testbed environment, a plurality of joint observations for the plurality of traffic flows. The plurality of traffic flows correspond to a plurality of routing paths comprising different combinations of at least one leaf node, at least one spine node, and at least one super spine node of the fat tree network per routing path. The operations further include identifying, with a first reinforcement learning model for the first node, a first action to take to reduce or prevent congestion at the first node of a traffic flow based on a policy generated from at least the plurality of joint observations. The first action comprises a reconfiguration of the first node for an identified routing path. The operations further include outputting, to a controller node, the reconfiguration of the first node.
In other embodiments, a computer program product comprising a non-transitory storage medium including program code to be executed by processing circuitry of a first node of a fat tree network is provided for management of a plurality of traffic flows in the fat tree network having differentiated service requirements. Execution of the program code causes the first node to perform operations. The operations include receiving, from a simulation model or a testbed environment, a plurality of joint observations for the plurality of traffic flows. The plurality of traffic flows correspond to a plurality of routing paths comprising different combinations of at least one leaf node, at least one spine node, and at least one super spine node of the fat tree network per routing path. The operations further include identifying, with a first reinforcement learning model for the first node, a first action to take to reduce or prevent congestion at the first node of a traffic flow based on a policy generated from at least the plurality of joint observations. The first action comprises a reconfiguration of the first node for an identified routing path. The operations further include outputting, to a controller node, the reconfiguration of the first node.
Certain embodiments may provide one or more of the following technical advantages. By providing joint observations from a simulation model or testbed environment for the traffic flows corresponding to a plurality of routing paths that include different combinations of leaf, spine, and/or super spine nodes, reconfiguration of one or more nodes of fat tree networks may be provided to reduce or prevent congestion and provide differentiated services. In some embodiments, the joint observations are based on an action(s) of at least one additional reinforcement learning (RL) agent in the fat tree network. As a consequence of having multiple ("multi") RL agents, the method of the present disclosure may provide a scalable multi-agent RL formulation that can resolve congestion and bottlenecks in the fat tree network caused by interaction between traffic flows having differentiated service requirements (e.g., elephant and mouse flows), which existing approaches (e.g., ECMP alone) do not resolve. For example, in a case study of the method of the present disclosure discussed further herein, a >30% improvement in throughput and latency over ECMP alone was demonstrated.
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate certain non-limiting embodiments of inventive concepts. In the drawings:
Inventive concepts will now be described more fully hereinafter with reference to the accompanying drawings, in which examples of embodiments of inventive concepts are shown. Inventive concepts may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of present inventive concepts to those skilled in the art. It should also be noted that these embodiments are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present/used in another embodiment.
The following description presents various embodiments of the disclosed subject matter. These embodiments are presented as teaching examples and are not to be construed as limiting the scope of the disclosed subject matter. For example, certain details of the described embodiments may be modified, omitted, or expanded upon without departing from the scope of the described subject matter.
Potential problems exist with managing traffic flow in a fat tree network when the traffic flow includes traffic having differentiated service requirements. For example, an "elephant flow" refers to a long-lived and bandwidth-intensive traffic flow in comparison with a "mouse flow", which refers to a shorter-lived, less bandwidth-intensive traffic flow. A mouse flow may also be latency-sensitive and highly bursty in nature. Both types of flows require different treatment from the underlay fabric, but encapsulation can obfuscate the overlay traffic characteristics and demands.
Existing approaches such as ECMP alone have been employed in, e.g., data centers. Such approaches, however, may be agnostic to elephant and mouse flows, or may lack the visibility into virtual traffic needed to precisely detect, isolate, and treat elephant flows differently than mouse flows. If elephant flows are not identified and addressed in aggregated virtual traffic, they may affect mouse flows generated from co-located applications and, thus, degrade application performance of co-located tenants.
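As an illustration of the detection step, the following is a minimal sketch of threshold-based elephant/mouse classification over per-flow statistics; the thresholds, names, and structure are assumptions for illustration, not values from the present disclosure:

```python
from dataclasses import dataclass

# Illustrative thresholds (assumptions): flows above ~10 MB and ~10 s
# are treated as long-lived, bandwidth-intensive elephants.
ELEPHANT_BYTES = 10 * 1024 * 1024
ELEPHANT_SECONDS = 10.0

@dataclass
class FlowStats:
    flow_id: str
    bytes_seen: int
    duration_s: float

def classify(flow: FlowStats) -> str:
    """Label a flow as a long-lived, bandwidth-intensive elephant
    or a short-lived, latency-sensitive mouse."""
    if flow.bytes_seen >= ELEPHANT_BYTES and flow.duration_s >= ELEPHANT_SECONDS:
        return "elephant"
    return "mouse"

print(classify(FlowStats("vm1->vm7", 512 * 1024 * 1024, 120.0)))  # elephant
print(classify(FlowStats("vm2->vm3", 40 * 1024, 0.2)))            # mouse
```

In an underlay fabric, such statistics would have to be gathered per encapsulated flow (e.g., per VXLAN tunnel endpoint pair), which is precisely the visibility that encapsulation can obscure.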
Such potential problems may be further affected by a dynamic mix of traffic flow among virtual machines within a datacenter (referred to herein as "East-West" traffic) and traffic flow that either enters/leaves a datacenter (referred to herein as "North-South" traffic). Traffic can skew towards a particular direction and, thus, monitoring and optimal configuration may be needed. In addition, due to varying link capacities and problems of shallow buffers versus deep buffers, analysis of trade-offs between buffer size, latency, and packet drop rates may be needed.
Various embodiments of the present disclosure may provide potential technical advantages over such approaches by including multi-agent reinforcement learning (RL) processes (referred to herein as "MALTA") to configure a fat tree network(s). The RL agents can be intelligent and, in some embodiments, can be specifically developed over a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) paradigm. Such Dec-POMDP agents can be used to dynamically reconfigure nodes (e.g., switches/routers) at the leaf, spine, and/or super-spine levels of the fat tree network, which may help ensure optimal network utilization. Side-effects of changing parameters at one level can be analyzed, which can result in coordinated behavior between agents. The method of various embodiments has been contrasted against, e.g., ECMP alone for a case study involving virtual network functions with a combination of elephant and mouse flows. As discussed further herein, in the case study, MALTA provided 46% latency improvement and 34% throughput improvement over ECMP.
Another potential advantage of various embodiments of the present disclosure may be that, by providing joint observations from a simulation model or a testbed environment that considers analysis of multi-agent RL processes (e.g., Dec-POMDP agents) for queue prioritization and traffic shaping of the traffic flows at the leaf, spine, and/or super spine node levels, superior differentiated services may result. For example, based on reconfiguration of a node(s) in a fat tree network for a datacenter, superior differentiated service performance may result.
Simulation model 113 outputs differentiated flow performance 115. For example, a combination of elephant and mouse flows in traffic patterns 111 results in differentiated service performance for each flow (e.g., for inter-datacenter flow versus intra-datacenter flow). A bottleneck at a particular node in the fat tree network can potentially affect the throughput, latency, and packet drop of each service. While techniques such as ECMP alone cannot handle differentiated service performance, as a consequence of multi-agent RL 101, the method of the present disclosure can handle differentiated service performance.
Individual levels of agents at the leaf, spine, and super spine levels permit granular, level-specific changes while aiding scalable deployments. In some embodiments, output configurations 117 from the leaf, spine, and/or super spine agent nodes 103, 105, 107 are used as input to simulation model 113 to reconfigure the fat tree network and derive positive joint rewards 121 towards improvement of the system.
In some embodiments, the leaf, spine, and super spine agent nodes 103, 105, and/or 107 perform at least the following: (i) Identify variations in traffic patterns and take appropriate actions to prevent or reduce congestions as a result of interaction between different traffic flows, e.g., elephant and mouse flows; and (ii) Provide improved performance for differentiated services due to efficient setting of configurations at the leaf, spine, and super-spine levels.
Based on inclusion of reconfiguration of at least one leaf node 103, spine node 105, and/or super spine node 107, technical advantages provided by embodiments of the present disclosure may further include scalability to provide differentiated service to each flow, which is needed, for example, in 5G slicing. The combination of elephant and mouse flows can degrade performance; in contrast with the method of the present disclosure, which can handle differentiated flow performance, conventional techniques (e.g., ECMP alone) cannot.
Fat tree topology 109 will now be discussed further.
A simulation network (e.g., a queueing network model) of a fat tree network will now be discussed further.
Multi-agent reinforcement learning will now be discussed further. While single-agent reinforcement learning solutions may be used in some scenarios, for larger scale applications with heterogeneous action spaces and local observations, multi-agent deployments are useful. Advantages of multi-agent reinforcement learning include, without limitation: RL agents need only search through limited action spaces and can benefit from shared experiences and coordination; faster processing may be possible due to parallel computation; multi-agent RL may allow easy insertion of new RL agents into the system, leading to a high degree of scalability; and, when one or more RL agents fail in a multi-agent RL system, the remaining RL agents can take over some of their tasks. Such scalable deployments of multiple RL agents may be particularly beneficial in larger datacenters (e.g., with hundreds of servers, tens of super-spine nodes, and hundreds of leaf and spine nodes). It is noted that the RL agents do not have centralized control but, rather, coordinate configurations to achieve improved or optimal performance.
In some embodiments, the RL agents use a decentralized partially observable Markov Decision Process (Dec-POMDP). The Dec-POMDP includes a team of RL agents that collaborate to maximize a global reward based on local information.
In some embodiments, a decentralized partially observable MDP is a tuple ⟨I, S, {Aᵢ}, P, {Ωᵢ}, O, R, h⟩, where:
- I is a finite set of agents indexed 1, …, n.
- S is a finite set of states, with distinguished initial state s₀.
- Aᵢ is a finite set of actions available to agent i, and Â = ×ᵢ∈I Aᵢ is the set of joint actions.
- P: S × Â → ΔS is a Markovian transition function. P(s′ | s, â) denotes the probability that after taking joint action â in state s a transition to state s′ occurs.
- Ωᵢ is a finite set of observations available to agent i, and Ω̂ = ×ᵢ∈I Ωᵢ is the set of joint observations.
- O: Â × S → ΔΩ̂ is an observation function. O(ô | â, s′) denotes the probability of observing joint observation ô given that joint action â led to state s′.
- R: Â × S → ℝ is a reward function. R(â, s′) denotes the reward obtained after joint action â was taken and a state transition to s′ occurred.
- If the Dec-POMDP has a finite horizon, that horizon is represented by a positive integer h.
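For concreteness, the tuple can be sketched as a plain data structure. The following Python representation is illustrative only; the names and dictionary-based encoding are assumptions of this sketch, not the types of any particular toolbox:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

JointAction = Tuple[str, ...]       # one action per agent, indexed 1..n
JointObservation = Tuple[str, ...]  # one observation per agent

@dataclass
class DecPOMDP:
    agents: List[str]                    # I
    states: List[str]                    # S, with states[0] taken as s0
    actions: Dict[str, List[str]]        # A_i per agent
    observations: Dict[str, List[str]]   # Omega_i per agent
    # P(s' | s, a^): next-state distribution per (state, joint action)
    transition: Dict[Tuple[str, JointAction], Dict[str, float]]
    # O(o^ | a^, s'): joint-observation distribution per (joint action, next state)
    observe: Dict[Tuple[JointAction, str], Dict[JointObservation, float]]
    # R(a^, s'): reward after joint action a^ leads to state s'
    reward: Dict[Tuple[JointAction, str], float]
    horizon: int                         # h
```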
In order to develop RL agents to configure and provide dedicated service within a fat tree network, in some embodiments, a Multi-Agent Decision Process (MADP) toolbox is used. See, e.g., Frans A. Oliehoek, Matthijs T. J. Spaan, Bas Terwijn, Philipp Robbel, João V. Messias, "The MADP Toolbox: An Open Source Library for Planning and Learning in (Multi-)Agent Systems", Journal of Machine Learning Research 18 (2017) 1-5, https://www.jmlr.org/papers/volume18/17-156/17-156.pdf (accessed on 15 Nov. 2021), which is hereby incorporated by reference in full. The toolbox can provide a specified format to solve Dec-POMDP problems with inbuilt solvers such as Generalized Multiagent A* (GMAA) and Joint Equilibrium-based Search for Policies (JESP). GMAA can make use of variants of heuristic search that use collaborative Bayesian games to represent one-stage node expansion problems. JESP can perform alternating maximization in the space of entire policies. JESP can fix a set of policies and can optimize the policy of each RL agent through dynamic programming.
An example embodiment of joint states, actions, and observations for the fat tree network is as follows:
- Agents: leaf spine superspine (ss or sspine)
- Joint states:
- leaf_green_spine1_green_spine4_green_ss_green
- leaf_green_spine1_red_spine4_green_ss_green
- leaf_red_spine1_green_spine4_green_ss_green
- Actions:
- Agent 1: ECMP decrease_flow set_priority0 RED_drop_set
- Agent 2: ECMP load_balance
- Agent 3: ECMP load_balance
- Observations:
- Agent 1: throughput_change_latency_change_leaf
- Agent 2: throughput_change_latency_change_spine
- Agent 3: throughput_change_latency_change_sspine
In the above example embodiment of joint states, actions, and observations, three RL agents are used at the leaf, spine, and super spine levels. The joint states are specified based on utilization of the queues, where a node is labeled green when its queue utilization is low and red when its queue utilization indicates congestion.
In some embodiments, each of the RL agents also has a specific action space. While the RL agents at the spine and super-spine levels may make use of ECMP or intelligent load balancing, the leaf agents can have other configurations, such as decreasing flow rate, changing priority of flows, and increasing packet drop rates. The observation spaces for these RL agents include, without limitation, the throughput and latencies of the links connected within layers.
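Transcribed into plain Python for illustration (this is not the MADP input format; the dictionary layout is an assumption of this sketch), the example spaces read:

```python
# Action and observation spaces per agent, transcribed from the example above.
actions = {
    "leaf":   ["ECMP", "decrease_flow", "set_priority0", "RED_drop_set"],
    "spine":  ["ECMP", "load_balance"],
    "sspine": ["ECMP", "load_balance"],
}
observations = {
    # Each agent locally observes throughput/latency changes on its own links.
    "leaf":   ["throughput_change_latency_change_leaf"],
    "spine":  ["throughput_change_latency_change_spine"],
    "sspine": ["throughput_change_latency_change_sspine"],
}
joint_states = [
    "leaf_green_spine1_green_spine4_green_ss_green",
    "leaf_green_spine1_red_spine4_green_ss_green",
    "leaf_red_spine1_green_spine4_green_ss_green",
]
```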
Continuing with the above example embodiment, in order to derive the transition and observation probabilities needed within the MADP model, multiple configurations are simulated in the queuing network model.
It is noted that ECMP is integrated as an option to compare with other configuration changes such as load balancing, RED drop, and flow priority change. In the example embodiment, the configurations are input along with multiple elephant and mouse flows (e.g., with varying arrival rates, priorities, and traffic types), and a corresponding differentiated flow performance output is produced.
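Under the assumption that the simulator exposes a single-step function returning the next joint state (the `simulate_step` callable below is hypothetical), the probabilities can be estimated by simple frequency counts over repeated runs:

```python
from collections import Counter, defaultdict
from itertools import product

def estimate_transitions(simulate_step, states, joint_actions, runs=100):
    """Estimate P(s' | s, a^) by repeatedly simulating each
    (state, joint action) pair in the queueing network model."""
    counts = defaultdict(Counter)
    for s, a in product(states, joint_actions):
        for _ in range(runs):
            counts[(s, a)][simulate_step(s, a)] += 1  # hypothetical simulator call
    probs = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        probs[key] = {s_next: n / total for s_next, n in counter.items()}
    return probs
```

Observation probabilities can be estimated the same way by counting joint observations per (joint action, next state) pair.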
Continuing with the example embodiment, values derived from the simulation outputs are used to specify the transition and observation probabilities as follows:
- Transitions: action per agent: start state: end state: probability
- T: decrease_flow ECMP ECMP: *: leaf_green_spine1_red_spine4_green_ss_green: 1.0
- T: decrease_flow ECMP load_balance: *: leaf_green_spine1_red_spine4_green_ss_green: 1.0
- T: decrease_flow load_balance ECMP: *: leaf_green_spine1_green_spine4_green_ss_green: 1.0
- T: decrease_flow load_balance load_balance: *: leaf_green_spine1_green_spine4_green_ss_green: 1.0
- T: set_priority0 ECMP ECMP: *: leaf_green_spine1_red_spine4_green_ss_green: 1.0
- T: set_priority0 ECMP load_balance: *: leaf_green_spine1_red_spine4_green_ss_green: 1.0
- T: set_priority0 load_balance ECMP: *: leaf_green_spine1_green_spine4_green_ss_green: 1.0
- Observations: action per agent: start state: joint observation: probability
- O: decrease_flow ECMP load_balance: *: throughput_down_leaf_latency_up throughput_down_spine_latency_down throughput_down_ss_latency_down: 1.0
- O: set_priority0 ECMP load_balance: *: throughput_down_leaf_latency_drop throughput_down_spine_latency_down throughput_down_ss_latency_down: 1.0
- O: RED_drop_set ECMP load_balance: *: throughput_down_leaf_latency_drop throughput_down_spine_latency_down throughput_down_ss_latency_down: 1.0
- O: decrease_flow load_balance load_balance: *: throughput_up_leaf_latency_drop throughput_up_spine_latency_down throughput_up_ss_latency_down: 1.0
- O: set_priority0 load_balance load_balance: *: throughput_up_leaf_latency_drop throughput_up_spine_latency_down throughput_up_ss_latency_down: 1.0
- O: RED_drop_set load_balance load_balance: *: throughput_up_leaf_latency_drop throughput_up_spine_latency_down throughput_up_ss_latency_down: 1.0
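The flattened "T:" and "O:" entries above follow a colon-separated pattern (tag: joint action: start state: end state or joint observation: probability). A small parser sketch, assuming that layout holds throughout:

```python
def parse_transition(line: str):
    """Parse a line like
    'T: decrease_flow ECMP ECMP: *: leaf_green_spine1_red_spine4_green_ss_green: 1.0'
    into (joint_action, start_state, end_state, probability)."""
    tag, action_str, start, end, prob = (part.strip() for part in line.split(":"))
    assert tag == "T"
    return tuple(action_str.split()), start, end, float(prob)

joint_action, start, end, p = parse_transition(
    "T: decrease_flow ECMP ECMP: *: leaf_green_spine1_red_spine4_green_ss_green: 1.0"
)
print(joint_action, start, end, p)
# ('decrease_flow', 'ECMP', 'ECMP') * leaf_green_spine1_red_spine4_green_ss_green 1.0
```

Here "*" is a wildcard start state, i.e., the transition applies from any joint state.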
Additionally, continuing with the example embodiment, a combination of joint actions that leads to a state or observation change is assigned a reward value.
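One formulation consistent with the description that follows is a weighted difference (here w₁, w₂, throughput_flow, and N_red are illustrative symbols of this sketch, not taken from the original):

R(â, s′) = w₁ · throughput_flow(s′) − w₂ · N_red(s′),

where throughput_flow(s′) is the throughput of the traffic flow in the resulting state and N_red(s′) is the number of high-utilization (red) nodes in that state.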
This formulation rewards higher throughput performance of a traffic flow while minimizing the number of high-utilization nodes. As a consequence, not only is load balancing optimized, but differentiated services are also provided.
- Example reward values are as follows:
- Rewards (R): Action per agent: start state: end state: observations: reward value
- R: decrease_flow * *: *: *: *: 218
- R: set_priority0 * *: *: *: *: 220
- R: RED_drop_set * *: *: *: *: 199
- R: * load_balance *: *: *: *: 408
- R: * * load_balance: *: *: *: 207
The above reward structure is an example, and embodiments of the present disclosure are not limited to this structure. Rather, an advantage of RL is the ability to change the reward structure depending on the intent. Thus, the above reward structure can be modified to generate a variety of alternate policies to be deployed on fat tree networks.
Controller integration will now be discussed further. The RL agents can be integrated within software-defined networking (SDN), collaborative computing framework (CCF), or application centric infrastructure (ACI) architectures.
RL agent state and action space will now be discussed further. An example embodiment of global states is provided.
While example embodiments are explained herein with one RL agent each for the leaf, spine, and super-spine levels, the present disclosure is not so limited. Rather, the RL agents may be expanded to configure only a subset of nodes at individual hierarchies (e.g., to have multiple peer leaf, spine, and super-spine agents).
An example embodiment of states, actions, observations, and rewards is provided for a super spine agent.
An example embodiment of states, actions, observations, and rewards is provided for a spine agent.
An example embodiment of states, actions, observations, and rewards is provided for a leaf agent.
Continuing with the example embodiment, traffic shaping and load balancing are now discussed further.
Performance of the method of the present disclosure, including multi-agent RL, was analyzed in comparison to ECMP alone using a combination of East-West and North-South traffic.
Policy 1 includes the action-observation interactions of the leaf agent for leaf node 103.
Policy 2 includes the action-observation interactions of the spine agent for spine node 105.
Policy 3 includes the action-observation interactions of the super spine agent for super spine node 107.
As discussed herein, a potential technical advantage of making use of multi-agent RL techniques may be the ability to provide superior services for network slices. In another example embodiment, a particular flow of the datacenter can be targeted for improvement.
Architectural frameworks and use cases for the method of the present disclosure will now be discussed further.
Example use cases for the method of the present disclosure include, without limitation, the following use cases within data center networking. A first example use case involves noisy neighbors. "Noisy neighbor" is a phrase that may be used to describe a data center infrastructure co-tenant that monopolizes bandwidth, disk inputs/outputs (I/O), central processing units (CPUs), and other resources, and may negatively affect other users' performance. A noisy neighbor effect can occur when an application or virtual machine uses the majority of available resources and causes network performance issues for others on the shared infrastructure. A lack of bandwidth can be one cause of network performance issues. Bandwidth carries data throughout a network, so when one application or instance uses too much, other applications may suffer from slow speeds or latency. In some embodiments, through use of the method of the present disclosure including multi-agent RL, the noisy neighbor(s) can be identified and placed in appropriate locations or provided appropriate weights to reduce the effect on other pods.
Another example use case involves multi-chassis LAG grouping. A multi-chassis link aggregation group is a type of link aggregation group (LAG) with constituent ports that terminate on separate chassis, primarily to provide redundancy in the event one of the chassis fails. A LAG is a method of inverse multiplexing over multiple Ethernet links, thereby increasing bandwidth and providing redundancy.
In another example use case, workload can be re-engineered. Proper placement of pods is considered to make optimal use of the fat tree network. Due to traffic mix changes or improper pod placement, bottlenecks can occur at multiple links at the leaf, spine, and/or super-spine levels.
While example embodiments herein are explained with reference to one leaf node, one spine node, and/or one super spine node at which (or for which) there is a respective leaf agent, spine agent, and super spine agent, the method of the present disclosure is not so limited. Rather, the agents are scalable in deployment and can include any number of leaf, spine, and/or super spine agents. Additionally, while example embodiments herein are explained with reference to a leaf agent, a spine agent, and/or a super spine agent at a leaf node, a spine node, and/or a super spine node performing policy computations, the method of the present disclosure is not so limited. Rather, policy computations may be performed in the cloud, with an SDN or other controller deploying and/or monitoring agent policies.
As discussed herein, operations of the network node according to some embodiments may be performed by processing circuitry 1403, network interface 1407, optional memory (as discussed herein), and/or RL agent 1411 (e.g., operations discussed herein with respect to example embodiments relating to first nodes). For example, processing circuitry 1403 and/or RL agent 1411 may control network interface 1407 to signal communications through network interface 1407 to one or more other nodes, controllers, and/or simulation nodes and/or to receive uplink communications through network interface 1407 from one or more other nodes, controllers, and/or simulation nodes. According to some embodiments, first node 1400 and/or an element(s)/function(s) thereof may be embodied as a virtual node/nodes and/or a virtual machine/machines.
In the description that follows, while the first node may be any of a leaf node, a spine node, a super spine node, a virtual node, or a virtual machine, the first node 1400 shall be used to describe the functionality of the operations of the first node. Operations of a first node 1400 (implemented using the structure of the block diagram of
In some embodiments, the plurality of traffic flows comprise an elephant flow and a mouse flow. The elephant flow and the mouse flow comprise respective traffic flows having different arrival rates, different priorities, and different traffic types.
In some embodiments, the plurality of joint observations comprise at least one of a latency per traffic flow and a throughput per traffic flow.
In some embodiments, the joint state comprises a utilization metric per the at least one leaf node, the at least one spine node, and the at least one super spine node for the routing path.
In some embodiments, the global reward value comprises at least one of (i) a positive value when the routing path meets a service level agreement, SLA, target for a defined priority level of service for the fat tree network, (ii) a positive value when the routing path is energy efficient based on a reduction in a number of active nodes in the routing path, and (iii) a positive value when the routing path is within a defined fault tolerance for the traffic flow.
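A minimal sketch of combining these criteria into one global reward value follows; the equal weights and boolean inputs are assumptions for illustration, not the disclosure's formulation:

```python
def global_reward(sla_met: bool, active_nodes: int, baseline_nodes: int,
                  within_fault_tolerance: bool) -> float:
    """Combine per-routing-path criteria into a single global reward value."""
    reward = 0.0
    if sla_met:                        # SLA target met for the priority level
        reward += 1.0
    if active_nodes < baseline_nodes:  # fewer active nodes => energy efficient
        reward += 1.0
    if within_fault_tolerance:         # path within the defined fault tolerance
        reward += 1.0
    return reward
```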
In some embodiments, the policy comprises a proposed reconfiguration of the first node by the reinforcement learning agent per state in a set of states and an observation per state that maximizes a reward value to the reinforcement learning agent. The observation comprises at least one of (i) a per traffic flow throughput increase or decrease at the first node, (ii) an amount of time a packet per traffic flow spent at the first node, (iii) an increase or a decrease of packet delay per traffic flow at the first node, (iv) a per traffic flow packet drop increase or packet drop decrease at the first node, (v) a retransmission at the first node, (vi) an outage of a link to the first node in the fat tree network, and (vii) a reliability of the first node.
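Operationally, such a policy can be read as a mapping from the agent's local observation to a proposed reconfiguration. A minimal table-lookup sketch, reusing the example state and action labels from the earlier listings (the specific mapping is illustrative, not a computed optimal policy):

```python
# Deterministic policy sketch for the leaf agent: latest local observation -> action.
leaf_policy = {
    "throughput_down_leaf_latency_up":   "decrease_flow",
    "throughput_down_leaf_latency_drop": "set_priority0",
    "throughput_up_leaf_latency_drop":   "ECMP",
}

def act(policy: dict, observation: str, default: str = "ECMP") -> str:
    """Return the proposed reconfiguration for the latest local observation."""
    return policy.get(observation, default)

print(act(leaf_policy, "throughput_down_leaf_latency_up"))  # decrease_flow
```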
In some embodiments, the reconfiguration of the first node comprises at least one of a first reconfiguration to load balance the traffic flow at the first node, a second reconfiguration to shape the traffic flow at the first node, and a third reconfiguration to prioritize the traffic flow at the first node.
In some embodiments, the reconfiguration comprises performance of at least one of (i) an equal cost multi-path routing load balancing, (ii) a priority queue scheduling at the first node, (iii) a first in first out, FIFO, queue scheduling at the first node, (iv) dropping a packet according to a defined metric at the first node, and (v) limiting a processing rate of a traffic flow at the first node.
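For item (iv), dropping a packet according to a defined metric can be realized with, e.g., random early detection (RED), which the example action RED_drop_set above alludes to. A simplified sketch with illustrative thresholds (a single average-queue input; not the disclosure's specific drop mechanism):

```python
import random

def red_drop(avg_queue: float, min_th: float = 20.0, max_th: float = 80.0,
             max_p: float = 0.1) -> bool:
    """Simplified RED: never drop below min_th, always drop above max_th,
    and drop probabilistically (linearly up to max_p) between the thresholds."""
    if avg_queue < min_th:
        return False
    if avg_queue >= max_th:
        return True
    drop_p = max_p * (avg_queue - min_th) / (max_th - min_th)
    return random.random() < drop_p
```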
In some embodiments, when the first node comprises a super spine node or a spine node, the reconfiguration further comprises diverting the traffic flow to a node in the routing path having lower utilization than the super spine node or the spine node based on an adaptive change to a weight assigned to the super spine node or the spine node.
In some embodiments, when the first node comprises a leaf node, the reconfiguration further comprises limiting a committed information rate, CIR, and/or a peak information rate, PIR, per traffic flow.
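CIR/PIR limiting is commonly implemented with token buckets. The following two-rate sketch is an illustrative simplification (a single shared burst size and a drop/mark decision), not the specific shaper of the disclosure:

```python
import time

class TwoRateLimiter:
    """Simplified two-rate limiter: packets exceeding the peak rate (PIR) are
    dropped; packets within PIR but above the committed rate (CIR) are marked
    best-effort (yellow); packets within CIR are marked green."""

    def __init__(self, cir_bps: float, pir_bps: float, burst_bytes: float):
        self.cir = cir_bps / 8.0           # committed rate in bytes/second
        self.pir = pir_bps / 8.0           # peak rate in bytes/second
        self.c_tokens = self.p_tokens = burst_bytes
        self.burst = burst_bytes
        self.last = time.monotonic()

    def admit(self, packet_bytes: int) -> str:
        now = time.monotonic()
        elapsed, self.last = now - self.last, now
        # Refill both buckets at their respective rates, capped at the burst size.
        self.c_tokens = min(self.burst, self.c_tokens + elapsed * self.cir)
        self.p_tokens = min(self.burst, self.p_tokens + elapsed * self.pir)
        if self.p_tokens < packet_bytes:
            return "drop"                  # exceeds peak information rate
        self.p_tokens -= packet_bytes
        if self.c_tokens < packet_bytes:
            return "yellow"                # within PIR but above CIR
        self.c_tokens -= packet_bytes
        return "green"                     # within committed information rate
```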
In some embodiments, the reward value indicates a measure of the state of the first node resulting from the proposed action.
In some embodiments, the reward value comprises at least one of a positive value for a reduced packet drop or latency per traffic flow, a positive value for an improved throughput, a positive value for not crossing a defined utilization metric of the first node, and a combination of the global reward values.
In some embodiments, when the first node comprises a spine node, the reward value further comprises a negative value for an outage of a link to the spine node in the fat tree network.
In some embodiments, the plurality of reinforcement learning agents comprise decentralized partially observable Markov Decision Process, Dec-POMDP, agents.
In some embodiments, the simulation model or testbed environment receives the plurality of traffic flows and a plurality of configurations per reinforcement learning agent serving the at least one leaf node, the at least one spine node, and the at least one super spine node.
In some embodiments, the simulation model or testbed environment evaluates an impact per traffic flow from simulating a configuration from a plurality of configurations of the at least one of the leaf node, the at least one spine node, and the at least one super spine node per routing path.
RL agents 1411a and 1411b (which may alternatively be called software instances, virtual appliances, network functions, virtual nodes, virtual network functions, etc.) are run in the virtualization environment to implement some of the features, functions, and/or benefits of some of the embodiments disclosed herein.
Hardware 1701 includes processing circuitry, memory that stores software and/or instructions executable by hardware processing circuitry, and/or other hardware devices as described herein, such as a network interface, input/output interface, and so forth. Software may be executed by the processing circuitry to instantiate one or more virtualization layers 1703 (also referred to as hypervisors or virtual machine monitors (VMMs)), provide RL agents 1411a and/or 1411b (one or more of which may be generally referred to as RL agents 1411), and/or perform any of the functions, features and/or benefits described in relation with some embodiments described herein. The virtualization layer 1703 may present a virtual operating platform that appears like networking hardware to the RL agents 1411.
The RL agents 1411 comprise virtual processing, virtual memory, virtual networking or interfaces, and virtual storage, and may be run by a corresponding virtualization layer 1703. Different embodiments of the instance of a virtual appliance 1705 may be implemented on one or more of RL agents 1411, and the implementations may be made in different ways. Virtualization of the hardware is in some contexts referred to as network function virtualization (NFV). NFV may be used to consolidate many network equipment types onto industry standard high volume server hardware, physical switches, and physical storage, which can be located in data centers, and customer premise equipment.
In the context of NFV, an RL agent 1411 may be a software implementation of a physical machine that runs programs as if they were executing on a physical, non-virtualized machine. Each of the RL agents 1411, together with that part of hardware 1701 that executes that RL agent (be it hardware dedicated to that RL agent and/or hardware shared by that RL agent with others of the RL agents), forms a separate virtual network element. Still in the context of NFV, a virtual network function is responsible for handling specific network functions that run in one or more RL agents 1411 on top of the hardware 1701 and corresponds to the application 1705.
Hardware 1701 may be implemented in a standalone network node with generic or specific components. Hardware 1701 may implement some functions via virtualization. Alternatively, hardware 1701 may be part of a larger cluster of hardware (e.g., in a data center) where many hardware nodes work together and are managed via management and orchestration 1707, which, among other things, oversees lifecycle management of applications 1705. In some embodiments, hardware 1701 is coupled to one or more nodes of a fat tree network. Nodes may communicate directly with other hardware nodes via one or more appropriate network interfaces and may be used in combination with the virtual components to provide a virtual node with capabilities of embodiments of first node discussed herein. In some embodiments, some signaling can be provided with the use of a control system 1707.
In the above description of various embodiments of the present disclosure, it is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of present inventive concepts. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which present inventive concepts belong. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
When an element is referred to as being “connected”, “coupled”, “responsive”, or variants thereof to another element, it can be directly connected, coupled, or responsive to the other element or intervening elements may be present. In contrast, when an element is referred to as being “directly connected”, “directly coupled”, “directly responsive”, or variants thereof to another element, there are no intervening elements present. Like numbers refer to like elements throughout. Furthermore, “coupled”, “connected”, “responsive”, or variants thereof as used herein may include wirelessly coupled, connected, or responsive. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Well-known functions or constructions may not be described in detail for brevity and/or clarity. The term “and/or” includes any and all combinations of one or more of the associated listed items.
It will be understood that although the terms first, second, third, etc. may be used herein to describe various elements/operations, these elements/operations should not be limited by these terms. These terms are only used to distinguish one element/operation from another element/operation. Thus, a first element/operation in some embodiments could be termed a second element/operation in other embodiments without departing from the teachings of present inventive concepts. The same reference numerals or the same reference designators denote the same or similar elements throughout the specification.
As used herein, the terms “comprise”, “comprising”, “comprises”, “include”, “including”, “includes”, “have”, “has”, “having”, or variants thereof are open-ended, and include one or more stated features, integers, elements, steps, components or functions but does not preclude the presence or addition of one or more other features, integers, elements, steps, components, functions or groups thereof. Furthermore, as used herein, the common abbreviation “e.g.”, which derives from the Latin phrase “exempli gratia,” may be used to introduce or specify a general example or examples of a previously mentioned item, and is not intended to be limiting of such item. The common abbreviation “i.e.”, which derives from the Latin phrase “id est,” may be used to specify a particular item from a more general recitation.
Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits. These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).
These computer program instructions may also be stored in a tangible computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the functions/acts specified in the block diagrams and/or flowchart block or blocks. Accordingly, embodiments of present inventive concepts may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.) that runs on a processor such as a digital signal processor, which may collectively be referred to as “circuitry,” “a module” or variants thereof.
It should also be noted that in some alternate implementations, the functions/acts noted in the blocks may occur out of the order noted in the flowcharts. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Moreover, the functionality of a given block of the flowcharts and/or block diagrams may be separated into multiple blocks and/or the functionality of two or more blocks of the flowcharts and/or block diagrams may be at least partially integrated. Finally, other blocks may be added/inserted between the blocks that are illustrated, and/or blocks/operations may be omitted without departing from the scope of inventive concepts. Moreover, although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.
Many variations and modifications can be made to the embodiments without substantially departing from the principles of the present inventive concepts. All such variations and modifications are intended to be included herein within the scope of present inventive concepts. Accordingly, the above disclosed subject matter is to be considered illustrative, and not restrictive, and the examples of embodiments are intended to cover all such modifications, enhancements, and other embodiments, which fall within the spirit and scope of present inventive concepts. Thus, to the maximum extent allowed by law, the scope of present inventive concepts is to be determined by the broadest permissible interpretation of the present disclosure including the examples of embodiments and their equivalents, and shall not be restricted or limited by the foregoing detailed description.
Claims
1. A method performed by a first node of a fat tree network for management of a plurality of traffic flows in the fat tree network having differentiated service requirements, the method comprising:
- receiving, from a simulation model or a testbed environment, a plurality of joint observations for the plurality of traffic flows, the plurality of traffic flows corresponding to a plurality of routing paths comprising different combinations of at least one leaf node, at least one spine node, and at least one super spine node of the fat tree network per routing path;
- identifying, with a first reinforcement learning model for the first node, a first action to take to reduce or prevent congestion at the first node of a traffic flow based on a policy generated from at least the plurality of joint observations, the first action comprising a reconfiguration of the first node for an identified routing path; and
- outputting, to a controller node, the reconfiguration of the first node.
2. The method of claim 1, wherein the plurality of traffic flows comprise an elephant flow and a mouse flow, the elephant flow and the mouse flow comprising respective traffic flows having different arrival rates, different priorities, and different traffic types.
3. The method of claim 1, wherein the plurality of joint observations comprise at least one of a latency per traffic flow and a throughput per traffic flow.
4. The method of claim 1, further comprising:
- receiving, from the simulation model or the testbed environment, a plurality of global reward values, wherein a global reward value indicates a measure of a joint state of the nodes in the fat tree network in a routing path comprising a combination of the at least one leaf node, the at least one spine node, and the at least one super spine node, the joint state resulting from an action of at least one reinforcement learning agent in the fat tree network for the routing path.
5. The method of claim 4, wherein the joint state comprises a utilization metric per the at least one leaf node, the at least one spine node, and the at least one super spine node for the routing path.
6. The method of claim 4, wherein the global reward value comprises at least one of (i) a positive value when the routing path meets a service level agreement (SLA) target for a defined priority level of service for the fat tree network, (ii) a positive value when the routing path is energy efficient based on a reduction in a number of active nodes in the routing path, and (iii) a positive value when the routing path is within a defined fault tolerance for the traffic flow.
7. The method of claim 1, wherein the policy comprises a proposed reconfiguration of the first node by the reinforcement learning agent per state in a set of states and an observation per state that maximizes a reward value to the reinforcement learning agent, and
- wherein the observation comprises at least one of (i) a per traffic flow throughput increase or decrease at the first node, (ii) an amount of time a packet per traffic flow spent at the first node, (iii) an increase or a decrease of packet delay per traffic flow at the first node, (iv) a per traffic flow packet drop increase or packet drop decrease at the first node, (v) a retransmission at the first node, (vi) an outage of a link to the first node in the fat tree network, and (vii) a reliability of the first node.
8. The method of claim 1, wherein the reconfiguration of the first node comprises at least one of a first reconfiguration to load balance the traffic flow at the first node, a second reconfiguration to shape the traffic flow at the first node, and a third reconfiguration to prioritize the traffic flow at the first node.
9. The method of claim 8, wherein the reconfiguration comprises performance of at least one of (i) an equal cost multi-path routing load balancing, (ii) a priority queue scheduling at the first node, (iii) a first in first out (FIFO) queue scheduling at the first node, (iv) dropping a packet according to a defined metric at the first node, and (v) limiting a processing rate of a traffic flow at the first node.
10. The method of claim 9, wherein when the first node comprises a super spine node or a spine node, the reconfiguration further comprises diverting the traffic flow to a node in the routing path having lower utilization than the super spine node or the spine node based on an adaptive change to a weight assigned to the super spine node or the spine node.
11. The method of claim 9, wherein when the first node comprises a leaf node, the reconfiguration further comprises limiting a committed information rate (CIR) and/or a peak information rate (PIR) per traffic flow.
12. The method of claim 7, wherein the reward value indicates a measure of the state of the first node resulting from the proposed action.
13. The method of claim 12, wherein the reward value comprises at least one of a positive value for a reduced packet drop or latency per traffic flow, a positive value for an improved throughput, a positive value for not crossing a defined utilization metric of the first node, and a combination of the global reward values.
14. The method of claim 12, wherein when the first node comprises a spine node, the reward value further comprises a negative value for an outage of a link to the spine node in the fat tree network.
15. The method of claim 1, wherein the plurality of reinforcement learning agents comprise decentralized partially observable Markov Decision Process (Dec-POMDP) agents.
16. The method of claim 1, wherein the simulation model or testbed environment receives the plurality of traffic flows and a plurality of configurations per reinforcement learning agent serving the at least one leaf node, the at least one spine node, and the at least one super spine node.
17. The method of claim 1, wherein the simulation model or testbed environment evaluates an impact per traffic flow from simulating a configuration from a plurality of configurations of the at least one of the leaf node, the at least one spine node, and the at least one super spine node per routing path.
18. A first node of a fat tree network for management of a plurality of traffic flows in the fat tree network having differentiated service requirements, the first node comprising:
- at least one processor;
- at least one memory connected to the at least one processor and storing program code that is executed by the at least one processor to perform operations comprising:
- receive, from a simulation model or a testbed environment, a plurality of joint observations for the plurality of traffic flows, the plurality of traffic flows corresponding to a plurality of routing paths comprising different combinations of at least one leaf node, at least one spine node, and at least one super spine node of the fat tree network per routing path;
- identify, with a first reinforcement learning model for the first node, a first action to take to reduce or prevent congestion at the first node of a traffic flow based on a policy generated from at least the plurality of joint observations, the first action comprising a reconfiguration of the first node for an identified routing path; and
- output, to a controller node, the reconfiguration of the first node.
19-21. (canceled)
22. A computer program comprising program code to be executed by processing circuitry of a first node of a fat tree network for management of a plurality of traffic flows in the fat tree network having differentiated service requirements, whereby execution of the program code causes the first node to perform operations comprising:
- receive, from a simulation model or a testbed environment, a plurality of joint observations for the plurality of traffic flows, the plurality of traffic flows corresponding to a plurality of routing paths comprising different combinations of at least one leaf node, at least one spine node, and at least one super spine node of the fat tree network per routing path;
- identify, with a first reinforcement learning model for the first node, a first action to take to reduce or prevent congestion at the first node of a traffic flow based on a policy generated from at least the plurality of joint observations, the first action comprising a reconfiguration of the first node for an identified routing path; and
- output, to a controller node, the reconfiguration of the first node.
23-25. (canceled)
Type: Application
Filed: Dec 10, 2021
Publication Date: Feb 13, 2025
Inventors: Ajay KATTEPUR (BANGALORE), Sushanth S DAVID (Frisco, TX)
Application Number: 18/718,261