METHOD TO SCHEDULE MULTIPLE TRAFFIC FLOWS THROUGH PACKET-SWITCHED ROUTERS WITH NEAR-MINIMAL QUEUE SIZES
A method is proposed to schedule multiple traffic flows through a multiplexer server so as to provide fairness while minimizing the sizes of the associated queues. The multiplexer server minimizes a quantity called the maximum Normalized Service Lag for each traffic flow. In each timeslot, the normalized service lag of every traffic flow may be updated by adding the normalized lag increment value, whether or not there is a packet in the queue associated with the flow. In each timeslot, the multiplexer server selects for service the traffic flow which has an available packet and the maximum normalized service lag. When the traffic rate requested by each traffic flow is stable, the multiplexer server schedule may repeat periodically. Efficient methods to compute periodic schedules are proposed. The methods can be applied to packet-switched Internet routers to achieve reduced queue sizes and delay.
This application is a continuation of U.S. application Ser. No. 12/861,455 filed Aug. 23, 2010, entitled “METHOD TO SCHEDULE MULTIPLE TRAFFIC FLOWS THROUGH PACKET-SWITCHED ROUTERS WITH NEAR-MINIMAL QUEUE SIZES”, listing Ted H. Szymanski as the inventor, which claims priority from U.S. provisional application No. 61/235,875 filed on Aug. 21, 2009. Each of these applications is incorporated herein by reference.
BACKGROUND OF THE INVENTION
The present invention relates to the scheduling of traffic flows in the routers of a telecommunications network.
DESCRIPTION OF THE PRIOR ART
The GPS/WFQ Scheduling Algorithm (Prior Art)
The well-known ‘Generalized Processor Sharing’ (GPS) server scheduling algorithm is often used in the Internet to provide fairness guarantees. The algorithm is typically used to schedule multiple traffic flows which pass through one multiplexer server onto one transmission link, to provide fairness guarantees to all competing traffic flows. The GPS algorithm has a discrete-time implementation called the ‘Weighted Fair Queueing’ (WFQ) algorithm. The GPS/WFQ algorithms were formalized by Dr. Parekh in his Ph.D. thesis at MIT, and were published by Parekh and Gallager in 1991. The GPS/WFQ algorithms can provide deterministic guarantees on the end-to-end delay experienced by packets traversing one path through a packet-switched network such as the Internet, under certain restrictive assumptions.
The paper by A. K. Parekh and R. G. Gallager, entitled “A Generalized Processor Sharing Approach to Flow Control in Integrated Service Networks: the Single Node Case”, IEEE/ACM Trans. Networking, vol. 1, pp. 344-357, 1993 is incorporated by reference. A second paper by the same authors entitled “A Generalized Processor Sharing Approach to Flow Control in Integrated Service Networks: the Multiple Node Case”, IEEE/ACM Trans. Networking, vol. 2, no. 2, pp. 137-150, 1994 is incorporated by reference. These two papers are hereafter referred to as the GPS-papers.
Consider a multiplexer server which services N traffic flows contending for access to one outgoing transmission link. The multiplexer server is also called the ‘server’. The GPS algorithm can be used to schedule the N traffic flows as they pass through the multiplexer server onto the transmission line. The GPS algorithm assumes an idealized multiplexer server which uses a ‘fluid’ model of data packets, where a packet can be subdivided so that infinitesimally small amounts of each packet can be served. The idealized server visits all N traffic flows in a round-robin order in each ‘round’ of service. It serves a very small amount of each queued packet for each traffic flow in each round, thereby providing relatively ‘smooth’ service to each traffic flow. Once the last bit of a packet is served by the idealized server, the packet is labeled as ‘served’, its departure time is recorded, and it is removed from the system. The GPS algorithm can be used to compute a departure time for all packets passing through the one multiplexer server onto the one outgoing transmission link, and it has been rigorously proved to achieve fairness for all traffic flows.
The departure schedule determined by the idealized server using the GPS scheduling algorithm can be used by a real multiplexer. The real multiplexer server can be called the Weighted Fair Queuing (WFQ) server. In the WFQ server, packets are non-divisible entities. The WFQ server services packets in the same order as the ideal GPS server, thereby achieving a level of fairness in the service to each flow.
The GPS/WFQ scheduling algorithms are currently used to provide fairness guarantees in packet-switched Internet routers. We demonstrate that the GPS/WFQ algorithms do not minimize the queue sizes for the associated traffic flows. We propose a new server scheduling algorithm called the Maximum-Normalized-Lag-First (MNLF) scheduling algorithm. The proposed algorithm can be used to schedule the N traffic flows to pass through one multiplexer server onto one transmission line, to provide fairness guarantees, and to minimize the size of the associated queues.
The primary difficulty with the GPS/WFQ scheduling algorithm is that it does not minimize the queue sizes of traffic flows passing through a multiplexer. This patent application illustrates that an important consideration of a scheduling algorithm is the maximum normalized service lag which it can guarantee. The service lag of a traffic flow can be defined as the number of bytes of data by which the flow has fallen behind in service, when compared to the ideal or perfect schedule for that traffic flow. The normalized service lag of a traffic flow can be defined as the number of average-size packets by which the flow has fallen behind in service, when compared to the ideal or perfect schedule for that traffic flow, assuming that all traffic flows use the same average packet size. The normalized lag can also be expressed as a time delay. It will be established in this document that the GPS/WFQ algorithm does not minimize the maximum normalized service lag of all traffic flows. Therefore, the GPS/WFQ algorithm does not minimize the queue sizes of the traffic flows.
It has recently been established in theory that the maximum amount of data stored in a queue is bounded by two values: (a) the maximum normalized service lag of the service schedule for the queue, and (b) the maximum normalized service lag of the incoming traffic. Therefore, a server scheduling algorithm with an unnecessarily large maximum normalized service lag will have larger queues than necessary. This observation is important, since the GPS/WFQ algorithms are often used to provide fair service in Internet routers, and they do not minimize the normalized service lag. Therefore, in principle the size of many queues in the Internet may be reduced by replacing the GPS/WFQ algorithms with another algorithm which minimizes the maximum normalized service lag. The proposed MNLF algorithm can achieve these goals.
OBJECTS AND ADVANTAGES
Accordingly, it is desirable to find a new multiplexer server scheduling algorithm which can minimize the queue sizes for all traffic flows passing through the multiplexer server.
The proposed MNLF algorithm achieves near-minimal queue sizes for the traffic flows. In many applications, the size of the queues can be reduced by replacing the GPS/WFQ algorithms by the proposed MNLF algorithm.
The proposed MNLF algorithm is easy to implement in hardware or software. The amount of computation is limited, and hardware circuits which compute schedules should easily handle the highest link speeds, e.g., 10 Gbps, 40 Gbps, 160 Gbps or 640 Gbps.
The proposed MNLF algorithm is iterative. In each timeslot a few simple calculations are performed and the packet to service is identified. The proposed MNLF algorithm can use variable-sized packets or fixed-sized packets.
Consider traffic flows with ‘Guaranteed Rates’ which do not change over an extended period of time. Guaranteed-Rate traffic flows will be denoted GR traffic flows in this document. When GR traffic flows are scheduled through one multiplexer server, the computed server schedules will repeat after some duration of time, which can be called a ‘scheduling frame’. A scheduling frame consists of F timeslots, for some integer F. It is desirable to avoid recomputing the same server schedules repeatedly when the traffic rates do not change. A method by which the computed server schedule for a scheduling frame is computed and stored, and reused in subsequent scheduling frames, is proposed. This approach will reduce power consumption in the hardware, and it will also allow a control processor the opportunity to download other schedules computed in software.
The proposed MNLF algorithm can achieve 100% utilization of the outgoing transmission link, and guarantees that the maximum normalized service lag is near-minimal. Therefore, it guarantees that the sizes of queues for the traffic flows will be near-minimal. In many systems, the amount of memory required to implement queues can be reduced.
New traffic flows can be added or removed incrementally, without substantially affecting the other traffic flows in the server. The traffic rates of existing flows can be changed, without substantially affecting the other traffic flows in the server.
The proposed MNLF server does not require any ‘speedup’ in order to achieve 100% throughput of the outgoing transmission line. The method works with a speedup of one and achieves up to 100% throughput, while minimizing the maximum normalized service lag and the associated queue sizes.
The proposed iterative MNLF method is relatively fast, with a worst-case run time of O(F log N) when computing a schedule for N flows over a scheduling frame of duration F timeslots and when executed in a serial processor.
To compute schedules very quickly, a recursive and parallel GPS scheduler is proposed. The parallel version has considerably faster run times compared to the serial version when executed in a multiple-processor implementation, such as the new multi-core Intel processors. Assuming P processors are available for computation, the run time is approximately O((F log N)/P).
To compute lower-jitter schedules very quickly, a recursive and parallel MNLF scheduler is proposed. The parallel version has considerably faster run times compared to the serial version when executed in a multiple-processor implementation, such as the new multi-core Intel processors. Assuming P processors are available for computation, the run time is approximately O((F log N)/P).
In a real Internet router, the proposed MNLF scheduler can be used to schedule multiple traffic flows which share a resource, such as a transmission line. In many Internet routers, a hierarchy of two levels of schedulers is useful. A method to use a hierarchy of MNLF schedulers is proposed. The hierarchy of servers can provide guaranteed traffic rates along with maximum service lag bounds to any number of traffic flows. It also allows for the control of how much link bandwidth is allocated to each set of traffic flows competing for one output link.
SUMMARY OF THE INVENTION
In accordance with embodiments of the present invention, a method to schedule multiple traffic flows through a multiplexer server to provide fairness guarantees, while simultaneously minimizing the sizes of the associated queues, is proposed. To minimize the sizes of the associated queues, the multiplexer server may minimize a quantity called the maximum Normalized Service Lag for each traffic flow. Every traffic flow to be scheduled through a multiplexer server is assigned two values, an initial Normalized Service Lag value, and a Normalized Lag Increment value. In each timeslot, the normalized service lag of every traffic flow is updated by adding the normalized lag increment value. In each timeslot, a multiplexer server selects a traffic flow to service with an available packet and with the maximum normalized service lag. Efficient software and hardware methods for performing the iterative calculations are presented. When the traffic rate requested by each traffic flow is stable, the multiplexer server schedule will repeat periodically. Efficient methods to compute periodic schedules are proposed. The periodic schedules can be stored and reused. The methods can be applied to multiple traffic classes, such as Guaranteed-Rate traffic flows and Best-Effort traffic flows. The methods can be applied to packet-switched Internet routers to achieve near-minimal queue sizes and near-minimal delays.
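The per-timeslot update and selection described in this summary can be illustrated with a minimal Python sketch. It assumes fixed-size cells, flows that always have a packet queued, a per-timeslot lag increment equal to the flow's weight, and a decrement of one cell when a flow is served. These particular values, and the names mnlf_schedule, weights and num_slots, are illustrative assumptions and are not taken from the text.

```python
from fractions import Fraction


def mnlf_schedule(weights, num_slots):
    """Return the flow index served in each timeslot (0-based indices).

    Assumptions (illustrative only): every flow is permanently backlogged,
    each cell occupies one timeslot, the lag increment per timeslot is the
    flow's weight, and serving a flow reduces its lag by one cell.
    """
    lag = [Fraction(0)] * len(weights)  # initial normalized lag of each flow
    schedule = []
    for _ in range(num_slots):
        # Every flow accrues lag in every timeslot, whether served or not.
        lag = [l + w for l, w in zip(lag, weights)]
        # Serve the flow with the maximum normalized lag; max() returns the
        # first maximum, so ties go to the lowest flow index.
        f = max(range(len(weights)), key=lambda i: lag[i])
        lag[f] -= 1
        schedule.append(f)
    return schedule


weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
schedule = mnlf_schedule(weights, 8)
print(schedule)
```

With stable weights the computed schedule repeats, here with a period of four timeslots, which is the property that allows a schedule to be stored and reused.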
In accordance with another aspect of the present invention, there is provided a method to schedule N traffic flows through a multiplexer server system. The multiplexer server system comprises a queue for each of the N traffic flows, a multiplexer server, and an outgoing link, wherein each of the N traffic flows has an associated weight equaling the fraction of the outgoing link capacity requested by the flow. The method comprises (a) assigning each of the N traffic flows an initial normalized lag value, (b) processing each of the N traffic flows and assigning each of the N traffic flows a normalized lag increment value, equaling an ideal inter-departure time for average-size packets associated with that traffic flow divided by the timeslot duration, (c) in each increment of the timeslot clock, processing the N traffic flows and adding the normalized lag increment value to the normalized lag value associated with each of the N traffic flows, (d) in each increment of the timeslot clock during which the outgoing link is idle, processing the N traffic flows and selecting one packet associated with one of the N traffic flows for transmission over the outgoing link, the one of the N traffic flows having the largest normalized lag value which exceeds a given threshold value, (e) removing one packet from the queue associated with the one of the N traffic flows, transmitting the packet over the outgoing transmission line for K timeslots, and decrementing the normalized lag value associated with the one of the N traffic flows by K times the normalized lag increment value.
In accordance with another aspect of the present invention, there is provided a method to schedule traffic flows through an input port associated with a switching matrix. The input port comprises multiple Virtual Output Queues (VOQs), one server, and one outgoing link associated with a switching matrix, wherein each of the VOQs stores packets associated with a subset of the N traffic flows, and wherein packets within one VOQ request a common output port of the switching matrix. The method comprises steps of (a) assigning each of said N VOQs a weight equaling the fraction of the capacity of the outgoing link requested by said VOQ, (b) wherein the server selects the VOQs for transmission onto the outgoing link such that traffic associated with each of the N VOQs is transmitted over the outgoing link with a bounded normalized service lead/lag.
In accordance with yet another aspect of the present invention, there is provided a method to schedule multiple Guaranteed-Rate (GR) traffic flows through an input port associated with a switching matrix. The input port comprises N Virtual Output Queues (VOQs), one VOQ-server, and one outgoing link associated with a switching matrix, the outgoing link called a port link, each of the VOQs comprising multiple flow-VOQs, one gated flow-server and one outgoing link connected indirectly or directly to the VOQ-server, each of the outgoing links called a VOQ-link, each of the flow-VOQs storing packets associated with one of the GR traffic flows, (a) wherein each VOQ is assigned a weight equaling the fraction of the capacity of the outgoing port link requested by the VOQ, (b) wherein the VOQ-server selects VOQs for service in proportion to the weight of the VOQ, (c) wherein each gated flow-server associated with each VOQ receives control signals called enable signals from the VOQ-server, and selects one GR traffic flow for transmission onto the outgoing VOQ-link in response to an enable signal, such that each of said GR traffic flows is transmitted over the outgoing port link with a bounded normalized service lead/lag.
In accordance with another aspect of the present invention, there is provided a method to schedule N traffic flows through a multiplexer server system. The multiplexer server system comprises a queue for each of the traffic flows, a gated multiplexer server responsive to an enable signal, and an outgoing link, wherein each of the traffic flows has an associated weight equaling the fraction of the outgoing link capacity requested by the flow. The method comprises steps of (a) assigning each traffic flow an initial normalized lag value, (b) processing each traffic flow and assigning each traffic flow a normalized lag increment value, the normalized lag increment value equaling the ideal inter-departure time for average-size packets associated with the flow divided by the timeslot duration, (c) in each increment of the timeslot clock, processing the N traffic flows and adding the normalized lag increment value to the normalized lag value associated with each traffic flow, (d) in each increment of the timeslot clock during which the outgoing link is idle and the enable signal is asserted, processing the N traffic flows and selecting one packet associated with one traffic flow for transmission over the outgoing link, the one traffic flow having the largest normalized lag value, (e) removing one packet from the queue associated with the one traffic flow, transmitting the packet over the outgoing transmission line for K timeslots, and decrementing the normalized lag value associated with the one traffic flow by K times the normalized lag increment value.
All packets have a fixed maximum size, and each packet can be transmitted over the outgoing link in a fixed number of timeslots.
In accordance with an embodiment of the present invention, there is provided a method to schedule N traffic flows through a multiplexer server system. The multiplexer server system comprises a queue for each of the traffic flows, a gated multiplexer server responsive to an enable signal, an outgoing link, and a virtual time clock, where each of the traffic flows has an associated weight equaling the fraction of the outgoing link capacity requested by the flow. The method comprises steps of (a) assigning each traffic flow an initial virtual finishing time value, (b) each time a packet associated with one traffic flow arrives at an empty queue, assigning the packet an associated virtual finishing time equaling the current virtual time plus a value equaling the length of the packet in bits divided by the weight of the traffic flow, (c) each time a new packet with index j associated with one traffic flow arrives at a non-empty queue, assigning the new packet with index j an associated virtual finishing time equaling the virtual time of the packet with index (j−1) in said queue plus a value equaling the length of the new packet in bits divided by the weight of the traffic flow, (d) in each increment of the timeslot clock during which the outgoing link is idle and the enable signal is asserted, processing the N traffic flows and selecting one packet associated with one traffic flow for transmission over said outgoing link, said one traffic flow having the largest virtual finishing time, (e) removing said one packet from the queue associated with said one traffic flow, and transmitting said one packet over the outgoing transmission line for K timeslots.
In accordance with yet another embodiment of the present invention, there is provided a method to schedule N Guaranteed-Rate (GR) traffic flows through N paths in a network. The network comprises switches and links, each of the paths comprising a sequence of switches and one outgoing link associated with each of the switches, wherein multiple GR traffic flows may be scheduled through a common switch when their paths intersect at the common switch, wherein each of the GR traffic flows has an associated weight equaling the fraction of the outgoing link capacity requested by the GR traffic flow at each of the switches in the paths, wherein (a) the first switch in each of the paths schedules the associated GR traffic flow for transmission on the associated outgoing links with a bounded normalized service lead/lag, (b) the subset of N GR traffic flows arriving at any of the switches in any of the N paths will each have a bounded normalized service lead/lag, (c) each of the switches in each of the N paths will schedule the GR traffic flows arriving at the switch onto the associated outgoing links such that each GR traffic flow departing on an outgoing link will have a bounded normalized service lead/lag.
At least one of the switches in the network schedules at least one of the GR traffic flows.
In the figures, which illustrate embodiments of the invention by way of example only,
Consider a system of N traffic flows arriving at one multiplexer server 12, as shown in
We assume the server 12 in
As shown in the GPS-papers by Parekh and Gallager, the performance of the GPS server is succinctly described by two equations which are iteratively solved. Let VFT denote the ‘Virtual Finishing Time’ of a Head-of-Line (HOL) packet. (Hereafter, we will use the term ‘packet’ to denote a variable-size packet or a fixed-sized cell.) There is a global virtual-time clock R, which records the current virtual time measured in ‘rounds’ of service completed. When the ideal GPS server has visited all queues in one round of service, the virtual time R is incremented by 1.
Assume the packets associated with a traffic flow f are labeled with integers k, for k=1 . . . infinity. Let P(k,f) denote packet k of flow f, and let L(k,f) denote the length of the packet in bits. Let w(f) denote the number of bits served for this flow per round of service. When packet P(k,f) with L(k,f) bits arrives at an empty queue for traffic flow f, the packet is assigned a Virtual Finishing Time VFT(k,f) as follows:
VFT(k,f)=R+L(k,f)/w(f) (1)
The VFT(k,f) of the packet equals the current virtual time R plus as many rounds of service as needed to transfer the L(k,f) bits in the packet to the output side of the server.
Every traffic flow f is assigned a VFT(f) equal to the VFT(k,f) of its Head-of-Line (HOL) packet, if an HOL packet exists. Otherwise, the traffic flow is assigned a VFT=infinity, i.e. a traffic flow f with an empty queue has a VFT(f)=infinity. When the Virtual Time R reaches the VFT(f) for flow f, the HOL packet P(k,f) in the queue for traffic flow f will be completely transferred from the queue to the output transmission link by the ideal GPS server. The packet will be completely serviced and can be removed from the queue. The departure time for a packet from flow f can be added to the ideal departure schedule computed by the GPS server.
When a packet P(k,f) with L(k,f) bits arrives at a non-empty queue for traffic flow f, it is assigned a VFT(k,f) as follows:
VFT(k,f)=VFT(k−1,f)+L(k,f)/w(f) (2)
The VFT for this packet equals the VFT of the packet ahead of it in queue f, plus as many rounds of service as needed to transfer the L(k,f) bits in the packet to the output side of the multiplexer switch.
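Equations (1) and (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name assign_vft and the representation of a flow's queue as a list of the VFTs of its queued packets are hypothetical and are not taken from the GPS-papers.

```python
def assign_vft(queue_vfts, virtual_time, length_bits, weight):
    """Assign a Virtual Finishing Time to a packet arriving for one flow.

    queue_vfts   -- VFTs of the packets already queued for this flow (assumed layout)
    virtual_time -- current virtual time R, in rounds of service
    length_bits  -- packet length L(k,f) in bits
    weight       -- w(f), bits served for this flow per round of service
    """
    if not queue_vfts:
        # Eq. (1): arrival at an empty queue starts from the current virtual time.
        vft = virtual_time + length_bits / weight
    else:
        # Eq. (2): arrival at a non-empty queue starts from the VFT of the
        # packet ahead of it in the queue.
        vft = queue_vfts[-1] + length_bits / weight
    queue_vfts.append(vft)
    return vft


queue = []
first = assign_vft(queue, virtual_time=10, length_bits=1000, weight=500)
second = assign_vft(queue, virtual_time=11, length_bits=500, weight=500)
print(first, second)
```

In this example the second packet ignores the current virtual time of 11 and instead builds on the VFT of the first packet, as Eq. (2) requires.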
The GPS Service Guarantees (Prior Art)
According to the first paper by Parekh and Gallager (page 345), let Sj(t1,t2) denote the amount of traffic which has been served in the interval of time (t1,t2) for flow j.
Property (1): Every flow f is guaranteed to receive its ‘fair share’ of the output link capacity C. Its fair share is given by: w(f)/(sum of w(k) for all flows k=1 . . . N)
Property (2): If there is any ‘excess bandwidth’ available on the output link, this excess bandwidth is shared fairly amongst all traffic flows with queued packets. Therefore, the following inequality holds for all flows:
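For reference, the fairness guarantee given in the first GPS paper can be written in the notation used here. The rendering below is a reconstruction from that paper rather than text from this document, and it assumes that flow f is continuously backlogged over the interval (t1,t2):

```latex
\frac{S_f(t_1,t_2)}{S_j(t_1,t_2)} \;\ge\; \frac{w(f)}{w(j)}, \qquad j = 1, \ldots, N
```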
The above GPS server description is an idealization, since it is impractical to split a packet into individual bits for service. In a real server, packets or cells are serviced as non-divisible entities. The departure order of packets in the ideal GPS server schedule can be used to determine the departure order of indivisible packets in a real server. We will call the real multiplexer server the WFQ server. The variable-size packets are transmitted from the queues to the outgoing transmission line on the output side of the WFQ server in the same order as the idealized order computed by the GPS server. Let VFT(f) denote the VFT for each flow f. A flow with an empty queue has VFT(f)=infinity. In each round of service R, the traffic flow to service in the WFQ server is selected as shown in the following line. We assume the syntax of the Matlab programming language, developed by The MathWorks, described at http://www.mathworks.com.
[VFTmin,fmin]=min(VFT(1:N)) (4)
The min( ) function in equation 4 processes all flow VFTs and finds the flow with the minimum VFT. The variable VFTmin returned by Eq. 4 equals the smallest VFT of all flows with queued packets. If there is at least one non-empty queue, the variable VFTmin equals the smallest VFT, and fmin is the index of the flow with this VFT. If VFTmin=infinity, then all queues are empty and no flow is serviced in this round.
A complication occurs when multiple flows share the same virtual finishing time. In this case, equation 4 has several flows to select from. The GPS-papers by Parekh and Gallager did not explicitly state how to resolve this issue, as any choice will satisfy the two earlier guaranteed fairness properties. In any round of service, given a set of flows which have the same minimum VFT, we can assume (a) the random selection of one flow from the set, or (b) the selection of the flow f with the minimum index f, thereby enforcing a round-robin order on the flows in the set. We will assume a round-robin order.
The real WFQ server serves packets in the order dictated by equation (4), the same order as the idealized GPS server. In each round of service, one non-empty flow with the smallest VFT is selected for service. Its entire HOL packet is moved by the WFQ server to the output link, as a non-divisible entity. Once the packet k is serviced, the VFT for the flow is updated to the VFT of the new HOL packet k+1 in the queue.
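The selection rule of equation (4), together with the lowest-index tie-break assumed above, can be sketched in Python. The function name select_flow is hypothetical, flow indices are 0-based here rather than 1-based, and an empty queue is represented by a VFT of infinity as in the text.

```python
import math


def select_flow(vft):
    """Return (vft_min, f_min) over all flows, in the manner of Eq. (4).

    vft is a list holding VFT(f) for each flow; math.inf marks an empty queue.
    f_min is None when every queue is empty, so no flow is serviced.
    """
    vft_min = min(vft)
    if math.isinf(vft_min):
        return vft_min, None
    # list.index returns the first match, so ties go to the lowest index.
    return vft_min, vft.index(vft_min)


result = select_flow([5.0, 3.0, math.inf, 3.0])
idle = select_flow([math.inf, math.inf])
print(result, idle)
```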
Application of GPS to Scheduling Guaranteed-Rate Traffic Flows with Fixed-Sized Cells (Prior Art)
If all traffic flows have the GR property and all packets have a fixed size, then equations (1) and (2) can be simplified. This section describes the adjustments to equations (1) and (2). Every fixed-sized packet is called a cell, and a cell may contain, for example, 64, 256 or 1K bytes of data. When packets have a fixed size, each round of service in the WFQ server has a duration of one timeslot.
When a cell k of traffic flow f arrives at an empty queue, it is assigned a VFT(k,f) given by Eq. (5), where R is the current timeslot.
VFT(k,f)=R+IIDT(f) (5)
The variable VFT(k,f) equals the current timeslot R plus a quantity called the IIDT(f) for traffic flow f. The IIDT(f) for a traffic flow f equals the ‘Ideal Inter-Departure Time’ between successive cells in the traffic flow. The IIDT(f) of a traffic flow f equals 1/w(f), when w(f) is expressed as a fraction between 0 and 1. For example, a traffic flow with a weight w(f)=0.5 uses 50% of the capacity of the outgoing transmission link. Ideally, one cell departs every 2 timeslots and the IIDT(f)=2 timeslots for this flow. As another example, a traffic flow with a weight w(f)=⅓ uses approximately 33% of the capacity of the outgoing transmission link. Ideally, one cell departs every 3 timeslots and the IIDT(f)=3 timeslots for this traffic flow.
When a cell k of traffic flow f arrives at a non-empty queue for flow f, it is assigned the VFT shown in Eq. (6).
VFT(k,f)=VFT(k−1,f)+IIDT(f) (6)
The variable VFT(k,f) for the arriving cell equals the virtual finishing time of the cell ahead of it in the queue, plus the IIDT(f) for this traffic flow.
In the real WFQ server, the fixed-size cells are transmitted from the queues to the output transmission link in the same order as computed by the ideal GPS schedule. In each timeslot t, the WFQ server identifies a cell to potentially service as follows.
[VFTmin,fmin]=min(VFT(1:N)) (7)
The variable VFTmin equals the smallest VFT of all flows which have queued cells. If there is at least one non-empty queue, the VFTmin equals the smallest VFT, and fmin is the index of the flow with this VFT.
The WFQ server services traffic flows in the order dictated by Equation 7, thereby using the same departure schedule as the ideal GPS server. In each timeslot t, one non-empty flow with the smallest VFT is selected for service. Once the packet k is serviced, the VFT(f) for the flow f is set to the VFT of the new HOL packet k+1 in the queue for flow f. In the special case we are considering, the VFT of the next packet in the queue is given by the following equation:
VFT(f)=VFT(f)+IIDT(f) (8)
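Equations (5), (7) and (8) can be combined into a short Python sketch of the fixed-size-cell WFQ schedule. It assumes that every flow is permanently backlogged, that the first cell of each flow arrives at an empty queue at timeslot R=0, and that ties are broken by lowest flow index. The function name wfq_cell_schedule and the 0-based flow indices are illustrative choices.

```python
from fractions import Fraction


def wfq_cell_schedule(weights, num_slots):
    """Return the flow index served in each timeslot for backlogged GR flows."""
    iidt = [1 / Fraction(w) for w in weights]  # IIDT(f) = 1/w(f)
    vft = list(iidt)                           # Eq. (5) with R = 0
    schedule = []
    for _ in range(num_slots):
        # Eq. (7): smallest VFT wins; min() keeps the lowest index on ties.
        f = min(range(len(vft)), key=lambda i: vft[i])
        schedule.append(f)
        vft[f] += iidt[f]                      # Eq. (8): VFT of the next HOL cell
    return schedule


weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
schedule = wfq_cell_schedule(weights, 8)
print(schedule)  # [0, 0, 1, 2, 0, 0, 1, 2]
```

Under these assumptions the flow with weight 1/2 is served in two consecutive timeslots and then waits two timeslots, even though its ideal inter-departure time is a constant 2 timeslots.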
The ideal GPS server algorithm allows excess bandwidth to be fairly shared. Bandwidth sharing is useful when the queues are ‘backlogged’. If the outgoing link capacity C is not fully reserved by the traffic flows, the excess bandwidth can be used by backlogged traffic flows. It is possible to disable this excess bandwidth sharing. In fact, to minimize the maximum normalized service lag we disable this bandwidth sharing. In the real WFQ server, the following equation can be used to select the flow for service in each timeslot t, such that each flow receives its guaranteed fraction w(f) of the link capacity C over any sufficiently long interval of time and no more than its guaranteed fraction w(f):
If the traffic flow f=fmin is serviced, its VFT(f) is updated by adding the IIDT(f) for the flow, as shown in Equation 8. Otherwise, the timeslot remains idle and the VFT(f) of flow f=fmin remains unchanged.
The Service Lead-Lag
The following results were established in the paper by T. H. Szymanski, “Bounds on the End-to-End Delay and Jitter in Input-Buffered and Internally Buffered IP Networks”, presented at the IEEE Sarnoff Symposium held at Princeton University, New Jersey, in March/April 2009, which is hereby incorporated by reference.
Consider a real WFQ server with N traffic flows, where each flow requests a fraction w(f) of the outgoing link capacity C, where the sum of w(f) for f=1 . . . N equals 1, i.e., the outgoing transmission link is fully loaded. Assume the packet size is fixed and that time starts at timeslot=0.
Definition: The ‘service time’ of cell k of flow f is defined as the timeslot the cell is served by the server and is denoted S(k,f).
Definition: The ‘Inter-Departure Time’ of cell k of flow f, denoted IDT(k,f), is defined as the number of timeslots between the service of cells k and k−1, i.e., IDT(k,f)=S(k,f)−S(k−1,f), for cells k>=2.
Definition: The ‘Ideal Inter-Departure Time’ of cells in flow f, denoted IIDT(f), is defined as the ideal number of timeslots between the service of cells k and k−1 in an ideal schedule, i.e., IIDT(f)=1/w(f). If a flow f requests 100% of the output link capacity, its IIDT(f)=1 timeslot. If a flow f requests w(f)=33% of the output link capacity, its IIDT(f)=1/0.33, or approximately 3 timeslots.
Definition: The ‘Real Received Service’ of flow f at time t, denoted R(f,t), is equal to the integer-valued number of cells which have been served by the WFQ server in the interval of timeslots 1 . . . t.
Definition: The ‘Jitter’ between cells k and k−1 of flow f, denoted J(k,f), is equal to the deviation between the inter-departure time of cell k and the IIDT(f) for the traffic flow f, i.e., J(k,f)=IDT(k,f)−IIDT(f), for cells k>=2. Define the ‘average jitter’ of a traffic flow f to be the average of the cell jitters J(k,f) for all cells k=2 . . . infinity. Similarly, define the ‘minimum jitter’ or ‘maximum jitter’ of a traffic flow f to be the minimum or maximum of the cell jitters J(k,f) for all cells k=2 . . . infinity, respectively.
Definition: The ‘Ideal Received Service’ of flow f at time t, denoted IRS(f,t), is equal to the real-valued number of cells which have been served by an ideal server (which never experiences any contention for the output link) in the interval of timeslots 1 . . . t.
Definition. The ‘Service-Lead-Lag’ of a cell k of flow f at time t, denoted LAG(k,f,t), is the difference between the real service time of cell k of flow f and the ideal service time of cell k in flow f that an ideal server would provide. Assuming that all flows start service at time t=0, the service-lead-lag is given by:
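The inter-departure time and jitter definitions can be checked numerically with a small Python sketch. The function name cell_jitters and the sample service times are illustrative assumptions; the service times 1, 2, 5, 6 correspond to a flow with weight 1/2 that is served in two consecutive timeslots and then waits two timeslots.

```python
from fractions import Fraction


def cell_jitters(service_times, weight):
    """Return J(k,f) for cells k >= 2, given S(k,f) for cells k = 1, 2, ..."""
    iidt = 1 / Fraction(weight)  # IIDT(f) = 1/w(f)
    # IDT(k,f) = S(k,f) - S(k-1,f)
    idt = [later - earlier
           for earlier, later in zip(service_times, service_times[1:])]
    # J(k,f) = IDT(k,f) - IIDT(f)
    return [d - iidt for d in idt]


jitters = cell_jitters([1, 2, 5, 6], Fraction(1, 2))
print(min(jitters), max(jitters), sum(jitters) / len(jitters))
```

For this sample the minimum jitter is −1 timeslot, the maximum jitter is 1 timeslot, and the average jitter over the three intervals is −1/3 timeslot.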
LAG(k,f,t)=S(k,f)−k*IIDT(f) (10)
Equation 10 expresses the service lag in timeslots. Observe that a positive LAG(k,f,t)>0 denotes how many timeslots behind service cell k of flow f has fallen at timeslot t. We call a positive LAG(k,f,t) a ‘service lag’. Observe that a negative LAG(k,f,t)<0 denotes how many timeslots ahead of service cell k of flow f has become at timeslot t. We call a negative LAG(k,f,t) a ‘service lead’.
An important performance metric of any scheduling algorithm is the difference between the largest positive lag and the most negative lead. This metric affects the size of any queue and it is desirable to minimize this value. Observe that the LAG definition can be adjusted to select any starting time for a flow. The net effect of defining a starting time for a flow is to adjust the actual LAG value, but the difference between the largest positive lag and the most negative lead does not change when we select a new starting time.
Definition. The ‘Normalized ServiceLeadLag’ of a cell k of flow f at time t, denoted nLAG(k,f,t), is the lag value for cell k of flow f at time t divided by the IIDT(f) for the flow f.
Assuming that all flows start service at time t=0, the normalized serviceleadlag is given by:
nLAG(k,f,t)=(S(k,f)−k*IIDT(f))/IIDT(f) (11)
Observe that a positive nLAG(k,f,t)>0 denotes how many cells (or fixedsize packets) behind service cell k of flow f has fallen at timeslot t. We call a positive nLAG(k,f,t) a ‘normalized service lag’. Observe that a negative nLAG(k,f,t)<0 denotes how many cells ahead of service the flow has become at timeslot t. We call a negative nLAG(k,f,t) a ‘normalized service lead’. An important performance metric is the difference or spread between the largest positive normalized lag and the most negative normalized lead. This metric affects the size of any queue and it is desirable to minimize this value.
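As a minimal sketch of the definitions above (equations 10 and 11), the following Python function computes the IDT, jitter, LAG and nLAG of every cell of one flow from its service times. The function name flow_metrics and the use of exact fractions are illustrative assumptions and are not part of the described method.

```python
from fractions import Fraction

def flow_metrics(service_times, w):
    """Per-cell metrics for one flow (illustrative helper, not from the source).

    service_times[k-1] is S(k,f) for cells k = 1..n, assuming the flow starts
    at time t=0; w is the requested link fraction w(f), so IIDT(f) = 1/w(f).
    """
    iidt = 1 / Fraction(w)
    # IDT(k,f) = S(k,f) - S(k-1,f), defined for cells k >= 2
    idt = [b - a for a, b in zip(service_times, service_times[1:])]
    # J(k,f) = IDT(k,f) - IIDT(f)
    jitter = [d - iidt for d in idt]
    # Equation (10): LAG = S(k,f) - k*IIDT(f), in timeslots
    lag = [s - k * iidt for k, s in enumerate(service_times, start=1)]
    # Equation (11): nLAG = LAG / IIDT(f), in cells
    nlag = [x / iidt for x in lag]
    return iidt, idt, jitter, lag, nlag
```

For a flow with w(f)=1/4 served at timeslots 4, 9 and 12, cell 2 is served one timeslot late, so its LAG is 1 timeslot and its nLAG is 1/4 of a cell, while cells 1 and 3 are exactly on time.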
While the above definitions apply to GuaranteedRate (GR) traffic flows with fixedsized cells, they are easily extended to nonGR traffic flows or traffic flows with variablesize packets. To handle variablesize packets, we can define a fixedsized cell to equal one byte, and use the previous definitions to define a normalized service lag expressed as the number of bytes behind service a flow has fallen, relative to an ideal schedule.
The service lead or service lag are related to the jitter, but they are not equal as the following example shows.
A Theory on Queue Sizes
The following theorem was established in the paper by T. H. Szymanski, “Bounds on the EndtoEnd Delay and Jitter in InputBuffered and Internally Buffered IP Networks”, which was earlier incorporated by reference.
Theorem 1: Given a traffic flow f traversing a queue Q over an interval of time where the arriving traffic stream has a maximum normalized service lead/lag of <=K cells, and where the server has a maximum normalized service lead/lag of <=K cells, then the queue will contain at most 4K cells.
Theorem 1 states that any queue which meets the next 2 conditions will have a finite and bounded number of queued cells over all time t: (a) the queue is fed by a traffic stream with a maximum Normalized Service LeadLag (NSLL) of K cells, and (b) the queue is served by a server with a maximum NSLL of K cells.
The importance of this theorem is that all queues that meet these conditions do not need to have infinite capacity. Referring to the GPS server system in
This example will illustrate that a bounded jitter between cells in a traffic flow does not imply the same bounded normalized service lead/lag for the traffic.
Referring to
For the second flow, let the service time of cells j=1 . . . 5 equal j*10, i.e., cells 1-5 receive service at timeslots 10, 20, 30, 40, 50, reflecting the fact that the jitter is bounded by 5 timeslots. Let the following 5 cells receive service with perfect IDT, i.e., cells 6-10 receive service at timeslots 55, 60, 65, 70, and 75, with zero jitter. Thereafter, the next 10 cells (cells 11-20) receive service with a spacing of 2.5 timeslots or (½) an IIDT. In both flows, 20 cells are served in 100 timeslots, i.e., the two flows have the same longterm average service rate. However, at timeslot 50 the first flow would have received service for exactly 10 cells, while the second flow has only received service for 5 cells. In this example, the second flow is 5 cells behind the ideal service schedule at timeslot 50. These 5 cells are stored in the queue, and the queue size is therefore at least 5 cells. The important point in this example is that a bounded jitter does not imply the same bounded service leadlag. Therefore, a bounded jitter does not imply that the queues in the GPS or WFQ servers in
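The numbers in this example can be checked with a short Python sketch. It assumes that the first flow is served with ideal spacing (one cell every IIDT=5 timeslots), which is consistent with the statement that it has received 10 cells by timeslot 50; the helper names are hypothetical.

```python
from fractions import Fraction

IIDT = Fraction(5)  # 20 cells in 100 timeslots

# First flow: assumed ideal spacing, cell k served at timeslot 5*k.
flow1 = [k * IIDT for k in range(1, 21)]

# Second flow: cells 1-5 every 10 timeslots, cells 6-10 every 5 timeslots,
# cells 11-20 every 2.5 timeslots (ending at timeslot 100).
flow2 = [Fraction(10 * k) for k in range(1, 6)]
flow2 += [Fraction(50 + 5 * i) for i in range(1, 6)]
flow2 += [Fraction(75) + Fraction(5, 2) * i for i in range(1, 11)]

def received_service(times, t):
    """Number of cells served in timeslots 1..t."""
    return sum(1 for s in times if s <= t)

def max_abs_jitter(times):
    """Largest |IDT(k,f) - IIDT(f)| over cells k >= 2."""
    return max(abs((b - a) - IIDT) for a, b in zip(times, times[1:]))

def max_nlag(times):
    """Largest normalized service lag (S(k,f) - k*IIDT)/IIDT, in cells."""
    return max((s - k * IIDT) / IIDT for k, s in enumerate(times, start=1))
```

Under these assumptions the second flow has a jitter of at most 5 timeslots, yet max_nlag(flow2) is 5 cells, reached at cells 5 through 10, while received_service gives 10 cells for the first flow and 5 cells for the second flow at timeslot 50.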
One difficulty of the regular GPS algorithm is that its departure schedule depends upon the initial values of the VFTs assigned to the traffic flows. Consider the specific problem of scheduling N traffic flows through the server, where each flow has a guaranteed traffic rate to be met. The scheduling frame has duration of F timeslots, and every flow f has a requested number of transmission opportunities, denoted rate(f), in the scheduling frame. To be admissible, the sum of all the requested rates by all traffic flows must be <=F. To compute a departure schedule for these N flows in a scheduling frame of duration F timeslots, one needs to assign the initial values of the VFTs for each flow. According to the regular GPS algorithm, if every flow has an empty queue initially, its initial VFT value is assigned when its first cell arrives. However, when computing a schedule for GR traffic in a scheduling frame of duration F, the schedule should be independent of the arriving times of all the cells. Therefore, we can assume each flow has k cells in its queue initially, where k=rate(f), and we may assume that the initial VFT of every flow is 0.
The problem with this variation of the regular GPS algorithm is that the schedules are not valid until after the system has stabilized. In particular, over the first F timeslots the number of timeslots assigned to each flow may not equal the requested rate for that flow. One solution is to compute the schedule over larger intervals of time, i.e., to compute the schedule over multiple scheduling frames, i.e., over timeslots from 1 to J*F for some integer J. We may discard the first (J−1)*F entries in the schedule, and we keep the last F entries in a periodic schedule of length F. This schedule will usually be valid, i.e., every flow will be assigned a number of timeslots equal to its requested rate. However, the problem with this approach is its speed. In some applications F may be as large as 1024 timeslots or 4,096 timeslots. Therefore, one may compute the GPS iterative solutions repeatedly until the schedule stabilizes, which is undesirable.
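The validity test and the 'keep the last F entries' step described above can be sketched as follows. This is an illustrative Python sketch; the function names are assumptions, flows are numbered from 1, and a 0 entry is taken to mean an idle timeslot.

```python
from collections import Counter

def is_valid_frame(schedule, rates):
    """True if flow f (1-based) is served exactly rates[f-1] times in the frame.

    A 0 entry marks an idle timeslot; any other unknown flow number
    makes the frame invalid.
    """
    counts = Counter(schedule)
    known = set(range(len(rates) + 1))
    if not set(counts) <= known:
        return False
    return all(counts.get(f, 0) == r for f, r in enumerate(rates, start=1))

def last_frame(long_schedule, F):
    """Keep the final F entries of a schedule computed over J*F timeslots."""
    if len(long_schedule) % F != 0:
        raise ValueError("schedule length must be a multiple of F")
    return long_schedule[-F:]
```

A caller would compute a schedule over J*F timeslots, pass it to last_frame, and accept the result only if is_valid_frame returns True for the requested rate vector.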
A second problem is that the regular GPS schedule does not minimize the jitter. The GPS schedule does provide low jitter, but it does not minimize the jitter. Furthermore, the schedule will be repeated periodically if the traffic flow rates remain constant. Therefore, the system will have larger queues than necessary.
We now present a variation of the GPS algorithm which can compute a departure schedule with lower jitter, which is guaranteed to be valid over timeslots 1 . . . F. One only needs to compute the GPS iterative equations for F timeslots, from 1 to F, to compute a stable schedule.
Line 200 calls the method with parameters F, N, the rate vector and the initial VFT values assigned to the flows. N is the number of flows and F is the length of the scheduling frame. Line 202 assigns the vector of flow IIDTs to be infinity for every flow. Line 204 assigns the initial vector of flow VFTs to be infinity for every flow. Lines 206-216 form a loop which initializes the IIDT and the VFT for every flow f. Line 208 tests if the flow rate is >0 timeslots. If true, the IIDT for the flow is assigned in line 210, and the initial VFT value for the flow is assigned in line 212. If a flow has no requested rate, its IIDT and its initial VFT remain at infinity.
Line 218 identifies the next cell number to be scheduled for every flow. Initially, the next cell to schedule for every flow is the first cell with cell number 1. Lines 220-234 form a loop which performs the proposed modified scheduling calculations for F timeslots, with timeslot ‘ts’ varying from 1 up to F. Line 222 identifies the flow with the smallest VFT (which equals minVFT); this flow has index ‘fmin’. Line 224 tests to see if the minVFT is less than or equal to the current timeslot, and if the next cell number to schedule for this flow is less than the requested rate for this flow. If true, the traffic flow fmin is scheduled for service in this timeslot in line 226, the next cell number is incremented for this flow in line 228, and the VFT for this flow is updated in line 230. After the loop in lines 220-234 completes the iterations for F timeslots, the final schedule is returned in line 236.
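The flowchart steps described above can be sketched in Python as follows. This is a reading of the prose, not the flowchart itself: the name schedule_gps, the argument order, the tie-break toward the lowest flow index, and the interpretation of the 'next cell number' test as 'cells served so far is less than the requested rate' are all assumptions.

```python
import math

def schedule_gps(F, rates, first_vfts):
    """Sketch of the frame-based GPS variant described in the text.

    rates[f] is the number of timeslots flow f requests per frame of F
    timeslots; first_vfts[f] is the VFT given to its first cell.
    Returns F entries: the 1-based flow served in each timeslot, or 0 if idle.
    """
    n = len(rates)
    iidt = [math.inf] * n            # flows with no request stay at infinity
    vft = [math.inf] * n
    for f in range(n):
        if rates[f] > 0:
            iidt[f] = F / rates[f]
            vft[f] = first_vfts[f]
    served = [0] * n                 # cells already scheduled per flow (assumed test)
    schedule = [0] * F
    for ts in range(1, F + 1):
        # flow with the smallest VFT; ties go to the lowest index (assumption)
        fmin = min(range(n), key=vft.__getitem__)
        if vft[fmin] <= ts and served[fmin] < rates[fmin]:
            schedule[ts - 1] = fmin + 1
            served[fmin] += 1
            vft[fmin] += iidt[fmin]  # next cell finishes one IIDT later
    return schedule
```

With F=16, rates (1,2,4,8) and every first VFT set to 0, this sketch returns 1 2 3 4, 4 3 4 4, 2 3 4 4, 3 4 4 0.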
This method will assign all N traffic flows to have their first virtual finishing times in the range 0 up to N−1. Every flow will be assigned a unique first VFT. Flows with the higher rates will have initial VFTs which are lower than flows with lower rates, i.e., flows with higher rates precede flows with lower rates in this linear ordering. The results of this algorithm are presented in
Consider an ideal GPS server 12 as shown in
When mapping the GPS server to the real WFQ server, selecting the flow with the minimum VFT as the next flow to service makes intuitive sense, as the selected flow would be the next to receive service by the GPS server if such an ideal system could be built. The real WFQ server services packets in the same order as the GPS server order. However, in the real WFQ server the act of transferring an indivisible cell or packet from the queue to the output link takes a finite amount of time which is proportional to the number of bits in the packet. During this time, no other packets can use the WFQ server or the output link. Effectively, the real WFQ server resolves the contention for the output link by serving packets in the same order as the GPS algorithm, but all eligible and unselected HOL packets wait at their queues for their opportunity for service. Therefore, in a heavily loaded WFQ server a backlog of eligible and unselected traffic flows which satisfy equation (9) will accumulate; these traffic flows all have VFTs less than or equal to the current timeslot and have not been served by the real WFQ server at the current timeslot. All of these traffic flows are eligible for service in any particular timeslot while their VFT is <=current timeslot, and the WFQ server selects the traffic flow with the minimum VFT for service.
A key observation is the following: the HOL packet with the minimum VFT does not necessarily have the maximum normalized service lag. Therefore, the existing GPSWFQ algorithms do not minimize the normalized service leadlag of the traffic flow selected for service. As a consequence of theorem 1 stated earlier, any queue associated with the GPSWFQ server may be larger than necessary.
A realworld example is the following. In an airport, consider a queue of passengers waiting to check their baggage and catch their flights. As the deadline for a flight approaches, passengers on that flight can usually preempt other passengers. The value of each minute of ‘waiting time’ for each passenger is different, depending upon how urgent the passenger's deadline is. The value of each minute spent waiting is much more important for a passenger with a deadline of 5 minutes, than for a passenger with a deadline of 1 hour. The GPS algorithm treats time equally for all backlogged traffic flows, by servicing backlogged flows according to the lowest VFT first. A better strategy is to treat traffic flows according to the lowest normalized service leadlag. This strategy is often used at airports, where passengers who are about to miss their flight are usually given priority.
Consider a first example where a set of K flows have identical VFTs all satisfying equation (9) at timeslot t. Assume the GPS server selects one flow fmin for service at random from this set. By following the GPS service order, the real WFQ server will select the same flow fmin for service. However, given that several flows have the same minimum VFT, a better strategy is to select the flow f* with the maximum normalized service lag at that timeslot. This decision guarantees that the real WFQ multiplexer scheduler makes the best decision possible at the given timeslot, with respect to minimizing the normalized service lag. This decision will minimize the sizes of the associated queues.
The above case considered one example. The strategy of selecting a flow to service can be generalized as follows. Call this first scheduling algorithm the MNLF algorithm.
Every flow f with a nonzero guaranteed traffic rate is initially assigned a normalized service lag of −1. Every flow f is also assigned a Normalized Service Lag increment value, equal to 1/IIDT(f). A flow which does not request any guaranteedrate traffic is assigned an initial normalized service lag of negative infinity, and an IIDT(f)=negative infinity. In each timeslot, the normalized service lag of every flow f is incremented by 1/IIDT(f). We observe that the normalized service lag increment for a flow depends upon the IIDT for the flow. Using this method, the value of each timeslot of waiting time for a flow is weighted according to 1/IIDT(f). The value of each timeslot spent waiting is larger for traffic flows with higher traffic rates and smaller IIDTs.
In each timeslot t, the normalized service lag for every flow f is incremented by 1/IIDT(f), as follows:
nLAG(f)=nLAG(f)+1/IIDT(f) (12)
In each timeslot t, a packet is identified for service as follows:
(nLAGmax,fmax)=max(nLAG(1:N)) (13)
The nLAGmax is the largest current normalized service LAG of all flows which have queued packets. If there is at least one nonempty queue, the nLAGmax equals the largest normalized LAG, and fmax is the index of the flow with this nLAG value. If multiple flows have the same maximum nLAG value, the server may select one flow from that set either at random, or the flow with the minimum VFT, or any other criterion may be used.
The decision of which cell to service in a timeslot can be made as follows. The constant ‘THRESHOLD’ is the smallest normalized service lag we are willing to accept, which can be for example −1 initially.
The MNLF server services flow f* and updates its nLAG value by subtracting 1, reflecting the departure of 1 cell.
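One timeslot of this selection rule can be sketched in Python as follows. The function name mnlf_step, the backlog counters, and the tie-break toward the lowest flow index are assumptions made for illustration; the text allows any tie-break rule.

```python
def mnlf_step(nlag, iidt, backlog, threshold=-1.0):
    """Advance the MNLF server by one timeslot (illustrative sketch).

    nlag, iidt and backlog are per-flow lists; nlag and backlog are updated
    in place. Returns the 0-based index of the served flow, or None if idle.
    """
    # Equation (12): every flow accrues lag, whether or not it has a packet.
    for f in range(len(nlag)):
        nlag[f] += 1.0 / iidt[f]
    # Equation (13): only flows with a queued packet compete.
    candidates = [f for f in range(len(nlag)) if backlog[f] > 0]
    if not candidates:
        return None
    fmax = max(candidates, key=lambda f: nlag[f])  # first maximum on ties
    if nlag[fmax] <= threshold:
        return None               # even the most lagging flow is too far ahead
    nlag[fmax] -= 1.0             # one cell departs
    backlog[fmax] -= 1
    return fmax
```

Calling mnlf_step once per timeslot on the same three lists gives the online form of the scheduler, in which the timeslot counter may grow without bound.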
The service lag of a cell k of flow f at time t is defined as:
lag(k,f)=t−k*IIDT(f) (15)
Equation 15 assumes that all flows start their time clocks at timeslot=0. The normalized service lag of a cell k of flow f at timeslot t is defined as:
nlag(k,f)=(t−k*IIDT(f))/IIDT(f) (16)
To implement this MNLF system, each queue/flow may have a memory to record the next cell number to schedule. The next cell number is an index between 1 and infinity since the virtualtime goes from 1 to infinity as discussed earlier.
For GR traffic flows with fixedsized cells, the final MNLF algorithm can be expressed as follows. The extension to variablesize packets is a straightforward modification of these equations.
When a packet k of flow f arrives at an empty queue, it is assigned a LAG value as follows, where t is the current timeslot:
LAG(k,f)=t−k*IIDT(f) (17)
Assuming all flows are ready to transmit at timeslot=0, the ideal departure time of cell k for this flow is equal to k*IIDT(f) timeslots. The LAG for cell k of flow f at timeslot t is therefore given by t−k*IIDT(f).
When a cell k on traffic flow f arrives at a nonempty queue j, it is assigned the same LAG value as follows:
LAG(k,f)=t−k*IIDT(f) (18)
In each timeslot, the lag values of all flows are incremented by 1 timeslot, since the time variable t is incremented by 1. The normalized LAG values are found by dividing the LAG of each flow f by its IIDT(f), i.e.,
nLAG(f)=LAG(f)/IIDT(f) (19)
In each timeslot, the server selects the flow with the largest normalized service lag. When a flow is selected for service, its lag values are updated.
The Matlab notation developed by MathWorks will be used in the flowcharts. A vector of length N can be denoted V(1:N). Element j of the vector can be denoted V(j). Two vectors A and B of length N can be operated upon, i.e., V=A+B is equivalent to V(1:N)=A(1:N)+B(1:N). A complex loop will use indentation to identify the scope of the loop. A simple loop will use curly brackets { } to identify the scope of the loop.
Lines 292-314 form the iterative calculations, for timeslots 1 up to F. For each timeslot, line 292 updates the normalized lag value for every flow j, by incrementing the normalized lag vector by 1/IIDT(j) for every flow j. Line 294 finds the flow with the largest normalized lag value. Line 294 assigns the variable maxnLAG to the maximum normalized lag value and variable fmax identifies the flow. Line 296 stores the number of timeslots left in the scheduling frame in the variable ‘free’. Line 298 tests if the number of timeslot requests equals the number of timeslots remaining in the frame. If true, the variable ‘forced’ is set to 1, otherwise it is set to 0.
Lines 304-312 are executed if a flow is scheduled for service in the current timeslot. Lines 300 and 302 test to see if either of 2 conditions is true. The first condition is true when the maximum normalized service lag is greater than some threshold value, and the next cell number to schedule for the flow is less than the requested rate for the flow. Typically, the threshold may be −1 or −0.5 or 0, and it indicates the most negative normalized service lead that we are willing to accept for any flow. The second condition is true if the variable ‘forced’=1, which occurs when the number of timeslot requests left to be satisfied equals the number of timeslots remaining. If either condition is true, line 304 schedules the flow for service in this timeslot, line 306 updates the next cell number to schedule for the flow, line 308 updates the normalized lag value for the flow, and line 310 decrements the number of requests left to be satisfied. Line 314 ends the iterative loop for the F timeslots. Line 316 returns the schedule, a vector of F elements where each element identifies a flow to be serviced in a timeslot. A 0 element indicates that no flow is to be scheduled for that timeslot.
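The frame-based procedure described above can be sketched in Python as follows. This is a reading of the prose rather than the flowchart itself. The name schedule_mnlf is hypothetical, and three details are assumptions: only flows with unserved requests compete for a timeslot, ties go to the lowest flow index, and ‘forced’ is set when the requests left are at least the timeslots remaining (counting the current one).

```python
import math

def schedule_mnlf(F, rates, threshold=-1.0):
    """Sketch of a frame-based MNLF scheduler.

    rates[f] is the number of timeslots flow f requests in a frame of F
    timeslots. Returns F entries: the 1-based flow served in each timeslot,
    or 0 for an idle timeslot.
    """
    n = len(rates)
    increment = [r / F for r in rates]        # 1/IIDT(f); 0 if no request
    nlag = [-1.0 if r > 0 else -math.inf for r in rates]
    served = [0] * n
    requests_left = sum(rates)
    schedule = [0] * F
    for ts in range(1, F + 1):
        for f in range(n):                    # every flow accrues lag
            nlag[f] += increment[f]
        pending = [f for f in range(n) if served[f] < rates[f]]
        if not pending:
            continue                          # all requests met: idle slot
        fmax = max(pending, key=lambda f: nlag[f])
        free = F - ts + 1                     # timeslots left, including this one
        forced = requests_left >= free
        if nlag[fmax] > threshold or forced:
            schedule[ts - 1] = fmax + 1
            served[fmax] += 1
            nlag[fmax] -= 1.0                 # one cell departs
            requests_left -= 1
    return schedule
```

With F=16, rates (1,2,4,8) and a threshold of −1, this sketch yields 4 3 4 2, 4 3 4 1, 4 3 4 2, 4 3 4 0, in which the flow with rate 8 is served in every second timeslot.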
Iterative Solution of Equations
It is possible to iteratively solve the preceding equations as the timeslot variable t grows to infinity, for a system with fixedsized packets (cells). The timeslot t keeps incrementing to infinity, and the equations are solved in each timeslot iteratively.
Reuse of Server Schedules for GuaranteedRate Traffic Flows
If the guaranteed rates of the traffic flows do not change from one scheduling frame to the next scheduling frame, and if fixedsized packets are used, the server schedules computed for each scheduling frame will become periodic. Therefore, it is possible to store the server schedule computed in one scheduling frame and reuse it in the next scheduling frame. To perform the storage, every server may have an associated schedule lookup table. When the schedule is periodic, a controller may enable the use of the schedule in the lookup table, and disable the use of dynamic computation of the schedule. Otherwise, the controller will allow the schedule to be computed dynamically by solving the preceding equations. This option has several attractive aspects for GR traffic flows. It will minimize the power expended in the server scheduler, since the schedule can be computed once and reused as long as the traffic flow rates remain unchanged. Furthermore, the use of a precomputed server schedule allows for the possibility where a control processor can download an alternative precomputed server schedule.
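The lookup table and controller behavior described above can be sketched in Python as follows. The class name ScheduleTable, its method names, and the recomputation counter are hypothetical; the scheduling function is passed in and can be any routine that maps a rate vector to a schedule.

```python
class ScheduleTable:
    """Store one periodic schedule and reuse it while the rates are unchanged."""

    def __init__(self, compute):
        self._compute = compute      # function: tuple of rates -> schedule
        self._rates = None           # rates the stored schedule was built for
        self._schedule = None
        self.recomputations = 0      # how often the scheduler actually ran

    def get(self, rates):
        """Return the stored schedule, recomputing only if the rates changed."""
        rates = tuple(rates)
        if rates != self._rates:
            self._schedule = self._compute(rates)
            self._rates = rates
            self.recomputations += 1
        return self._schedule

    def download(self, rates, schedule):
        """Install a precomputed schedule supplied by a control processor."""
        self._rates = tuple(rates)
        self._schedule = list(schedule)
```

The design choice here is to key the table on the whole rate vector, so any change to any flow's guaranteed rate causes the schedule to be recomputed.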
Deterministic Initialization for GuaranteedRate Scheduling
For variablesize packets, the GPSWFQ equations are all conditional on the arrival times of new packets, i.e., the virtual finishing time assigned to a new packet depends upon the state of the queue and the number of bits in the new packet. When a new packet arrives at any empty queue, a new VFT is assigned to the packet based upon the current virtual time. When the queues are continuously backlogged, all future events are affected by the initial value of the VFT of each packet arriving at an empty queue.
For GuaranteedRate traffic flows with fixedsized cells, the schedule is periodic. The cell service times in a scheduling frame are determined in a deterministic manner, and are not influenced by the actual arrival time of packets. Therefore, methods can be employed to select the VFT for the first packet of every flow f, in a manner to minimize the jitter or service lag.
There are several approaches to select an initial VFT to minimize the service lag. The method Assign_First_VFTs in
In the first example scheduling problem, there are N=4 traffic flows to be scheduled over a scheduling frame with F=16 timeslots. The vector of guaranteed traffic rates is (1,2,4,8). All 4 flows request a total of (1+2+4+8)=15 timeslots out of the scheduling frame of 16 timeslots.
Table 5.1 illustrates the results of the method Schedule_GPS, assuming an initial VFT=0 for every flow. The schedule is computed over 32 timeslots. The schedule is periodic and repeats after 16 timeslots, and the periodic schedule is:

 Schedule=[1 2 3 4, 4 3 4 4, 2 3 4 4, 3 4 4 0]
Observe that the schedule has relatively poor jitter properties, i.e., the service to flow 4 occurs in clusters, rather than being evenly spaced. Furthermore, the jitter will never improve since the schedule is periodic. The jitter will remain relatively poor as long as the schedule remains unchanged.
Table 5.2 illustrates the results of the method Schedule_GPS, assuming the initial VFTs are computed using the method Assign_First_VFTs. The schedule is periodic and repeats after 16 timeslots, and the periodic schedule is:

 Schedule=[4 3 2 4, 1 4 3 4, 4 2 3 4, 4 3 4 0]
Observe that the schedule has better jitter properties, i.e., the service to flow 4 is relatively evenly spaced. The assignment of the first VFTs using the method Assign_First_VFTs( ) has improved the jitter performance of the schedule.
Table 5.3 illustrates the results of the proposed method Schedule_MNLF. The schedule is periodic and repeats after 16 timeslots, and the periodic schedule is:

 Schedule=[4 3 4 2, 4 3 4 1, 4 3 4 2, 4 3 4 0]
Observe that the schedule has excellent jitter properties, i.e., the service to flow 4 is perfectly evenly spaced.
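The spacing claims for the three periodic schedules above can be checked by measuring, for one flow, the gaps between consecutive service timeslots when the frame repeats. The following Python sketch uses the three schedules exactly as listed in Tables 5.1 to 5.3; the function name cyclic_gaps is an assumption.

```python
def cyclic_gaps(schedule, flow):
    """Gaps in timeslots between consecutive services of `flow`,
    treating the schedule as periodic (the last gap wraps around)."""
    F = len(schedule)
    slots = [i for i, f in enumerate(schedule) if f == flow]
    gaps = []
    for i, s in enumerate(slots):
        nxt = slots[(i + 1) % len(slots)]
        gaps.append((nxt - s - 1) % F + 1)   # a flow served once has a gap of F
    return gaps

table_5_1 = [1, 2, 3, 4, 4, 3, 4, 4, 2, 3, 4, 4, 3, 4, 4, 0]
table_5_2 = [4, 3, 2, 4, 1, 4, 3, 4, 4, 2, 3, 4, 4, 3, 4, 0]
table_5_3 = [4, 3, 4, 2, 4, 3, 4, 1, 4, 3, 4, 2, 4, 3, 4, 0]
```

Flow 4 has an IIDT of 2 timeslots. Its gaps are 1, 2, 1, 3, 1, 2, 1, 5 in Table 5.1 and 3, 2, 2, 1, 3, 1, 2, 2 in Table 5.2, while in Table 5.3 every gap equals 2.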
Table 6 illustrates the performance of the same three methods for a second example, with N=4, F=16, and guaranteed traffic rates=[1, 2, 3, 7]. The 4 traffic flows request 13 timeslots out of 16, and there are 3 idle timeslots in this example.
Table 6.1 illustrates the results of the method Schedule_GPS, assuming an initial VFT=0 for every flow. The schedule is computed over 64 timeslots. The schedule is periodic and repeats after 32 timeslots, and the periodic schedule is:
Observe that the schedule has relatively poor jitter properties, i.e., the service to flow 4 occurs in clusters, rather than being evenly spaced. Also, the length of the schedule is 2*F timeslots, which is unnecessarily long to compute and to store (if the schedule is to be stored). Furthermore, the jitter never improves with time since the schedule is periodic.
Table 6.2 illustrates the results of the method Schedule_GPS, assuming the initial VFTs are computed using the method Assign_First_VFTs. The schedule is computed over 64 timeslots. The schedule is periodic and repeats after 16 timeslots, and the periodic schedule is:

 Schedule=[0 3 2 4, 1 4 3 4, 0 4 2 4, 3 4 0 4]
Observe that the schedule has better jitter properties, i.e., the service to flow 4 is relatively evenly spaced. The assignment of the first VFTs using the method Assign_First_VFTs( ) has improved the jitter performance of the schedule.
Table 6.3 illustrates the results of the proposed method Schedule_MNLF. The schedule is periodic and repeats every 16 timeslots. The periodic schedule is:

 Schedule=[0 4 3 4, 2 4 0 1, 4 3 4 2, 4 3 4 0]
Observe that the schedule has excellent jitter properties, i.e., the service to flow 4 is nearly perfectly spaced, and the length of the schedule is minimal, i.e., it has a length of F timeslots.
The preceding equations are iteratively solved to compute a server schedule. In many applications with GR traffic flows, the schedules are periodic and it is desirable to compute schedules very quickly. A computationally efficient recursive method is now proposed.
This method breaks the current problem of scheduling a vector V of N guaranteed traffic rates to be met in a scheduling frame of length F timeslots, into two smaller scheduling subproblems, to schedule two vectors of length N into two scheduling frames of length F/2. In line 400 this function accepts these input parameters: V is the vector of guaranteed rates for N traffic flows with length N. Element V(j) is the number of timeslot reservations for flow j in the scheduling frame of length Fc. Variable Fc is the number of timeslots of the current scheduling frame. RS is a vector of length N. Element RS(j) is the real received service per flow j before the current scheduling problem. iVFT is the vector of the initial VFTs to be assigned to the packets in the current scheduling problem. It has length N and is measured in timeslots. Variables Ts and Te are the starting and ending timeslots for the current scheduling problem. Line 402 defines some globally visible data, including the number of traffic flows N, the initial F value denoted Fi before any recursive decomposition, the smallest F value denoted Fs when the recursion should stop, the IIDT vector, and the initial_VFTs at time 0, before any scheduling has happened.
Line 406 tests to see if the number of timeslots in the current scheduling problem size Fc exceeds the value Fs. If so, the scheduling problem will be subdivided into 2 smaller subproblems. Line 408 assigns vector Va to the integer values of one half of vector V. The recursive scheduler is likely to have serviced at least this vector of timeslot requests in the first subproblem (if the initial VFTs are sufficiently small). Line 410 assigns vector Vb the same values as vector Va. The recursive scheduler is likely to serve at least this vector of timeslot requests in the 2nd subproblem.
Line 412 assigns vector Vrem the values of 0s or 1s. Every flow f with an even number of requests (i.e., V(f) is even) is assigned a 0 in Vrem(f). Every flow f with an odd number of requests (i.e., V(f) is odd) is assigned a 1 in Vrem(f). Line 414 defines the start of the 2nd scheduling subproblem. Line 416 calls the method partition_ones( ). It accepts the vector Vrem and returns 2 vectors of length N, called Pa and Pb. Each vector has elements 0s or 1s, such that Pa+Pb=Vrem. A one in Vrem(j) represents a timeslot request for flow(j). These timeslot requests may be assigned into the first or second subproblems. A one in Pa(j) indicates that one additional timeslot request for flow j will be assigned to the 1st subproblem. A one in Pb(j) indicates that one additional timeslot request for flow j will be assigned to the 2nd subproblem.
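The halving step described above can be sketched in Python as follows. The function name split_requests is hypothetical, and the vector Pa is taken as an input here; in the described method it is produced by partition_ones( ), which is not reproduced in this sketch.

```python
def split_requests(V, Pa):
    """Split a rate vector V into the request vectors of two half-frames.

    Va = Vb = floor(V/2); Vrem marks flows with an odd number of requests.
    Pa says which of those odd requests go to the 1st subproblem, and
    Pb = Vrem - Pa receives the rest, so that Pa + Pb = Vrem.
    """
    Va = [v // 2 for v in V]
    Vrem = [v % 2 for v in V]
    if len(Pa) != len(V) or any(p not in (0, 1) or p > r for p, r in zip(Pa, Vrem)):
        raise ValueError("Pa must be a 0/1 vector with Pa(j) <= Vrem(j)")
    Pb = [r - p for r, p in zip(Vrem, Pa)]
    first = [a + p for a, p in zip(Va, Pa)]
    second = [a + p for a, p in zip(Va, Pb)]
    return first, second
```

For example, splitting V=[5, 7, 9, 11] with Pa=[0, 0, 1, 1] gives the request vectors [2, 3, 5, 6] and [3, 4, 4, 5], each of which sums to 16, i.e., half of a 32-timeslot frame.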
Line 418 assigns vector RSa the value of vector RS. RSa is the vector of received service per flow before the first scheduling subproblem. Line 420 assigns vector iVFTa to the value of vector iVFT received from the calling program. The first scheduling subproblem will use the initial VFTs provided in vector iVFTa. Line 421 assigns the start time and end time into variables Ts1 and Te1 for the 1st subproblem with F/2 timeslots.
Line 422 calls the same method Recursive_Schedule( ) to solve the first scheduling subproblem. It returns a vector ScheduleA of length Fc, corresponding to the flows serviced in the Fc timeslots.
Line 424 assigns vector RSb the value of vector RS plus RSa. RSb is the vector of received service per flow before the second scheduling subproblem. The recursive scheduler is guaranteed to have serviced these flows before the start of the 2nd subproblem.
Line 426 assigns vector iVFTb to the appropriate initial VFT values for each flow to be serviced in the 2nd subproblem. iVFTb(j) equals the initial VFT for flow j plus the product of the next cell number to schedule for flow j times the IIDT for flow j. Line 428 assigns the start time and end time for the 2nd scheduling subproblem of length F/2 timeslots.
Line 430 calls the same method Recursive_Schedule( ) to solve the second scheduling subproblem. It returns a vector ScheduleB of length Fc corresponding to the flows serviced in the Fc timeslots.
Line 432 combines the schedules for the 1st and 2nd subproblems to yield one schedule of length Fc, which will be returned by this function. Line 434 is the ‘else’ clause, invoked when Fc equals Fs. When this occurs, the recursion stops. Line 436 calls the method Schedule_Interval( ) to schedule all the requests in the current subproblem of length Fs timeslots. Line 436 returns a schedule of length Fs, which is returned by this function.
Lines 506-524 define a loop which processes each flow f, and assigns any timeslot request for flow f in vector Vrem to either the 1st or 2nd scheduling subproblem. Line 508 identifies the next unprocessed flow f. Line 510 tests to see if the flow index f is valid (f>0), if Vrem(f) equals 1, if the rate for the flow is >0, if the VFT for the flow is less than or equal to the start time of the 2nd subproblem (midtime), and if the 1st subproblem can accommodate the timeslot request (free_a>0). If true, the timeslot request for this flow is assigned to the 1st subproblem, by assigning a 1 to Pa(f), and the variable free_a is decremented by 1 in line 514, as the 1st subproblem can now accommodate one fewer timeslot request. Line 516 tests to see if the flow index f is valid (f>0), if Vrem(f) equals 1, if the rate for the flow is >0, if the VFT for the flow is greater than or equal to the start time of the 2nd subproblem (midtime), and if the 2nd subproblem can accommodate the timeslot request (free_b>0). If true, the timeslot request for this flow is assigned to the 2nd subproblem, by assigning a 1 to Pb(f), and the variable free_b is decremented by 1 in line 520, as the 2nd subproblem can now accommodate one fewer timeslot request.
Method Partition_Ones_MNLF
FIG. 11b illustrates the method partition_ones_MNLF(Vrem) for the MNLF scheduler. Line 500 sorts all Normalized Service Lags in descending order. Line 502 computes the number of free timeslots in the 1st subproblem. Line 504 computes the number of free timeslots in the 2nd subproblem.
Lines 506-524 define a loop which processes each flow f, and assigns any timeslot request for flow f in vector Vrem to either the 1st or 2nd scheduling subproblem. Line 508 identifies the next unprocessed flow f. Line 510 tests to see if the flow index f is valid (f>0), if Vrem(f) equals 1, if the rate for the flow is >0, if the normalized service lag for the flow is positive at the start time of the 2nd subproblem (midtime), and if the 1st subproblem can accommodate the timeslot request (free_a>0). If true, the timeslot request for this flow is assigned to the 1st subproblem, by assigning a 1 to Pa(f), and the variable free_a is decremented by 1 in line 514, as the 1st subproblem can now accommodate one fewer timeslot request. Line 516 tests to see if the flow index f is valid (f>0), if Vrem(f) equals 1, if the rate for the flow is >0, if the normalized service lag for the flow is negative at the start time of the 2nd subproblem (midtime), and if the 2nd subproblem can accommodate the timeslot request (free_b>0). If true, the timeslot request for this flow is assigned to the 2nd subproblem, by assigning a 1 to Pb(f), and the variable free_b is decremented by 1 in line 520, as the 2nd subproblem can now accommodate one fewer timeslot request.
Results of Recursive Scheduling
The recursive scheduling methods were thoroughly tested, and they agree completely with the non-recursive iterative methods. The results for one sample scheduling problem are shown. The scheduling problem has N=4 flows, with rates [5, 7, 9, 11], in a scheduling frame of length F=32 timeslots. The recursive partitioning terminates when the subproblem size is F_small=8 timeslots.
The vector [5, 7, 9, 11] is partitioned into vectors [2, 3, 5, 6] and [3, 4, 4, 5]. These are recursively partitioned into vectors [1, 2, 2, 3], [1, 1, 3, 3], and [2, 2, 2, 2] and [1, 2, 2, 3], which are then scheduled. Here is the final schedule for the recursive GPS methods:

 4 3 2 4, 1 3 4 2
 3 4 1 2, 4 3 4 3
 2 1 4 3, 2 4 1 3
 4 2 3 4, 1 4 2 3
For comparison, here is the final schedule for the non-recursive method Schedule_GPS (using method Assign_First_Ones):

 4 3 2 4, 1 3 4 2
 3 4 1 2, 4 3 4 3
 2 1 4 3, 2 4 1 3
 4 2 3 4, 1 4 2 3
The results are identical.
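The partitioning of the rate vector [5, 7, 9, 11] can be checked mechanically. The following Python sketch is illustrative only: the helper names `add` and `count_services`, and the reading of each printed row as one 8-timeslot subproblem, are assumptions. It confirms that each pair of child vectors sums to its parent vector, and that each row of the schedule serves every flow exactly as often as its subproblem vector requests.

```python
from collections import Counter

# Rate vector and its recursive partitions, as listed in the text.
parent = [5, 7, 9, 11]
halves = [[2, 3, 5, 6], [3, 4, 4, 5]]
quarters = [[1, 2, 2, 3], [1, 1, 3, 3], [2, 2, 2, 2], [1, 2, 2, 3]]

# Final schedule: one row per 8-timeslot subproblem (assumed layout);
# each entry is the index (1..4) of the flow served in that timeslot.
rows = [
    [4, 3, 2, 4, 1, 3, 4, 2],
    [3, 4, 1, 2, 4, 3, 4, 3],
    [2, 1, 4, 3, 2, 4, 1, 3],
    [4, 2, 3, 4, 1, 4, 2, 3],
]

def add(u, v):
    """Element-wise sum of two request vectors."""
    return [a + b for a, b in zip(u, v)]

def count_services(row, num_flows=4):
    """Number of timeslots granted to each flow 1..num_flows in one row."""
    counts = Counter(row)
    return [counts[f] for f in range(1, num_flows + 1)]

# Each split must conserve the timeslot requests of its parent.
partition_ok = (
    add(halves[0], halves[1]) == parent
    and add(quarters[0], quarters[1]) == halves[0]
    and add(quarters[2], quarters[3]) == halves[1]
)

# Each 8-slot row must grant exactly the requests of its subproblem.
rows_ok = all(count_services(r) == q for r, q in zip(rows, quarters))

print(partition_ok, rows_ok)
```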
The MNLF algorithm was tested for a problem with N=4, F=32, and the rate vector [2, 4, 8, 16]. The vector was partitioned into 2 vectors [1, 2, 4, 8] and [1, 2, 4, 8], which were recursively split into vectors [0, 1, 2, 4], [1, 1, 2, 4], and [0, 1, 2, 4], [1, 1, 2, 4]. Here is the final schedule for the recursive MNLF methods:

 4 3 4 2, 4 3 4 0
 4 3 4 1, 2 4 3 4
 4 3 4 2, 4 3 4 0
 4 3 4 1, 2 4 3 4
For comparison purposes, the results of the GPS algorithm, using the method Assign_First_VFTs, are:

 4 3 2 4, 1 4 3 4
 4 3 2 4, 4 3 4 0
 4 3 2 4, 1 4 3 4
 4 3 2 4, 4 3 4 0
For comparison purposes, the results for the GPS algorithm, where the initial VFTs are 0s, are:

 1 2 3 4, 4 3 2 4
 1 3 4 2, 3 4 1 2
 3 4 4 3, 2 1 4 3
 2 4 3 1, 4 2 3 0
Input-Queued crossbar switches are described in the paper by T. H. Szymanski, “Bounds on the End-to-End Delay and Jitter in Input-Buffered and Internally Buffered IP Networks”, which was incorporated by reference earlier.
An N×N Input-Queued (IQ) crossbar switch 600 is shown in
Each input port 602(a) has a VOQ-server 612(a). In each timeslot, a VOQ-server 612(a) may select one VOQ 610(a,*) for service, where the * denotes any label from (a) to (n). If a VOQ 610(a,*) is selected for service, the associated VOQ-server 612(a) will remove one packet from the VOQ 610(a,*) and transmit the packet onto the outgoing transmission line 616(a) to the switching matrix 606.
The switching matrix 606 has a programmable switch (not shown) at each of the N-squared crosspoints 620. The programmable switch at crosspoint 620(a,b) can connect the row transmission line 616(a) with the column transmission line 622(b), thereby establishing a connection between input port 602(a) and output port 604(b). In practice, the switching matrix 606 can include other topologies to provide connectivity between input ports and output ports, rather than rows and columns.
A centralized control unit 618 is typically used to control the input ports 602, the output ports 604, the VOQ-servers 612 and the switching matrix 606. In each timeslot, the centralized control unit 618 matches a set of M input ports 602 to a set of M distinct output ports 604 for service, where M<=N. In an IQ switch 600, the input and output ports selected for service in one timeslot obey two constraints: (1) each input port is connected to at most one output port by the switching matrix 606, and (2) each output port is connected to at most one input port by the switching matrix 606. For each input port 602 selected for service, the controller 618 controls the VOQ-server 612 to select the appropriate VOQ 610 for service, where the appropriate VOQ contains the packets associated with the appropriate output port. The centralized control unit 618 controls the input ports, output ports, the VOQ-servers and the switching matrix using control signals (not shown).
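The two constraints on the ports selected in one timeslot amount to requiring that the selected (input port, output port) pairs form a matching. A minimal Python sketch of that check follows; the function name `is_valid_matching` and the numbering of ports from 0 to N-1 are assumptions made for illustration and do not describe how the control unit 618 is implemented.

```python
def is_valid_matching(pairs, n):
    """Check the two IQ-switch constraints for one timeslot.

    pairs: list of (input_port, output_port) connections, ports numbered 0..n-1.
    Returns True if every input port and every output port appears at most once.
    """
    inputs = [i for i, _ in pairs]
    outputs = [o for _, o in pairs]
    in_range = all(0 <= p < n for p in inputs + outputs)
    inputs_distinct = len(set(inputs)) == len(inputs)      # constraint (1)
    outputs_distinct = len(set(outputs)) == len(outputs)   # constraint (2)
    return in_range and inputs_distinct and outputs_distinct


# M = 3 connections in a 4x4 switch, all ports distinct.
print(is_valid_matching([(0, 2), (1, 0), (3, 1)], n=4))
# Two input ports contending for output port 1 violate constraint (2).
print(is_valid_matching([(0, 1), (2, 1)], n=4))
```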
The long term traffic rates between the N input ports 602 and the N output ports 604 can be expressed in an N×N traffic rate matrix T, as shown in
A switch may support multiple traffic classes, such as Guaranteed-Rate (GR) traffic flows and Best-Effort (BE) traffic flows. The GR traffic flows request high Quality of Service (QoS) guarantees such as low end-to-end delay and jitter, while Best-Effort traffic flows request best-effort service with no QoS guarantees (or weak QoS guarantees). To support multiple traffic classes, the switch may maintain multiple traffic rate matrices, with each traffic rate matrix specifying the traffic capacity allocated between each pair of input and output ports, for each traffic class.
Scheduling an IQ switch to achieve 100% capacity is a difficult integer programming problem in combinatorial mathematics. One algorithm to schedule an IQ switch according to a traffic rate matrix T to achieve 100% capacity, without requiring any speedup of the switching matrix 606, while also guaranteeing a low jitter and a small and bounded normalized service lead/lag for all the traffic flowing between any pair of input ports and output ports, is described in the 2007 U.S. patent application Ser. No. 11/802,937 by T. H. Szymanski, entitled “A Method and Apparatus to Schedule Packets Through a Crossbar Switch with Delay Guarantees”, which is incorporated by reference. This algorithm can be used by the controller 618. The controller 618 will process the matrix T and identify the sets of input ports 602 to be matched to output ports 604 in each timeslot, such that the N-squared traffic rates specified in the matrix T are satisfied within F timeslots.
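Because each port carries at most one packet per timeslot, a necessary condition for satisfying the matrix T within F timeslots is that no row sum and no column sum exceeds F. A minimal Python sketch of that check, assuming T holds integer timeslot requests per scheduling frame (the function name `is_admissible` is hypothetical and is not taken from the referenced application):

```python
def is_admissible(T, F):
    """Necessary condition for scheduling rate matrix T within F timeslots.

    T[i][j] is the number of timeslots per frame requested from input
    port i to output port j (assumed to be non-negative integers).
    """
    n = len(T)
    row_sums = [sum(row) for row in T]                              # load on each input port
    col_sums = [sum(T[i][j] for i in range(n)) for j in range(n)]   # load on each output port
    return all(s <= F for s in row_sums + col_sums)


# A fully loaded 2x2 example with F = 8: every row and column sums to 8.
print(is_admissible([[2, 6], [6, 2]], F=8))
# Input port 0 requests 9 timeslots in an 8-slot frame.
print(is_admissible([[5, 4], [0, 0]], F=8))
```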
Scheduling Multiple Traffic Flows within One VOQ
In an Internet router using an IQ switch 600, there may be hundreds or thousands of traffic flows which share any one VOQ 610. When a VOQ 610 is selected for service, one of these hundreds or thousands of traffic flows is selected for service, which represents a significant scheduling problem.
When a packet arrives at an input port 602(a), it is forwarded to the appropriate VOQ 610(a,*) by the demultiplexer 608(a). When a packet arrives at a VOQ 610(a,*), it is forwarded to the appropriate flow-VOQ 632 by a demultiplexer 634, as shown in
In each timeslot, the centralized controller 618 selects up to N input ports 602 for service, as described earlier. For each input port 602 selected for service in a timeslot, the controller 618 selects the appropriate VOQ 610 for service. The controller 618 identifies the VOQ 610 for service, but it does not select the traffic flow within the VOQ 610 for service. The method by which traffic flows are selected for service within one VOQ 610 will affect the queue sizes in each flow-VOQ 632 and in each VOQ 610, and can have a significant effect on the end-to-end network delay, jitter and performance.
Referring to
The flow-server 630 is an example of a ‘gated-server’. A gated-server is enabled for service by a control signal (not shown). Otherwise, the gated-server remains idle. Gated-servers are described in a section of the textbook by D. Bertsekas and R. Gallager, “Data Networks”, 2nd edition, Prentice Hall, 1992, which is hereby incorporated by reference.
This section describes a method to select a flow-VOQ 632 for service within a VOQ 610, when the VOQ 610 is selected for service. The method applies to any number of traffic flows which share one VOQ 610; for example, 2 flows or 2 million flows can share one VOQ 610. The method also works for aggregated traffic flows. An aggregated traffic flow consists of the aggregation of any number of individual traffic flows which share the same destination in the network.
Each gated flow-server 630 can use the method Schedule_GPS with appropriate modifications. The flow-server 630 controls access to an outgoing transmission line 634. The weight of each flow-VOQ 632 expresses its share of the capacity of the outgoing transmission line 634 controlled by the flow-server 630. Therefore, the weight of each flow-VOQ 632 may be computed, expressed as a fraction of the capacity of the outgoing transmission line 634. The capacity of link 634 depends upon the bandwidth requested by the VOQ 610 in the traffic rate matrix. Therefore, the weights of the flow-VOQs 632 may be recomputed every time the traffic rate matrix changes. Once these weights are computed, the method Schedule_GPS can be used, with one other modification. The processing loop in lines 220-234 of the method Schedule_GPS is only processed for the timeslots ‘ts’ when the flow-server 630 is enabled for service. When the flow-server 630 is enabled for service, the timeslot counter ‘ts’ is advanced to the current timeslot, between 1 and F. The flow-server 630 updates the VFT values for every traffic flow in each enabled timeslot, as described in the method Schedule_GPS. Therefore, in each timeslot in which a gated flow-server 630 is enabled, it selects one flow-VOQ 632 for service. A gated flow-server 630 is enabled if it has been selected for service by the associated VOQ-server 612. This change ensures that a gated flow-server 630 only allocates service that it has received from the VOQ-server 612.
Each gated flow-server 630 can also use the method Schedule_MNLF with appropriate modifications, to achieve a schedule with lower jitter. The flow-server 630 controls access to an outgoing transmission line 634. The weight of each flow-VOQ 632 expresses the capacity of the outgoing transmission line 634 requested by the flow-VOQ, as stated earlier. Therefore, the weight of each flow-VOQ 632 may be computed, expressed as a fraction of the capacity of the outgoing transmission line 634, which is specified in the traffic rate matrix. Once these weights are computed, the method Schedule_MNLF can be used, with other modifications. The processing loop in lines 290-314 of the method Schedule_MNLF is only processed for the timeslots ‘ts’ when the flow-server 630 is enabled for service. When the processing loop is activated, the timeslot counter ‘ts’ assumes the value of the current timeslot, between 1 and F. The flow-server 630 updates the nLAG values for every traffic flow in line 292, in each timeslot in which it is enabled. This line is modified to reflect the fact that multiple timeslots may have expired since the last activation of the processing loop. Therefore, in each timeslot in which a gated flow-server 630 is enabled, it selects one flow-VOQ 632 for service. A gated flow-server 630 is enabled if it has been selected for service by the associated VOQ-server 612. This change ensures that a gated flow-server 630 only allocates service that it has received from the VOQ-server 612.
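The gated behaviour can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the method Schedule_MNLF itself: the class name `GatedFlowServer` is hypothetical, the lag is counted in packets (each flow accrues rate/F per timeslot and loses 1 when a packet is served), and the timeslot counter is taken to be an absolute count rather than a value between 1 and F. The sketch shows the two points made above: lags are brought up to date for all timeslots that expired since the last activation, and a flow is selected only when the server is enabled.

```python
class GatedFlowServer:
    """Largest-lag-first selection among flow queues, active only when enabled."""

    def __init__(self, rates, frame_len):
        # rates[f]: timeslots per frame requested by flow f (assumption).
        self.inc = [r / frame_len for r in rates]  # lag accrued per timeslot
        self.lag = [0.0] * len(rates)
        self.last_ts = 0  # last timeslot already accounted for

    def enable(self, ts, backlog):
        """Called in timeslot ts when the VOQ-server grants service.

        backlog[f] is the number of packets queued for flow f.
        Returns the index of the flow served, or None if all queues are empty.
        """
        elapsed = ts - self.last_ts  # timeslots expired since last activation
        self.last_ts = ts
        for f in range(len(self.lag)):
            # Lags are updated whether or not the flow has a packet waiting.
            self.lag[f] += self.inc[f] * elapsed
        ready = [f for f in range(len(self.lag)) if backlog[f] > 0]
        if not ready:
            return None
        best = max(ready, key=lambda f: self.lag[f])  # maximum lag, packet available
        self.lag[best] -= 1.0  # one packet of service received
        backlog[best] -= 1
        return best


# Two flows requesting 1 and 3 timeslots of an 8-slot frame; the
# VOQ-server enables this flow-server in timeslots 2, 4, 6 and 8.
server = GatedFlowServer(rates=[1, 3], frame_len=8)
backlog = [10, 10]
served = [server.enable(ts, backlog) for ts in (2, 4, 6, 8)]
print(served)
```

Over the four enabled timeslots, the flow requesting 3 timeslots is served three times and the other flow once, so the service granted by the VOQ-server is divided in proportion to the requested rates.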
This 2-level hierarchy of GPS or MNLF servers can be used to provide controlled access to the outgoing transmission link 616 by any number of traffic flows associated with each VOQ 610. A 2-level hierarchy of servers can also be used to provide service for multiple traffic flows in traffic classes, for example Guaranteed-Rate traffic flows and Best-Effort traffic flows. The N×N Input Queued switch shown in
For Guaranteed-Rate traffic flows with fixed-sized packets, the schedules computed for the VOQ-servers 612 will be periodic. Therefore, the schedules for the VOQ-servers 612 can be computed once when the traffic rate matrix T changes, and can be stored in an appropriate lookup-table. Each VOQ-server 612 may have an associated lookup-table with F entries (not shown in any figure). The lookup-table identifies each VOQ 610 selected for service in each timeslot of a periodic scheduling frame. For a given timeslot, if the lookup-table entry is non-zero, then the VOQ 610 is identified for service. If the lookup-table entry equals 0, then the VOQ-server remains idle for that timeslot.
For Guaranteed-Rate traffic flows with fixed-sized packets, the schedules computed for the flow-servers 630 will also be periodic. Therefore, the schedules for the flow-servers 630 can be computed once when the traffic rate matrix T changes, and stored in an appropriate lookup-table (not shown in any figure). Each flow-server 630 may have an associated lookup-table with F entries. The lookup-table identifies each flow-VOQ 632 selected for service in each timeslot of a periodic scheduling frame. For a given timeslot, if the lookup-table entry is non-zero, then the flow-VOQ 632 is identified for service. If the lookup-table entry equals 0, then the flow-server remains idle for that timeslot.
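A lookup-table of this kind can be sketched in a few lines of Python. The function name `queue_for_timeslot`, the use of an absolute timeslot count starting at 1, and the 8-entry table (shorter than a real frame of F entries) are assumptions made for illustration.

```python
def queue_for_timeslot(table, t):
    """Return the queue to serve in absolute timeslot t (t >= 1).

    table has F entries, one per timeslot of the periodic scheduling frame.
    A non-zero entry identifies the queue selected for service;
    an entry of 0 means the server remains idle, returned here as None.
    """
    entry = table[(t - 1) % len(table)]  # the frame repeats every F timeslots
    return entry if entry != 0 else None


# An 8-entry table: queue indices are 1-based, 0 marks an idle timeslot.
table = [4, 3, 4, 2, 4, 3, 4, 0]

print(queue_for_timeslot(table, 4))   # entry for timeslot 4 of the frame
print(queue_for_timeslot(table, 8))   # idle timeslot
print(queue_for_timeslot(table, 9))   # first timeslot of the next frame
```

Because the table is computed once per change of the traffic rate matrix, the per-timeslot work reduces to one indexed read.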
Application to an Internally Buffered Crossbar Switch
Internally buffered crossbar switches are described in the paper by T. H. Szymanski, “Bounds on the End-to-End Delay and Jitter in Input-Buffered and Internally Buffered IP Networks”, which was incorporated by reference earlier.
The long term traffic rates between the N input ports 602 and the N output ports 604 can be expressed in an N×N traffic rate matrix T, as shown in
The existence of crosspoint queues 620 in the switching matrix 600 simplifies the scheduling of traffic through the switch 606. Each VOQ 610(a,b) has an associated crosspoint queue 620(a,b). Therefore, each input port 602 can schedule its VOQ-server 612 independently of the other input ports. In each timeslot at each input port 602, the VOQ-server 612 may select any non-empty VOQ 610 for service. For Guaranteed-Rate traffic, to achieve a moderate amount of jitter the VOQ-server 612 can be scheduled using the method Schedule_GPS described earlier. For Guaranteed-Rate traffic, to achieve very low jitter the VOQ-server 612 can be scheduled using the method Schedule_MNLF described earlier. When a VOQ-server 612(a) selects a VOQ 610(a,b) for service, it removes one packet from the VOQ 610(a,b) and forwards the packet over the transmission line 616 to the crosspoint queue 620(a,b) within the switching matrix 606. The input ports 602 for an internally buffered crossbar switch can also support multiple traffic classes, as described earlier.
In an internally buffered crossbar switch, the switching matrix 600 has an internal column-server (not shown) associated with each column transmission line 622(a), . . . , 622(n). In each timeslot, the internal column-server associated with a column transmission line 622(b) selects one non-empty crosspoint queue 620 in the column for service. When a crosspoint queue 620(a,b) is selected for service, the column-server removes one packet from the crosspoint queue 620(a,b), and forwards the packet over the outgoing vertical transmission line 622(b) to the associated output port 604(b).
We have simulated the performance of the internally buffered crossbar switch using the GPS/WFQ scheduling algorithms, assuming GR traffic flows with fixed-sized cells. For a 64×64 buffered crossbar switch operating at 100% load, 100 fully-saturated traffic rate matrices of size 64×64 were generated.
All 100 matrices were processed and scheduled. The buffered crossbar switch was simulated to observe the maximum crosspoint queue sizes. Using the method Schedule_GPS, for a fully-loaded 64×64 crossbar switch the maximum crosspoint queue size is observed to be 6 cells, for our traffic matrices.
The method Schedule_MNLF can also be used to schedule each VOQ-server 612 and each column-server in column(b). Using the method Schedule_MNLF, for a fully-loaded 64×64 crossbar switch the maximum crosspoint queue size is observed to be 4 cells. Given that there are N-squared=64*64=4K crosspoint queues 620 in the switching matrix 600, the reduction in size from 6 cells to 4 cells per crosspoint queue 620 is quite significant.
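The scale of this reduction can be worked out directly. The short Python calculation below assumes that every crosspoint queue is provisioned for the observed maximum occupancy, which is an assumption made for illustration and not a statement about any particular hardware design.

```python
N = 64
queues = N * N              # number of crosspoint queues in the switching matrix

cells_gps = queues * 6      # total buffer, 6 cells per queue (Schedule_GPS)
cells_mnlf = queues * 4     # total buffer, 4 cells per queue (Schedule_MNLF)
saved = cells_gps - cells_mnlf

print(queues, cells_gps, cells_mnlf, saved)   # 4096 24576 16384 8192
print(saved / cells_gps)                      # fraction of buffer removed
```

Under that assumption the switching matrix needs 8192 fewer cell buffers, one third of the total required with 6 cells per queue.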
Dynamic Scheduling for the Column Servers
This section describes alternative scheduling algorithms for the columns of the internally buffered crossbar switches. Each column-server in column(b) of the switching matrix 600 can use several different algorithms to select a crosspoint queue 620 in column(b) for service. For example, the column-server may use an Oldest-Cell-First policy. In this case, in each timeslot the column-server selects the crosspoint queue 620 with the oldest cell in the column. Our experiments indicate that this algorithm tends to result in smaller sizes of crosspoint queues 620.
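An Oldest-Cell-First selection for one column can be sketched as follows in Python. The sketch assumes that each cell carries its arrival timeslot and that each crosspoint queue is first-in first-out, so the oldest cell of a queue is at its head; the function name `select_oldest_cell_first` and the (arrival_timeslot, payload) cell layout are hypothetical.

```python
from collections import deque


def select_oldest_cell_first(column_queues):
    """Serve the crosspoint queue whose head-of-line cell is oldest.

    column_queues: list of deques, one per row of the column; each cell is a
    tuple (arrival_timeslot, payload), stored in arrival order.
    Returns (row_index, cell), or None if every queue in the column is empty.
    """
    best_row = None
    for row, queue in enumerate(column_queues):
        if not queue:
            continue  # empty crosspoint queues are not eligible
        if best_row is None or queue[0][0] < column_queues[best_row][0][0]:
            best_row = row
    if best_row is None:
        return None
    return best_row, column_queues[best_row].popleft()


# Three crosspoint queues in one column; the middle one is empty.
column = [deque([(5, "a")]), deque(), deque([(3, "b"), (7, "c")])]
print(select_oldest_cell_first(column))   # serves the cell that arrived in timeslot 3
```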
Achieving Near-Minimal Queue Sizes in a Network
According to Theorem 1 stated earlier, the size of any queue will remain small and bounded by 4K cells if two conditions are met: (1) the traffic arriving at the queue has a bounded normalized service lead/lag (NSLL) of K cells, and (2) the service schedule for the queue has a bounded NSLL of K cells. The second condition ensures that the traffic departing any queue also has a bounded NSLL of K cells.
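To check the second condition for a particular periodic schedule, the service lead/lag of a flow can be measured directly. The Python sketch below uses one plausible definition, stated here as an assumption rather than taken from Theorem 1: after t timeslots of a frame of F timeslots, a flow with rate r has ideally received r*t/F packets of service, and its lead/lag is the difference between that ideal and the number of packets actually served. The function name `max_abs_service_lag` is hypothetical.

```python
from fractions import Fraction


def max_abs_service_lag(schedule, flow, rate):
    """Largest |ideal service - actual service| for one flow over one frame.

    schedule: list of length F; entry t-1 is the flow served in timeslot t
              (0 for an idle timeslot).
    rate:     number of timeslots per frame requested by the flow.
    The result is in packets and is an exact fraction.
    """
    F = len(schedule)
    served = 0
    worst = Fraction(0)
    for t, f in enumerate(schedule, start=1):
        if f == flow:
            served += 1
        ideal = Fraction(rate * t, F)   # service a fluid server would have given
        worst = max(worst, abs(ideal - served))
    return worst


# Two flows, each requesting 2 of 4 timeslots, served alternately.
schedule = [1, 2, 1, 2]
print(max_abs_service_lag(schedule, flow=1, rate=2))   # 1/2
print(max_abs_service_lag(schedule, flow=2, rate=2))   # 1/2
```

A schedule whose measured value stays at or below K packets for every flow would satisfy condition (2) under this assumed definition.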
Consider a network of packet-switched routers 610 as shown in
To achieve a bounded NSLL for every traffic flow arriving at the router 600(a) in the network 610, every traffic flow should be processed at the traffic source 650 to have a bounded NSLL before the traffic flow is injected into the network over transmission line 654(s,a). A traffic flow can be processed at a traffic source 650 to have a bounded NSLL using the method Schedule_GPS or the method Schedule_MNLF. For example, the source 650 may have a multiplexer-server 12 as shown in
To achieve a lower jitter and a lower bounded NSLL on the traffic leaving the source 650, the server 12 can use the method Schedule_MNLF. These methods will ensure that new traffic flows entering the network 610 have a bounded NSLL.
Referring to
To illustrate the methods, a computer simulation of a saturated network was performed. These results are presented in the paper by T. H. Szymanski, “Bounds on the End-to-End Delay and Jitter in Input-Buffered and Internally Buffered IP Networks”, which was incorporated by reference earlier.
Referring to
At each router 600(1), 342 traffic flows arrive on all 10 input links and all 342 traffic flows exit on 10 output links. Each traffic flow has a guaranteed traffic rate. All 10 links 654 leaving each of the 20 routers 600 are 100% loaded, and each link supports on average 34.2 traffic flows. This model represents 100% loading, an extremal point in the capacity region for this network, while operating at unity speedup. Hundreds of other network models were developed and simulated with different topologies, larger switches, and longer path lengths and all yielded essentially identical results.
While the exemplary embodiments of the present invention are described with respect to various equations and figures, the present invention is not limited to the form of these equations or figures. One skilled in the art may modify these equations or figures by scaling, or may form different approximate solutions to the methods described herein employing any of a number of techniques well known in the art.
The various methods could be implemented using hardware-based data processing means, including data processing logic in an Application Specific Integrated Circuit, a Field Programmable Logic Device, a Field Programmable Gate Array, or any other hardware-based data processing means.
The various methods could be implemented using software-based data processing means, including processing steps in a software program. Such software may be employed in, for example, a digital signal processor, a network processor, a microcontroller or a general-purpose computer.
The various methods can be employed in electrical routers, all-optical routers, or wireless routers.
The present invention can be embodied in the form of methods and apparatuses for practicing those methods. The present invention can also be embodied in the form of program code embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The present invention can also be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium, such as over electrical wiring or cabling or a network, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.
It will be further understood that various changes in the details, materials, and arrangements of the steps which have been described and illustrated in order to explain the nature of this invention may be made by those skilled in the art without departing from the principle and scope of the invention as expressed in the following claims. For example, the methods can be iterative or non-iterative, the methods may use fixed-size or variable-size packets, and the methods may be embedded into Input Queued crossbar switches, internally buffered crossbar switches, or other switches which use input ports and virtual queues.
Of course, the above described embodiments are intended to be illustrative only and in no way limiting. The described embodiments of carrying out the invention are susceptible to many modifications of form, arrangement of parts, details and order of operation. The invention, rather, is intended to encompass all such modifications within its scope, as defined by the claims.
Claims
1. A method to schedule N traffic flows through a multiplexer server system, said multiplexer server system comprising a queue for each of said N traffic flows, a multiplexer server, and an outgoing link, wherein each of said N traffic flows has an associated weight equaling the fraction of the outgoing link capacity requested by said flow, said method comprising
 (a) assigning each of said N traffic flows an initial normalized lag value,
 (b) processing each of said N traffic flows and assigning each of said N traffic flows a normalized lag increment value, equaling an ideal inter-departure time for average-sized packets associated with that traffic flow divided by the timeslot duration,
 (c) in each increment of the timeslot clock, processing said N traffic flows and adding the normalized lag increment value to the normalized lag value associated with each of said N traffic flows,
 (d) in each increment of the timeslot clock during which the outgoing link is idle, processing the N traffic flows and selecting one packet associated with one of said N traffic flows for transmission over said outgoing link, said one of said N traffic flows having the largest normalized lag value which exceeds a given threshold value,
 (e) removing one packet from the queue associated with said one of said N traffic flows, transmitting the packet over the outgoing transmission line for K timeslots, and decrementing the normalized lag value associated with said one of said N traffic flows by K times the normalized lag increment value.
2. The method of claim 1, where all packets have a fixed maximum size.
3. The method of claim 1, where all packets have a fixed maximum size, and each packet can be transmitted over the outgoing link in a fixed number of timeslots.
4. The method of claim 1, where all packets have a fixed maximum size, and each packet can be transmitted over the outgoing link in one timeslot.
5. A method to schedule traffic flows through an input port associated with a switching matrix, said input port comprising multiple Virtual Output Queues (VOQs), one server, and one outgoing link associated with a switching matrix, wherein each of said VOQs stores packets associated with a subset of said N traffic flows, and wherein packets within one VOQ request a common output port of the switching matrix, said method comprising steps of
 (a) assigning each of said N VOQs a weight equaling the fraction of the capacity of said outgoing link requested by said VOQ,
 (b) wherein said server selects said VOQs for transmission onto the outgoing link such that traffic associated with each of said N VOQs is transmitted over the outgoing link with a bounded normalized service lead/lag.
6. A method to schedule multiple Guaranteed-Rate (GR) traffic flows through an input port associated with a switching matrix, said input port comprising N Virtual Output Queues (VOQs), one VOQ-server, and one outgoing link associated with a switching matrix, said outgoing link called a port link, each of said VOQs comprising multiple flow-VOQs, one gated flow-server and one outgoing link connected indirectly or directly to the VOQ-server, each of said outgoing links called a VOQ-link, each of said flow-VOQs storing packets associated with one of said GR traffic flows,
 (a) wherein each VOQ is assigned a weight equaling the fraction of the capacity of the outgoing port link requested by the VOQ,
 (b) wherein the VOQ-server selects VOQs for service in proportion to the weight of the VOQ,
 (c) wherein each gated flow-server associated with each VOQ receives control signals called enable signals from the VOQ-server, and selects one GR traffic flow for transmission onto the outgoing VOQ-link in response to an enable signal, such that each of said GR traffic flows is transmitted over the outgoing port link with a bounded normalized service lead/lag.
7. The method of claim 6 where the switching matrix is unbuffered.
8. The method of claim 6 where the switching matrix is buffered.
Type: Application
Filed: Jan 28, 2014
Publication Date: Jul 24, 2014
Inventor: Ted H. Szymanski (Toronto)
Application Number: 14/166,340
International Classification: H04L 12/875 (20060101); H04L 12/863 (20060101); H04L 12/841 (20060101);