Now that Ethernet WAN connections are becoming the norm in most areas of the United States (at least in and around major metropolitan areas), I thought I’d write about some common design mistakes I have seen customers make in both provisioning and the equipment they use for these connections. Most of the issues listed below assume there is some sort of traffic (e.g. Voice over IP) that needs to be handled with bandwidth guarantees and/or absolute priority:
Issue #1: Using an undersized routing platform
- So you’ve got a killer 250 Mbps Ethernet WAN connection to your remote office across town for a great price. And your Cisco 2821 routers on each end have Gigabit interfaces on them. It may be tempting to use this router. DON’T DO IT. You may have even hooked it up and run an iPerf test to confirm you can get the full 250 Mbps with no packet loss. YOU STILL SHOULDN’T DO IT.
- Here’s why: Except for the highest-end models, most branch routers are not designed to scale to these speeds. As is often the case with these platforms, they are running multiple services (e.g. voice termination, firewall, WAN acceleration), all of which steal cycles from the router’s CPU. While it is true that most traffic traverses the data plane without having to hit the CPU at all, higher speeds still place more demands on the CPU for things such as routing table changes, multicast/broadcast traffic, and the aforementioned standard services. I’ve seen many a ghost in the network when using platforms that are not sized properly for the connection they are servicing.
Issue #2: Using a LAN Switch to make the WAN connection
- How about simply using a LAN switch (say, a Cisco 3750 or Juniper EX4200) instead of the underpowered Cisco 2821? DON’T DO IT. Here’s why: With a few exceptions, Ethernet switches have fixed-size, ASIC-based interface buffers that are not large enough to handle the demands of Low Latency Queuing (LLQ), which requires reordering packets in order to provide priority and bandwidth guarantees to designated traffic. While LAN switches with Gigabit ports won’t have a capacity issue (they will be able to switch packets at full line rate), they won’t be able to properly shape and prioritize traffic as most latency- or jitter-sensitive applications will require.
Issue #3: Not shaping traffic to the Committed Information Rate (CIR) on Sub-rate connections
- When a carrier hands you a 250 Mbps Ethernet circuit (i.e. a 250 Mbps CIR), they are obviously going to give you a Gigabit port for the physical connection. This means that unless you or the carrier configures the devices otherwise, traffic can burst well above the 250 Mbps CIR. What happens to the traffic that exceeds the CIR? It varies widely between carriers. Some carriers will strictly police the traffic indiscriminately, some will have very lenient shaping policies, and some will police your traffic while trusting your Quality of Service classifications (IP Precedence or DSCP) – thus ensuring that higher priority traffic gets transmitted while best effort traffic may be dropped.
- So, how do you make sure the carrier doesn’t throw your network’s high priority traffic away? You need to implement Hierarchical QoS. This creates a “policy within a policy” that performs Low Latency Queuing while simultaneously shaping traffic to the CIR. In general, you should always use Hierarchical QoS on all Ethernet WAN connections that are “subrate” (i.e. the CIR is less than the port speed).
- In general, a routing platform or a specialized Metro Ethernet switch (such as the Cisco ME 3600X) is needed to properly deploy Hierarchical QoS.
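To make the “policy within a policy” concrete, here is a minimal Cisco IOS MQC sketch for the 250 Mbps example above. The class names, the DSCP matches, and the 25 Mbps / 5 Mbps allocations are illustrative assumptions, not values from any particular deployment:

```
! Child policy: LLQ priority for voice, a bandwidth guarantee for signaling
class-map match-any VOICE
 match dscp ef
class-map match-any CALL-SIGNALING
 match dscp cs3
!
policy-map CHILD-LLQ
 class VOICE
  priority 25000
 class CALL-SIGNALING
  bandwidth 5000
 class class-default
  fair-queue
!
! Parent policy: shape all traffic to the 250 Mbps CIR, then
! apply the child LLQ policy within the shaped rate
policy-map PARENT-SHAPER
 class class-default
  shape average 250000000
  service-policy CHILD-LLQ
!
interface GigabitEthernet0/1
 service-policy output PARENT-SHAPER
```

The key line is `service-policy CHILD-LLQ` nested under the shaper: queuing decisions are made against the 250 Mbps shaped rate rather than the Gigabit port speed, so congestion is managed on your equipment instead of at the carrier’s policer.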
Issue #4: Not provisioning QoS properly with the carrier
- This is related to #3. Most carriers have preconfigured QoS profiles with different classes, where they will honor your QoS tags and prioritize the traffic accordingly. It is important to ask the carrier for the available QoS profiles and to configure QoS on the customer interfaces to match those profiles. Nearly every new environment I have worked in has had connections provisioned in a way that does not align with the customer or application requirements.
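As an illustration, suppose the carrier tells you they honor a three-class profile: EF for real-time, AF31 for business data, and everything else as best effort. A marking policy on the customer edge might look like the sketch below. The profile itself, the class names, and the `CRITICAL-APPS` access list are all hypothetical; the NBAR `match protocol rtp` line assumes NBAR is available on the platform:

```
! Hypothetical carrier profile: EF = real-time, AF31 = business data,
! default = best effort. Mark traffic to match before it hits the WAN.
class-map match-any REALTIME
 match protocol rtp
class-map match-any BUSINESS-DATA
 match access-group name CRITICAL-APPS
!
policy-map MARK-TO-CARRIER-PROFILE
 class REALTIME
  set dscp ef
 class BUSINESS-DATA
  set dscp af31
 class class-default
  set dscp default
```

If your markings don’t line up with the carrier’s profile, your “priority” traffic is just best effort once it leaves your port.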
Issue #5: Assuming you don’t need QoS because your monitoring never shows congestion
- Here’s why: Unless you made specific arrangements, your management tools are probably polling interface utilization at 5-, 10-, or 15-minute intervals. Congestion can occur in intervals that are fractions of a second. When congestion occurs, this traffic may not necessarily be dropped (it is likely buffered), but it could be delayed long enough to interfere with the quality of applications that have strict jitter or packet loss requirements, such as voice, fetal monitoring, or trading floor applications. The same is true for QoS on LAN links (I’ll discuss this in more depth at a later time).
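The arithmetic behind this is worth seeing. With hypothetical numbers (a 1 Gbps port, a 5-minute polling interval, and a single 2-second burst at line rate), a quick sketch shows how thoroughly the average hides the burst:

```python
# Why 5-minute average utilization hides microbursts.
# All numbers below are illustrative assumptions.
link_bps = 1_000_000_000   # 1 Gbps port
poll_seconds = 300         # 5-minute polling interval

# Traffic is idle except for one 2-second burst at full line rate,
# during which the link is 100% congested.
burst_seconds = 2
burst_bits = link_bps * burst_seconds

# Average utilization as reported by the poller over the interval:
avg_utilization = burst_bits / (link_bps * poll_seconds)
print(f"{avg_utilization:.1%}")  # ~0.7% -- the graph looks idle
```

Two full seconds of line-rate congestion, enough to wreck a voice call, shows up on the utilization graph as well under one percent.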
Carrier Ethernet WAN connections are changing the landscape of most WANs today by substantially increasing the available bandwidth at lower costs. Make sure that you optimize the performance of these connections by working with your carrier to properly provision the QoS profiles available and accurately configure QoS according to those profiles.