Linux Bonding: Switch Configuration and Link Monitoring

Switch Configuration

For this section, “switch” refers to whatever system the bonded devices are directly connected to (i.e., where the other end of
the cable plugs into). This may be an actual dedicated switch device, or it may be another regular system (e.g., another computer running
Linux).

The active-backup, balance-tlb and balance-alb modes do not require any specific configuration of the switch.

The 802.3ad mode requires that the switch have the appropriate ports configured as an 802.3ad aggregation. The precise method used
to configure this varies from switch to switch, but, for example, a Cisco 3550 series switch requires that the appropriate ports first be
grouped together in a single etherchannel instance, then that etherchannel is set to mode “lacp” to enable 802.3ad (instead of
standard EtherChannel).

The balance-rr, balance-xor and broadcast modes generally require that the switch have the appropriate ports grouped together.
The nomenclature for such a group differs between switches; it may be called an “etherchannel” (as in the Cisco example, above), a “trunk
group” or some other similar variation. For these modes, each switch will also have its own configuration options for the switch’s transmit
policy to the bond. Typical choices include XOR of either the MAC or IP addresses. The transmit policy of the two peers does not need to
match. For these three modes, the bonding mode really selects a transmit policy for an EtherChannel group; all three will interoperate
with another EtherChannel group.
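
As a rough illustration of what an XOR transmit policy does, the sketch below XORs the source and destination MAC addresses and reduces the result modulo the number of ports in the group. The function names are hypothetical, and real switches (and the bonding driver itself) may hash different fields or bits; this only shows why a given pair of peers always lands on the same port.

```python
def mac_to_int(mac: str) -> int:
    """Convert a MAC address such as '00:11:22:33:44:55' to an integer."""
    return int(mac.replace(":", ""), 16)


def xor_select_port(src_mac: str, dst_mac: str, port_count: int) -> int:
    """Pick a port index by XOR-ing both MAC addresses, modulo the port count.

    Simplified, hypothetical policy: actual devices choose their own hash.
    """
    if port_count <= 0:
        raise ValueError("port_count must be positive")
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % port_count


if __name__ == "__main__":
    # The same pair of addresses always maps to the same port.
    print(xor_select_port("00:00:00:00:00:01", "00:00:00:00:00:02", 2))
```

Because XOR is symmetric, swapping source and destination selects the same port in this sketch, but nothing requires the two peers to use the same policy.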

802.1q VLAN Support

It is possible to configure VLAN devices over a bond interface
using the 8021q driver. However, only packets coming from the 8021q
driver and passing through bonding will be tagged by default.
Self-generated packets, for example, bonding’s learning packets or ARP
packets generated by either ALB mode or the ARP monitor mechanism, are
tagged internally by bonding itself. As a result, bonding must
“learn” the VLAN IDs configured above it, and use those IDs to tag
self-generated packets.

For reasons of simplicity, and to support the use of adapters
that can do VLAN hardware acceleration offloading, the bonding
interface declares itself as fully hardware offloading capable. It gets
the add_vid/kill_vid notifications to gather the necessary
information, and it propagates those actions to the slaves. In case
of mixed adapter types, hardware accelerated tagged packets that
should go through an adapter that is not offloading capable are
“un-accelerated” by the bonding driver so the VLAN tag sits in the
regular location.

VLAN interfaces *must* be added on top of a bonding interface
only after enslaving at least one slave. The bonding interface has a
hardware address of 00:00:00:00:00:00 until the first slave is added.
If the VLAN interface is created prior to the first enslavement, it
would pick up the all-zeroes hardware address. Once the first slave
is attached to the bond, the bond device itself will pick up the
slave’s hardware address, which is then available for the VLAN device.

Also, be aware that a similar problem can occur if all slaves
are released from a bond that still has one or more VLAN interfaces on
top of it. When a new slave is added, the bonding interface will
obtain its hardware address from the first slave, which might not
match the hardware address of the VLAN interfaces (which was
ultimately copied from an earlier slave).

There are two methods to ensure that the VLAN device operates
with the correct hardware address if all slaves are removed from a
bond interface:

1. Remove all VLAN interfaces then recreate them

2. Set the bonding interface’s hardware address so that it
matches the hardware address of the VLAN interfaces.

Note that changing a VLAN interface’s HW address would set the
underlying device — i.e. the bonding interface — to promiscuous
mode, which might not be what you want.
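
The address problems described above can be spotted with a small check. The following is a minimal sketch, assuming the hardware addresses are read from /sys/class/net/<iface>/address; the function names and the example VLAN interface name bond0.10 are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

ZERO_MAC = "00:00:00:00:00:00"


def read_mac(iface: str, sysfs_root: str = "/sys/class/net") -> str:
    """Read an interface's hardware address from sysfs."""
    return (Path(sysfs_root) / iface / "address").read_text().strip().lower()


def check_vlan_macs(bond_mac: str, vlan_macs: Dict[str, str]) -> List[str]:
    """Return warnings for all-zeroes or mismatched hardware addresses."""
    warnings = []
    bond_mac = bond_mac.lower()
    if bond_mac == ZERO_MAC:
        warnings.append("bond address is all zeroes (no slave enslaved yet?)")
    for name, mac in sorted(vlan_macs.items()):
        mac = mac.lower()
        if mac == ZERO_MAC:
            warnings.append(f"{name} picked up the all-zeroes address")
        elif mac != bond_mac:
            warnings.append(f"{name} address {mac} differs from bond {bond_mac}")
    return warnings


if __name__ == "__main__":
    # Example with literal values; on a live system use read_mac() instead.
    print(check_vlan_macs("00:11:22:33:44:55", {"bond0.10": "00:aa:bb:cc:dd:ee"}))
```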

Link Monitoring

The bonding driver at present supports two schemes for
monitoring a slave device’s link state: the ARP monitor and the MII
monitor.

At the present time, due to implementation restrictions in the
bonding driver itself, it is not possible to enable both ARP and MII
monitoring simultaneously.

ARP Monitor Operation

The ARP monitor operates as its name suggests: it sends ARP
queries to one or more designated peer systems on the network, and
uses the response as an indication that the link is operating. This
gives some assurance that traffic is actually flowing to and from one
or more peers on the local network.

The ARP monitor relies on the device driver itself to verify
that traffic is flowing. In particular, the driver must keep up to
date the last receive time, dev->last_rx, and transmit start time,
dev->trans_start. If these are not updated by the driver, then the
ARP monitor will immediately fail any slaves using that driver, and
those slaves will stay down. If network monitoring (tcpdump, etc.)
shows the ARP requests and replies on the network, then it may be that
your device driver is not updating last_rx and trans_start.

Configuring Multiple ARP Targets

While ARP monitoring can be done with just one target, it can
be useful in a High Availability setup to have several targets to
monitor. In the case of just one target, the target itself may go
down or have a problem making it unresponsive to ARP requests. Having
an additional target (or several) increases the reliability of the ARP
monitoring.

Multiple ARP targets must be separated by commas as follows:

# example options for ARP monitoring with three targets
alias bond0 bonding
options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9

For just a single target the options would resemble:

# example options for ARP monitoring with one target
alias bond0 bonding
options bond0 arp_interval=60 arp_ip_target=192.168.0.100
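
When the target list is generated by a script, it helps to validate the addresses before writing the options line. The helper below is a small sketch with a hypothetical name; it produces lines in the same form as the examples above and rejects malformed IPv4 addresses.

```python
import ipaddress


def bonding_arp_options(bond, arp_interval, targets):
    """Build an 'options' line with comma-separated ARP targets."""
    if not targets:
        raise ValueError("at least one ARP target is required")
    # ipaddress.IPv4Address raises ValueError on malformed input.
    checked = [str(ipaddress.IPv4Address(t)) for t in targets]
    return (
        f"options {bond} arp_interval={arp_interval} "
        f"arp_ip_target={','.join(checked)}"
    )


if __name__ == "__main__":
    print(bonding_arp_options("bond0", 60, ["192.168.0.1", "192.168.0.3", "192.168.0.9"]))
```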

MII Monitor Operation

The MII monitor monitors only the carrier state of the local
network interface. It accomplishes this in one of three ways: by
depending upon the device driver to maintain its carrier state, by
querying the device’s MII registers, or by making an ethtool query to
the device.

If the use_carrier module parameter is 1 (the default value),
then the MII monitor will rely on the driver for carrier state
information (via the netif_carrier subsystem). As explained in the
use_carrier parameter information, above, if the MII monitor fails to
detect carrier loss on the device (e.g., when the cable is physically
disconnected), it may be that the driver does not support
netif_carrier.

If use_carrier is 0, then the MII monitor will first query the
device’s MII registers (via ioctl) and check the link state. If that
request fails (not just that it returns carrier down), then the MII
monitor will make an ethtool ETHTOOL_GLINK request to attempt to obtain
the same information. If both methods fail (i.e., the driver either
does not support or had some error in processing both the MII register
and ethtool requests), then the MII monitor will assume the link is
up.
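
The decision order described above can be modelled in a few lines. This is only a sketch of the logic, not the driver's code: the three probe callables and the QueryFailed exception are hypothetical stand-ins for the netif_carrier state, the MII register ioctl, and the ethtool request.

```python
from typing import Callable


class QueryFailed(Exception):
    """Raised by a probe when the request itself fails (not 'carrier down')."""


def failed_probe() -> bool:
    """Example probe that models an unsupported or erroring request."""
    raise QueryFailed("request not supported")


def link_is_up(
    use_carrier: int,
    driver_carrier: Callable[[], bool],
    mii_query: Callable[[], bool],
    ethtool_query: Callable[[], bool],
) -> bool:
    """Return the link state the MII monitor would report."""
    if use_carrier:
        # Trust whatever carrier state the driver maintains.
        return driver_carrier()
    # Try the MII registers first, then ethtool; a valid answer
    # (up or down) from either one is final.
    for query in (mii_query, ethtool_query):
        try:
            return query()
        except QueryFailed:
            continue
    # Both requests failed: the link is assumed to be up.
    return True


if __name__ == "__main__":
    print(link_is_up(0, lambda: False, failed_probe, failed_probe))
```

Note that a probe returning False ends the search immediately; only a failed request falls through to the next method.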

Potential Sources of Trouble

Adventures in Routing

When bonding is configured, it is important that the slave
devices not have routes that supersede routes of the master (or,
generally, not have routes at all). For example, suppose the bonding
device bond0 has two slaves, eth0 and eth1, and the routing table is
as follows:

Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth0
10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth1
10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 bond0
127.0.0.0 0.0.0.0 255.0.0.0 U 40 0 0 lo

This routing configuration will likely still update the
receive/transmit times in the driver (needed by the ARP monitor), but may bypass the bonding driver (because outgoing traffic to, in this
case, another host on network 10 would use eth0 or eth1 before bond0).

The ARP monitor (and ARP itself) may become confused by this configuration, because ARP requests (generated by the ARP monitor)
will be sent on one interface (bond0), but the corresponding reply will arrive on a different interface (eth0). This reply looks to ARP
as an unsolicited ARP reply (because ARP matches replies on an interface basis), and is discarded. The MII monitor is not affected
by the state of the routing table.

The solution here is simply to ensure that slaves do not have routes of their own, and if for some reason they must, those routes do
not supersede routes of their master. This should generally be the case, but unusual configurations or errant manual or automatic static
route additions may cause trouble.
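
A quick way to audit for this is to scan the routing table for entries whose interface is a slave. The sketch below assumes a table in the eight-column layout shown above, with the interface name in the last column; the function name is hypothetical.

```python
def routes_on_slaves(route_table, slaves):
    """Return (destination, genmask, iface) for each route bound to a slave."""
    slave_set = set(slaves)
    hits = []
    for line in route_table.splitlines():
        fields = line.split()
        # Data rows have 8 columns; skip the title and the column header.
        if len(fields) != 8 or fields[0] == "Destination":
            continue
        dest, _gateway, genmask, *_rest, iface = fields
        if iface in slave_set:
            hits.append((dest, genmask, iface))
    return hits


if __name__ == "__main__":
    table = """Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth0
10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth1
10.0.0.0 0.0.0.0 255.255.0.0 U 40 0 0 bond0
127.0.0.0 0.0.0.0 255.0.0.0 U 40 0 0 lo
"""
    for dest, mask, iface in routes_on_slaves(table, ["eth0", "eth1"]):
        print(f"slave route: {dest}/{mask} via {iface}")
```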

Ethernet Device Renaming

On systems with network configuration scripts that do not associate physical devices directly with network interface names (so
that the same physical device always has the same “ethX” name), it may be necessary to add some special logic to either /etc/modules.conf or
/etc/modprobe.conf (depending upon which is installed on the system).

For example, given a modules.conf containing the following:

alias bond0 bonding
options bond0 mode=some-mode miimon=50
alias eth0 tg3
alias eth1 tg3
alias eth2 e1000
alias eth3 e1000

If neither eth0 nor eth1 is a slave to bond0, then when the bond0 interface comes up, the devices may end up reordered. This
happens because bonding is loaded first, then its slave devices’ drivers are loaded next. Since no other drivers have been loaded,
when the e1000 driver loads, it will receive eth0 and eth1 for its devices, but the bonding configuration tries to enslave eth2 and eth3
(which may later be assigned to the tg3 devices).

Adding the following:

add above bonding e1000 tg3

causes modprobe to load e1000 then tg3, in that order, when bonding is loaded. This command is fully documented in the
modules.conf manual page.

On systems utilizing modprobe.conf (or modprobe.conf.local), an equivalent problem can occur. In this case, the following can be
added to modprobe.conf (or modprobe.conf.local, as appropriate), all on one line (it has been split here for clarity):

install bonding /sbin/modprobe tg3; /sbin/modprobe e1000;
    /sbin/modprobe --ignore-install bonding

This will, when loading the bonding module, rather than performing the normal action, instead execute the provided command.
This command loads the device drivers in the order needed, then calls modprobe with --ignore-install to cause the normal action to then take
place. Full documentation on this can be found in the modprobe.conf
and modprobe manual pages.

Painfully Slow Or No Failed Link Detection By Miimon

By default, bonding enables the use_carrier option, which instructs bonding to trust the driver to maintain carrier state.

As discussed in the options section, above, some drivers do not support the netif_carrier_on/_off link state tracking system.
With use_carrier enabled, bonding will always see these links as up, regardless of their actual state.

Additionally, other drivers do support netif_carrier, but do not maintain it in real time, e.g., only polling the link state at
some fixed interval. In this case, miimon will detect failures, but only after some long period of time has expired. If it appears that
miimon is very slow in detecting link failures, try specifying use_carrier=0 to see if that improves the failure detection time. If
it does, then it may be that the driver checks the carrier state at a fixed interval, but does not cache the MII register values (so the
use_carrier=0 method of querying the registers directly works). If use_carrier=0 does not improve the failover, then the driver may cache
the registers, or the problem may be elsewhere.

Also, remember that miimon only checks for the device’s carrier state. It has no way to determine the state of devices on or
beyond other ports of a switch, or if a switch is refusing to pass traffic while still maintaining carrier on.

SNMP agents

If running SNMP agents, the bonding driver should be loaded before any network drivers participating in a bond. This requirement
is due to the interface index (ipAdEntIfIndex) being associated with the first interface found with a given IP address. That is, there is
only one ipAdEntIfIndex for each IP address. For example, if eth0 and eth1 are slaves of bond0 and the driver for eth0 is loaded before the
bonding driver, the interface for the IP address will be associated with the eth0 interface. This configuration is shown below; the IP
address 192.168.1.1 has an interface index of 2, which indexes to eth0 in the ifDescr table (ifDescr.2).

interfaces.ifTable.ifEntry.ifDescr.1 = lo
interfaces.ifTable.ifEntry.ifDescr.2 = eth0
interfaces.ifTable.ifEntry.ifDescr.3 = eth1
interfaces.ifTable.ifEntry.ifDescr.4 = eth2
interfaces.ifTable.ifEntry.ifDescr.5 = eth3
interfaces.ifTable.ifEntry.ifDescr.6 = bond0
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

This problem is avoided by loading the bonding driver before any network drivers participating in a bond. Below is an example of
loading the bonding driver first; the IP address 192.168.1.1 is correctly associated with ifDescr.2.

interfaces.ifTable.ifEntry.ifDescr.1 = lo
interfaces.ifTable.ifEntry.ifDescr.2 = bond0
interfaces.ifTable.ifEntry.ifDescr.3 = eth0
interfaces.ifTable.ifEntry.ifDescr.4 = eth1
interfaces.ifTable.ifEntry.ifDescr.5 = eth2
interfaces.ifTable.ifEntry.ifDescr.6 = eth3
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

While some distributions may not report the interface name in ifDescr, the association between the IP address and IfIndex remains
and SNMP functions such as Interface_Scan_Next will report that association.
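
To check which interface an agent associates with each address, the two tables can be joined on the interface index. The following is a minimal sketch that parses text in the format of the listings above; the function name is hypothetical and no SNMP library is involved.

```python
def resolve_ip_interfaces(walk_output):
    """Map each IP address to the interface name its ipAdEntIfIndex points at."""
    descr = {}     # ifIndex -> interface name
    ip_index = {}  # IP address -> ifIndex
    for line in walk_output.splitlines():
        if "=" not in line:
            continue
        oid, value = (part.strip() for part in line.split("=", 1))
        if ".ifDescr." in oid:
            descr[int(oid.rsplit(".", 1)[1])] = value
        elif ".ipAdEntIfIndex." in oid:
            ip = oid.split(".ipAdEntIfIndex.", 1)[1]
            ip_index[ip] = int(value)
    # An index with no matching ifDescr entry resolves to None.
    return {ip: descr.get(idx) for ip, idx in ip_index.items()}


if __name__ == "__main__":
    sample = """interfaces.ifTable.ifEntry.ifDescr.1 = lo
interfaces.ifTable.ifEntry.ifDescr.2 = bond0
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1
"""
    print(resolve_ip_interfaces(sample))
```

If the bond's address resolves to a slave name rather than to bond0, the drivers were probably loaded in the wrong order.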

Promiscuous mode

When running network monitoring tools, e.g., tcpdump, it is common to enable promiscuous mode on the device, so that all traffic
is seen (instead of seeing only traffic destined for the local host). The bonding driver handles promiscuous mode changes to the bonding
master device (e.g., bond0), and propagates the setting to the slave devices.

For the balance-rr, balance-xor, broadcast, and 802.3ad modes, the promiscuous mode setting is propagated to all slaves.

For the active-backup, balance-tlb and balance-alb modes, the promiscuous mode setting is propagated only to the active slave.

For balance-tlb mode, the active slave is the slave currently receiving inbound traffic.

For balance-alb mode, the active slave is the slave used as a “primary.” This slave is used for mode-specific control traffic, for
sending to peers that are unassigned or if the load is unbalanced. For the active-backup, balance-tlb and balance-alb modes, when
the active slave changes (e.g., due to a link failure), the promiscuous setting will be propagated to the new active slave.
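
The propagation rules above reduce to a small lookup. This sketch only models the behaviour described in this section; the function name and the example interface names are hypothetical.

```python
ALL_SLAVE_MODES = {"balance-rr", "balance-xor", "broadcast", "802.3ad"}
ACTIVE_ONLY_MODES = {"active-backup", "balance-tlb", "balance-alb"}


def promisc_targets(mode, slaves, active_slave=None):
    """Return the slaves that receive the bond's promiscuous mode setting."""
    if mode in ALL_SLAVE_MODES:
        return list(slaves)
    if mode in ACTIVE_ONLY_MODES:
        # Only the current active slave is affected; none if it is unknown.
        return [active_slave] if active_slave in slaves else []
    raise ValueError(f"unknown bonding mode: {mode}")


if __name__ == "__main__":
    print(promisc_targets("802.3ad", ["eth0", "eth1"]))
    print(promisc_targets("active-backup", ["eth0", "eth1"], active_slave="eth1"))
```

After a failover in one of the active-only modes, calling the function again with the new active slave gives the interface that should now carry the setting.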
