What is IBM Flex Chassis?IBM Flex chassis is the new blade technology from IBM which replaces the 10 year old BladeCenter H chassis. Like the BladeCenter chassis, the Flex can fit fully functional network switches into the chassis (unlike Cisco which puts dummy pass-thru modules that plug into top of rack switches).
Environment SetupMy customer had 1 Flex chassis in the main site and another in the disaster recovery (DR) site. Each chassis had IBM 10Gb EN4093 Scalable Switches. The 2 switches in each chassis were interconnected via Virtual Link Aggregation (VLAG) to load balance the traffic between each other. They were connected to 1 Cisco ToR switch. The spanning tree protocol (STP) was PVRST+.
Some of the nodes/servers in the chassis were running VMware & some were running RedHat Enterprise Linux (RHEL) 6 with Oracle RAC setup on top of that.
The Oracle nodes needed to have multiple IPs belonging to multiple VLANs. The nodes had only 2 internal 10Gb NICs, so NIC Bonding with VLAN tagging was the best choice. I used Linux's native NIC Bonding.
As of this writing, Emulex does not have a NIC Bonding software for their chips on the IBM Flex nodes.
The ProblemThe RHEL nodes were configured with active-passive NIC teaming, but they were losing connectivity randomly and Oracle RAC would report that one of the configured interfaces could no longer communicate and the cluster is affected.
The SolutionThe chassis switches act as 2 switches: 1 switch to the nodes and 1 switch to the outside world. Because of this, even if the switch loses connectivity to the outside world, the internal nodes wouldn't know about the uplink failure. Also, for some reason, the MACs weren't being updated on the Cisco ToR switch.
So, instead of using "miimon" which monitors the physical link between the node and the internal ports of the switch, I changed it to "arp" which will send ARP requests through the ToR L3 switches and that will keep the MAC table refreshed and prevent the IPs from flapping on the nodes.
ConfigurationThe following configuration was done on RHEL 6. It should work similarly on all distributions, but the location of the files may differ.
Enabling NIC Bonding and Setting ParametersAppend this line to the file /etc/modprobe.conf
# bonding config
alias bond0 bonding
options bond0 mode=active-backup arp_interval=50 arp_ip_target=10.10.5.1,10.10.1.1
arp_interval value is in milliseconds. You can specify multiple target IPs. I suggest adding 2. The maximum allowed is 16. The target IPs are the IPs of a VLAN's gateway.
Configuring Interfaces with VLAN Taggingcd to /etc/sysconfig/network-scripts and create the following files
File name: ifcfg-bond0
File name: ifcfg-bond0.105
File name: ifcfg-bond0.101
The file name has to end with bond0.VLANID, and the device name has to match that. The IP address schema can be whatever was defined by the network team on that VLAN.
The network engineers I worked with, create VLAN IDs & IP schemas like this:
VLAN 105 -> 10.10.5.x
VLAN 1055 -> 10.10.55.x
You don't have to follow the same way, but it makes it easy to know the VLAN ID from the IP.
You can repeat the above steps and create as many files as you have VLANs.
Modify the following files:
File name: ifcfg-eth0
File name: ifcfg-eth1
Repeat these steps for the number of NICs that you have & want them to participate in the NIC Bonding group.
Configuring The Chassis Switches
The only thing missing now is creating the VLANs on the switches, then enabling VLAN Tagging on the nodes' NICs (internal ports) on the chassis switches. I'll be using the "iscli" command line interface instead of "ibmcli."
It's better to have firmware 7.5.3+ on the switches before proceeding, else some commands may be different, and some features may be missing (like Auto Spanning Tree Group assignment), and it'll require that you do extra work manually.
Enabling VLAN Tagging on the internal ports:
interface port INTA1-INTA14
This will create VLAN ID 101, and place the internal ports 1-14 in it, which belong to nodes 1-14. I wrote it this way to show how you can define non-consecutive ports.
The default private VLAN ID (PVID) is 1. This is the native VLAN. Any non-tagged traffic will be siphoned there. If the customer's native VLAN ID is different, change this value. If the customer does not intend to have any untagged traffic, it's better to change this value to something that doesn't exist on the customer side to create a black-hole on the internal switches for unwanted untagged traffic.
To change the PVID:
interface port INTA1-INTA14
Assuming the native VLAN at the customer side is 5. To set it to something that doesn't exist, agree with the customer on a VLAN that they'll never use. In my case, I often use 3999.
interface port INTA1-INTA14
You don't have to create the VLAN beforehand. The switch will automatically create the VLAN, assign it to its own Spanning Tree Group (STG) and change the PVID of the defined node ports.
That's it! Now restart the network services and bonding interfaces should come up.
- RHEL: Linux Bond / Team Multiple Network Interfaces (NIC) Into a Single Interface
- Linux Ethernet Bonding Driver HOWTO
- NIC Bonding for KVM (has cute graphs)
I highly recommend that you read the 2nd link (Kernel guide) before doing anything. It explains the different types of modes (active/passive, active/active, EtherChannel, ...etc.) and whether they require ToR switch support or not.
Remember that you cannot use active/active in an EtherChannel/PortChannel manner because the 2 internal NICs in each node belong to two different switches, and EtherChannel require that the ports belong to the same switch. It is possible if you stack the two chassis switches, but I have not attempted this before.
Also, make sure the STP used on the IBM switches match whatever is there on the customer side, otherwise you'll cause a network loop and bring down the entire customer network!
May your packets serve you well.