Saturday, January 23, 2016

16 Gb Brocade SAN Fabric Merge

Introduction

A customer with an existing setup from HP with HP-branded Brocade switches wanted to connect those switches to the newly acquired IBM setup (also using Brocade switches). The HP switches are the 24-port 8 Gb switches, and the IBM ones are 48-port 16 Gb switches. The final goal is to virtualize the HP storage behind the V7000 storage, but this will not be discussed in this post.

The HP SAN switches had existing configurations & were in production. The IBM switches also had configurations for an ongoing implementation.

To merge the SAN fabrics, there are 2 ways:

  1. Wipe one of them (clear the config), disable it, then enable it. The config of the other switch will be written to this empty one.
  2. Merge 2 different fabrics without wiping any data.
This post covers option 2, because I didn't want to redo all the zoning from scratch; that's a waste of time. All steps are done from the command line (CLI), because I hate Java.

Why Write This Post?

I was reading Brocade's forums, and many posts talked about using fabric merge tools and claimed that the two fabrics must have different names. A lot of that information is wrong or outdated and no longer applies to the newer Fabric OS 7.x switch firmware.

Status

  1. HP switches had Fabric OS (FOS) 7.1.
  2. IBM switches had FOS 7.4.
  3. HP switches had full fabric license.
  4. IBM 48-port switches include the "Full Fabric" license by default, but it doesn't show in the "licenseshow" output. It's bundled & enabled by default.
  5. HP switches had domain ID: 11 & 12.
  6. IBM switches had default domain ID: 1.
  7. Switch configuration name on HP was different from the one on IBM.
  8. IBM switch 1 was connected to HP switch 1 using 1 FC cable, and IBM switch 2 was connected to HP switch 2 using 1 FC cable.
  9. IBM switches had 16 Gb SFPs. HP switches had 8 Gb SFPs. The speed of the IBM port used for the inter-switch connection was fixed to 8 Gb (no auto-negotiation).

Requirements

  1. Fabric OS has to be 6.x or 7.x on all switches connecting to each other. The minor version ".x" does not have to match, but it's recommended to keep the switches on the same level, if possible.
  2. Full Fabric license must be available on 24-port switches. It's available by default on 48-port switches.
  3. Change Domain ID from default value to a unique value. The 2 switches connecting to each other must have different Domain IDs.
  4. Switch configuration names must be the same for the fabrics to merge. If they are different, a "Zone Conflict" error will show on the secondary switch.
  5. If you have a lot of traffic going from one switch to another switch, it's advised to purchase the "Trunking License" to allow aggregating multiple FC ports/links together.
  6. Aliases and zone names must be unique before merging the fabrics. If the same alias or zone name exists on both switches, you have to rename the aliases/zones on the secondary switch (the one that you can disable to merge the fabric).
  7. Aliases that have the same WWN on both the secondary and primary switches must have the same name on both fabrics. This is a rare case, but possible if you're virtualizing the WWNs of your servers.
  8. Make sure switch date, timezone & time are all correct before you merge the switches. Changing the timezone requires a switch restart, so plan for the downtime.
  9. Default user is 'admin' and default password is 'password'.
  10. Do not connect any FC cables between the HP/IBM (different switches) until you're told to do so. Follow the steps exactly as shown below.
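The naming rules in requirements 4, 6 and 7 can be checked on paper before any cable is connected. The sketch below is a hypothetical Python helper, not a Brocade tool: it assumes you copy the config name, the aliases (name to WWNs) and the zone names of each switch out of cfgshow by hand into the made-up ZoneDb structure. Reading requirements 6 and 7 together, it flags an alias name shared by both switches only when the members differ; zones are compared by name only.

```python
from dataclasses import dataclass, field


@dataclass
class ZoneDb:
    """Hypothetical snapshot of one switch's zoning, copied by hand from cfgshow."""
    config_name: str
    aliases: dict[str, set[str]] = field(default_factory=dict)  # alias name -> member WWNs
    zones: set[str] = field(default_factory=set)                 # zone names


def merge_conflicts(primary: ZoneDb, secondary: ZoneDb) -> list[str]:
    """List the naming problems (requirements 4, 6 and 7) that would block a merge."""
    problems: list[str] = []

    # Requirement 4: both switches must use the same zone configuration name.
    if primary.config_name != secondary.config_name:
        problems.append(
            f"config name differs: {primary.config_name!r} vs {secondary.config_name!r}"
        )

    # Requirement 6: an alias name present on both switches must not point at different WWNs.
    for name in sorted(primary.aliases.keys() & secondary.aliases.keys()):
        if primary.aliases[name] != secondary.aliases[name]:
            problems.append(f"alias {name!r} has different members; rename it on the secondary")

    # Requirement 6 again: zone names must be unique (compared by name only here).
    for name in sorted(primary.zones & secondary.zones):
        problems.append(f"zone {name!r} exists on both switches; rename it on the secondary")

    # Requirement 7: a WWN known on both switches must use the same alias name.
    wwn_owner = {wwn: name for name, wwns in primary.aliases.items() for wwn in wwns}
    for name in sorted(secondary.aliases):
        for wwn in sorted(secondary.aliases[name]):
            owner = wwn_owner.get(wwn)
            if owner is not None and owner != name:
                problems.append(f"WWN {wwn} is {owner!r} on primary but {name!r} on secondary")

    return problems


if __name__ == "__main__":
    # Made-up example data; replace it with what cfgshow prints on your switches.
    hp = ZoneDb("Production_SAN1", {"ESX01_HBA1": {"10:00:00:00:c9:00:00:01"}}, {"ESX01_V7000"})
    ibm = ZoneDb("HO_SANSW1_Top", {"ESX1_P0": {"10:00:00:00:c9:00:00:01"}}, {"ESX01_V7000"})
    for problem in merge_conflicts(hp, ibm):
        print(problem)
```

An empty list means the names line up; anything printed should be fixed on the secondary switch with zoneobjectrename before you connect the switches.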

Steps

In the steps below, a line starting with "#" means it's a command you should type. Type the command without the "#" character.

Some steps will require rebooting the switch. Some will require disabling the switch more than once, which takes it offline and stops all storage traffic through it. Before you start, either move the server paths to the 2nd switch manually or, if you're sure the multipath drivers are working properly, disable the server ports.

The primary switch is the one that will remain operational. The secondary switch is the one where we are making all these changes & can afford downtime.

Disable Ports

Disabling the server ports prevents the multipath drivers from using these paths again when the switch comes back online before you have finished your work. Do this on ONE switch only! After you successfully merge the fabrics on this switch, enable its ports, then move to the 2nd switch. Do NOT disable ports on both switches at the same time if you have active servers connected to the SAN switches.

  1. List available ports and WWNs: # switchshow
  2. Disable a port: # portdisable <port number>
    Example: # portdisable 15
    This will disable the 16th port (port numbering starts from zero).
  3. Repeat this for all server ports.
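With many server ports, it helps to generate the commands from one list, so the exact same ports can be re-enabled after the merge. A minimal Python sketch, assuming you build the port list yourself from switchshow; port_commands and physical_position are hypothetical helpers that only print text to paste into the CLI. They don't talk to the switch.

```python
def port_commands(ports: list[int], action: str = "portdisable") -> list[str]:
    """Build one FOS command per port, using the zero-based numbers shown by switchshow."""
    if action not in ("portdisable", "portenable"):
        raise ValueError("action must be 'portdisable' or 'portenable'")
    commands = []
    for port in sorted(set(ports)):  # de-duplicate and keep a predictable order
        if port < 0:
            raise ValueError(f"invalid port number: {port}")
        commands.append(f"{action} {port}")
    return commands


def physical_position(port: int) -> int:
    """Port numbering starts at zero, so port 15 is the 16th port on the switch."""
    return port + 1


if __name__ == "__main__":
    # Hypothetical server ports; take the real list from switchshow and leave the ISL port out.
    server_ports = [0, 1, 2, 15]
    for line in port_commands(server_ports):
        print(line)
    # After the merge, the same list re-enables the ports.
    for line in port_commands(server_ports, "portenable"):
        print(line)
```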



    Change the Timezone

    1. # date
      This will show current time, date & timezone. Example: Tue Jan 12 09:00:03 AST 2016. AST = Arab Standard Time timezone.
    2. # tstimezone --interactive
    3. Follow the prompts. Choose the continent, then the country.
    4. After finishing, a message will say: "System Time Zone change will take effect at next reboot"
    5. If the time is not correct, change it before you reboot (see the next section).
      If the time is correct, you can now reboot the switch: # reboot


    Change the Time and Date

    1. Syntax: # date [MMDDhhmm[[CC]YY]]
      MM = Month = 01, 02, ..., 12
      DD = Day = 01, 02, ..., 31
      hh = Hour = 00, 01, 02, ..., 23
      mm = Minute = 00, 01, 02, ..., 59
      CC = First two digits of the year = 20 for 2016
      YY = Last two digits of the year = 16 for 2016
    2. To change the time & date to Jan 23 2016 21:43:00 (9:43 PM)
      # date 012321432016
    3. Time change does not require a reboot. If you changed the timezone, you should reboot now.
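The MMDDhhmmCCYY argument is easy to get wrong by hand. A small Python sketch that builds it from a datetime; fos_date_argument is a hypothetical helper name, and seconds are simply dropped because the format has no field for them.

```python
from datetime import datetime


def fos_date_argument(when: datetime) -> str:
    """Format `when` as MMDDhhmmCCYY for the FOS date command (seconds are dropped)."""
    # %m = MM, %d = DD, %H = hh (00-23), %M = mm, %Y = CCYY (four-digit year)
    return when.strftime("%m%d%H%M%Y")


if __name__ == "__main__":
    print("date " + fos_date_argument(datetime(2016, 1, 23, 21, 43)))
```

Run as-is, it prints `date 012321432016`, the same command as in the example above.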

    Display Current Domain ID

    1. # switchshow
    2. Top of the output will show a line: switchDomain: 1
      1 is the default value.

    Change Domain ID

    1. To change the Domain ID of a switch, the switch must be disabled first:
      # switchdisable
      This will take the switch offline and stop all traffic.
    2. Start the configuration process to change switch parameters:
      # configure
    3. Answer yes at the fabric parameters prompt, then enter the new Domain ID:
      Fabric parameters (yes, y, no, n): [no] yes
      Domain: (1..239) [1] <Unique ID must be different from the switch you will connect to>
    4. Press Enter for all other parameters to use default values. No need to change any of them.
    5. # switchenable
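The Domain ID rules (a value in the 1..239 range offered by configure, and one that no other switch in the fabric already uses) can be sketched as a quick check. This assumes you paste the switchshow output of each switch into a string; switch_domain and check_new_domain are hypothetical helpers, and 21 in the usage is an arbitrary example value.

```python
import re


def switch_domain(switchshow_output: str) -> int:
    """Pull the domain ID out of the 'switchDomain: N' line of switchshow output."""
    match = re.search(r"^\s*switchDomain:\s*(\d+)", switchshow_output, re.MULTILINE)
    if match is None:
        raise ValueError("no switchDomain line found")
    return int(match.group(1))


def check_new_domain(new_id: int, neighbour_ids: set[int]) -> None:
    """Reject IDs outside 1..239 or already used by a switch you will connect to."""
    if not 1 <= new_id <= 239:
        raise ValueError(f"domain ID {new_id} is outside 1..239")
    if new_id in neighbour_ids:
        raise ValueError(f"domain ID {new_id} is already used in the fabric")


if __name__ == "__main__":
    hp_domains = {11, 12}  # the HP switches in this post
    current = switch_domain("switchName: IBM_SW1\nswitchDomain: 1\n")  # trimmed sample output
    print("current domain:", current)
    check_new_domain(21, hp_domains)  # passes silently; 11 or 12 would raise ValueError
```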


    Rename Zone Configuration

    Rename the zone config on the secondary switch so that its name matches the config on the primary switch (the one that remains operational).
    1. # cfgshow
    2. This will print current aliases, zones and zone config information. At the top, you'll see the config name:
      Defined configuration:
       cfg: HO_SANSW1_Top
    3. The config must be disabled before you can rename it: # cfgdisable
    4. Now, rename the config to be the same as the primary switch: # zoneobjectrename <current name>, <new name>
      Example: # zoneobjectrename HO_SANSW1_Top, Production_SAN1
    5. Remember, both primary (HP switch in my case) and secondary (IBM in my case) must have the same config name to be able to merge the fabrics.
    6. Save the new config changes: # cfgsave
    7. Run the command again to see the new config name: # cfgshow
    8. Now activate the config: # cfgenable <config name>
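If you saved the cfgshow output to a file, the rename command can be built from it. A hypothetical Python sketch that assumes the layout shown in step 2 (a "Defined configuration:" header followed by a "cfg:" line) and takes the first defined config; check it against your own output before pasting the command.

```python
import re


def defined_config_name(cfgshow_output: str) -> str:
    """Return the first cfg name listed under 'Defined configuration:'."""
    if "Defined configuration:" not in cfgshow_output:
        raise ValueError("no 'Defined configuration:' section found")
    defined = cfgshow_output.split("Defined configuration:", 1)[1]
    # Ignore anything from the effective (active) configuration onwards.
    defined = defined.split("Effective configuration:", 1)[0]
    match = re.search(r"^\s*cfg:\s*(\S+)", defined, re.MULTILINE)
    if match is None:
        raise ValueError("no cfg entry found")
    return match.group(1)


def rename_command(cfgshow_output: str, primary_config: str) -> str:
    """Build the zoneobjectrename command that gives the secondary config the primary's name."""
    current = defined_config_name(cfgshow_output)
    return f"zoneobjectrename {current}, {primary_config}"


if __name__ == "__main__":
    sample = "Defined configuration:\n cfg:\tHO_SANSW1_Top\n\t\tzone_a; zone_b\n"
    print(rename_command(sample, "Production_SAN1"))
```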

    Change Port Speed

    At this point the ports are disabled. Fix the speed of the inter-switch port instead of leaving it on auto-negotiation. This must be done on both the primary and the secondary switch.
    1. # portcfgspeed <port number> <speed>
      Example: # portcfgspeed 35 8
      This will fix the speed of port 35 to 8 Gbps and disable auto-negotiation. Pick a speed that both SFPs support; here the HP SFPs are 8 Gb.
    2. Do this on the port that will connect each primary SAN switch to each secondary SAN switch.
    3. Keep the port disabled on the secondary switch.
    4. Enable the port on the primary switch: # portenable <port number>
    5. Connect your Fibre Channel cables into the ports.

    Merging The Fabrics

    1. First, save the current zone names of the secondary switch in a text file. We will need them after this step: # cfgshow
      Copy the output and save it in a text/word file.
    2. On the secondary switch, disable the config: # cfgdisable
    3. Now enable the port connecting the secondary & primary switches: # portenable 35
    4. Wait 10-30 seconds before proceeding to give enough time for the link to establish and the 2 switches to talk.
    5. Disable the secondary switch to make it the slave and to add the config from the primary:
      # switchdisable
    6. Enable the secondary switch: # switchenable
    7. Wait 10-50 seconds, then check the switch: # switchshow
      You should see in the line of the port connecting the switches something like this:
      35 35 1f2300 id 8G Online FC E-Port 10:00:00:xx:xx:xx:xx:xx "" (upstream)
    8. Wait some time and the name of the primary switch will appear between the double quotes.
    9. You should also see both switches in the same fabric now: # fabricshow
      This should show the names of the primary & secondary switches.
    10. If you type # cfgshow it will show all zones and aliases from both switches, but only those from the primary are in the active config.
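Steps 7 and 8 boil down to three things on the inter-switch port: Online, E-Port, and the primary switch name between the quotes. A hypothetical Python sketch that reads one switchshow port line; it assumes the column order of the sample line in step 7 (index, port, address, media, speed, state, protocol, then the E-Port text) and may need adjusting for other FOS versions.

```python
def isl_status(switchshow_line: str) -> dict:
    """Split a switchshow port line into the fields used to confirm the E-Port link."""
    fields = switchshow_line.split(None, 7)
    if len(fields) < 8:
        raise ValueError("unexpected switchshow line")
    _index, port, _address, _media, speed, state, _proto, rest = fields
    # The neighbour switch name appears between the double quotes once the fabrics talk.
    neighbour = rest.split('"')[1] if rest.count('"') >= 2 else ""
    return {
        "port": int(port),
        "speed": speed,
        "online": state == "Online",
        "e_port": rest.startswith("E-Port"),
        "neighbour_name": neighbour,
    }


def link_ready(status: dict) -> bool:
    """True once the port is an online E-Port and the neighbour name is filled in."""
    return status["online"] and status["e_port"] and status["neighbour_name"] != ""


if __name__ == "__main__":
    line = '35 35 1f2300 id 8G Online FC E-Port 10:00:00:xx:xx:xx:xx:xx "" (upstream)'
    status = isl_status(line)
    print(status, link_ready(status))  # name still empty, so keep waiting
```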

    Enabling Zones of Secondary Switch

    The fabrics are now merged, but the zones of the secondary switch are not in the active config yet. We need to add them to the config and enable the config.
    1. Open the text file of the zone names (cfgshow output) from the previous step.
    2. To add the zones, type the command: # cfgadd "<config name>", "zone1; zone2; zone3"
      Notice it's a semicolon between the zone names. You can add multiple zones at the same time to the active config.
      If you're lazy and java works for you, you can use the graphical interface to select the zones and add them to the config.
    3. When done, type: # cfgsave
      Press "y" to save it.
      Then type: # cfgenable <config name>
    Congratulations! Now all zones are active from both switches. The ports are still disabled, though, so let's enable them.
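Typing a long zone list by hand is error-prone. A hypothetical Python sketch that pulls the zone names out of the cfgshow text you saved before the merge and builds one cfgadd command with semicolon-separated members. It assumes zone lines start with "zone:" under "Defined configuration:"; review the list and drop zones you don't want active before pasting the command.

```python
import re


def zone_names(saved_cfgshow: str) -> list[str]:
    """Collect zone names from the defined part of saved cfgshow output."""
    defined = saved_cfgshow.split("Effective configuration:", 1)[0]
    return re.findall(r"^\s*zone:\s*(\S+)", defined, re.MULTILINE)


def cfgadd_command(config_name: str, zones: list[str]) -> str:
    """Build one cfgadd command; members are separated by semicolons."""
    if not zones:
        raise ValueError("no zones to add")
    members = "; ".join(zones)
    return f'cfgadd "{config_name}", "{members}"'


if __name__ == "__main__":
    saved = (
        "Defined configuration:\n"
        " cfg:\tHO_SANSW1_Top\tz1; z2\n"
        " zone:\tz1\ta1; a2\n"
        " zone:\tz2\ta3\n"
        "\nEffective configuration:\n"
        " cfg:\tHO_SANSW1_Top\n"
        " zone:\tz1\n"
    )
    print(cfgadd_command("Production_SAN1", zone_names(saved)))
```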

    Enable Ports

    1. List available ports and WWNs: # switchshow
    2. # portenable <port number>
      Example: # portenable 0
      This will enable the 1st port (port numbering starts from zero)
    3. Repeat this for all ports.
    4. You can now check your servers and storage and all links should be operational.
    Congratulations! You're now done with the first switch connectivity. Make sure your links are stable, then move on to the remaining switches.

    Errors

    Zone Conflicts and Segmentation

    For some reason, the switch showed "segmented" and "zone conflict" messages and upon a reboot, all ports were disabled. Trying to enable a specific port gave the error: "Port 35: Port enable failed due to unknown system error"

    I rebooted the SAN switch again and the ports (and the switch) came online again. It looks like it froze at some point and needed another reboot. If this happens often, upgrade FOS to the latest stable version. For me, it only happened once.

    If you still get "zone conflict" after finishing all the steps, then you have an alias with the same WWN but different names. To fix it, rename the alias using the "zoneobjectrename" command as shown above.

    Unstable Ports

    I was unlucky: the ports were unstable. The link kept flapping online and offline, and it sometimes connected at 16 Gbps and sometimes at 8 Gbps (before I fixed the speed to 8 Gbps). This also prevented the switches from forming a fabric.

    First, clear the port statistics so old errors don't skew the numbers: # portstatsclear <port number>. Then check the port statistics: # portshow <port number>
    In the output, look at these counters. Very large values in any of them usually point to a bad SFP, cable or port:
    • Unknown
    • Parity_err
    • 2_parity_err
    • Link failure
    • Loss_of_sync
    • Loss_of_sig
    • Invalid_word
    • Invalid_crc
    In my case, I had to change 2 SFPs, one on the old HP SAN switch and one on the new IBM SAN switch. I also had to change the port slot on the old HP switch because the port slot itself had problems. I'm glad the FC cable was good.
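To watch the counters listed above after a portstatsclear, a hypothetical Python sketch can pick them out of pasted portshow output. It assumes the counters print as "Name: value" pairs (check this against your own output) and that "Link failure" prints as Link_failure. The threshold of 100 is an arbitrary placeholder, not a Brocade figure.

```python
import re

# Counter names from the list above; "Link failure" is assumed to print as Link_failure.
WATCHED = ("Unknown", "Parity_err", "2_parity_err", "Link_failure",
           "Loss_of_sync", "Loss_of_sig", "Invalid_word", "Invalid_crc")


def suspicious_counters(portshow_output: str, threshold: int = 100) -> dict[str, int]:
    """Return the watched counters whose value is above the (placeholder) threshold."""
    counters = {
        name: int(value)
        for name, value in re.findall(r"([A-Za-z0-9_]+):\s+(\d+)", portshow_output)
    }
    return {name: counters[name] for name in WATCHED if counters.get(name, 0) > threshold}


if __name__ == "__main__":
    sample = "Unknown: 0  Loss_of_sync: 4821\nInvalid_word: 120394  Invalid_crc: 3\n"
    print(suspicious_counters(sample))
```

Run it twice a few minutes apart: counters that keep climbing are more telling than a single large value.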

    Wednesday, January 6, 2016

    Lenovo G8272 and EN4093R Invalid Signature Firmware Upgrade Problem

    While trying to upgrade the firmware of brand new Lenovo G8272 switches from the initial release of 8.2.1.0, I got an error after uploading the new firmware:
    Failure: image contains invalid signature.
    G8272(config)#
    Feb  9 18:58:41 G8272 ERROR   mgmt: Firmware download failed to image1

    I only found 2 results online, and both pointed at changelogs that say the issue has been fixed, but not how! I contacted a great person within Lenovo who checked internal documents, and it turned out that this issue affects G8272 and EN4093R switches manufactured in December 2015 (specifically, 12th week of 2015). (Thank you Zeeshan!)

    Cause

    "The switch software uses its hardware serial number and the public keys on its kernel file system to generate a private key to decrypt the OS or Boot image being uploaded to it and then proceeds to install it. If the serial number of the switch is changed for some reason, the combination of the hardware serial number and the public keys will fail to generate the appropriate private key to decrypt the uploaded image and reports that the image has an invalid signature."

    In my case, the switches were brand new and no one had changed any serial number, but they were still affected.

    Fix

    "In order to remedy this situation, the way out is to remove the public keys installed on the kernel file system and reboot the switch. During reboot, the switch will generate new set of public keys using the current serial number. With these newly generated public keys, the switch will be able to compute the proper private key to decrypt the uploaded images."

    Requirements

    • Serial cable (mini-USB that came with the switch)
    • Serial-to-USB kit (you have to buy this on your own)
    • CAT5E or CAT6 STP or UTP cable
    • New firmware (8.2.4.0 as of this writing)
    • PuTTY or your favorite serial/telnet/ssh tool
    • admin password (default is admin:admin)
    • ftp/tftp server software. I suggest 3CDaemon (FTP & TFTP) or Filezilla (FTP & SFTP).

    On a Flex chassis, you should enable Serial Over LAN (SOL) from the Chassis Management Module (CMM) to access the serial console of the switches. Connect the UTP cable to the CMM port, not the switch.

    I highly recommend configuring the management port (RJ45) and using it for the firmware upload. Uploading one OS image over the serial connection takes about 45 minutes, while over Ethernet on the management port it takes about 1 minute.

    Note: The initial firmware (8.2.1.0) does not support SSH. However, SSH is enabled by default once you upgrade to 8.2.4.0. Make sure you disable HTTP & Telnet after the upgrade.

    Procedure

    Any text that starts with # is a command to be typed (without the # sign).
    1. Connect to serial port on the switch (mini-USB port)
    2. Login as admin user
    3. Reboot the switch: #reload
    4. When the switch shows Memory Test, press Shift+t to enter Manufacturer Mode.
      U-Boot 2009.06 (Feb 23 2015 - 07:27:18)

      CPU0:  P2020, Version: 2.1, (0x80e20021)
      Core:  E500, Version: 5.1, (0x80211051)
      Clock Configuration:
             CPU0:1200 MHz, CPU1:1200 MHz,
             CCB:600  MHz,
             DDR:400  MHz (800 MT/s data rate) (Asynchronous), LBC:37.500 MHz
      L1:    D-cache 32 kB enabled
             I-cache 32 kB enabled
      Board: Networking OS RackSwitch G8272
      I2C:   ready
      DRAM:   DDR:  4 GB

      Memory Test ..........

      Manufacturing Mode

      FLASH: 16 MB
      L2:    512 KB enabled
      PCIe1: Root Complex of PCIe, x2, regs @ 0xffe0a000
      PCIe1: Bus 00 - 01
      MMC:  FSL_ESDHC: 0
      Note : Operational Mode has changed.
      Net:   eTSEC1, eTSEC2 [PRIME]

      Booting OS
    5. Once the OS boots, enter the admin password (default is admin)
    6. You should now be at the prompt where it says: Diagnostics#
    7. Enter diagnostics mode: #linux
    8. List the filesystem to see if there are existing public encryption keys: #ls /user/*.pem
      > ls /user/*.pem
      /user/development_key.pub.pem  /user/production_key.pub.pem
    9. The two files above should show. Delete them: #rm /user/*.pem
    10. That's it. Now quit by typing q in the command: #q
    11. Now reboot: #/boot/reset
    12. Press "y" to confirm rebooting. The switch will now reboot and generate new keys to match the current hardware serials and whatnot.
    13. Now connect via Ethernet (or configure an IP interface on the management port, then connect) and upgrade the switch:
    14. #copy tftp image1 address 192.168.70.13 filename G8272-8.2.4.0_OS.man mgt-port
      Change tftp to match what protocol you're using.
      Change 192.168.70.13 to match your machine's IP where the TFTP/FTP server is running.
      Change G8272-8.2.4.0_OS.man to match the file name.
    15. You'll be asked if you want to make image1 the default boot image; press y.
    16. Repeat the same step above for the 2nd image: image2. Do NOT select it as the default image.
    17. Now upload the Boot image:
      #copy tftp boot address 192.168.70.13 filename G8272-8.2.4.0_Boot.man mgt-port
    18. We're done. If you have any config unsaved, type: #write
    19. Now that you're done, reboot the switch: #reload
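The uploads in steps 14 to 17 differ only in the target slot and file name. A small hypothetical Python helper can print them so the file names stay consistent. It assumes the <model>-<version>_OS.man / _Boot.man naming of the example files and accepts only the tftp and ftp keywords; anything else is an assumption you should check against the switch's own help first.

```python
def copy_command(protocol: str, target: str, server: str, filename: str) -> str:
    """Build a copy command in the same form as the steps above."""
    if protocol not in ("tftp", "ftp"):
        raise ValueError(f"unsupported protocol {protocol!r}")
    if target not in ("image1", "image2", "boot"):
        raise ValueError(f"unknown target {target!r}")
    return f"copy {protocol} {target} address {server} filename {filename} mgt-port"


def upgrade_plan(protocol: str, server: str, version: str = "8.2.4.0",
                 model: str = "G8272") -> list[str]:
    """OS image to image1 and image2, then the Boot image, in the order used above."""
    os_file = f"{model}-{version}_OS.man"      # assumed naming, as in the example file
    boot_file = f"{model}-{version}_Boot.man"
    return [
        copy_command(protocol, "image1", server, os_file),
        copy_command(protocol, "image2", server, os_file),
        copy_command(protocol, "boot", server, boot_file),
    ]


if __name__ == "__main__":
    for command in upgrade_plan("tftp", "192.168.70.13"):
        print(command)
```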

    Congratulations.

    Tip: You may want to change the switches' timezone, date & time (in that exact order). The default clock is set to a date in Feb 2015.

    IBM POWER8 Networking via Direct Attach Cables

    I recently had a project where my company sold POWER8 servers to the customer along with some Lenovo servers and Lenovo G8272 network switches. The switches have 48x 1/10 Gb ports + 6x 40 Gb ports.

    To save on cost, it's possible to use Direct Attach Cables (DACs) to connect the servers to the switches without buying SFPs or fiber cables. List price comparison:
    • Lenovo 10GBASE-SR SFP+ Transceiver (46C3447) = $629
    • Lenovo 5m LC-LC OM3 MMF Cable (00MN508) = $58
    • To connect 1 server (4 ports) to switches (4 ports) = 8x $629 + 4x $58 = $5,264.
    In contrast, with DACs you only need one copper cable per link, with the SFP+ connectors built in:
    • Lenovo 5m Passive SFP+ DAC Cable (90Y9433) = $210
    • Lenovo 5m Active DAC SFP+ Cable (00VX117) = $290
    • Active DACs are often used for switch-to-switch connectivity.
    • To connect 1 server (4 ports) to switches (4 ports) = 4x $210 = $840.
    DACs cost about 16% as much, or roughly 6.3 times less! These prices are based on publicly available list prices and may differ depending on your region and distributor.
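The list-price arithmetic above as a small Python sketch, so you can plug in your own prices and link counts. The constants are the list prices quoted above; a "link" here means one server port cabled to one switch port.

```python
SFP_SR = 629       # Lenovo 10GBASE-SR SFP+ transceiver (46C3447), USD list price
OM3_5M = 58        # Lenovo 5m LC-LC OM3 MMF cable (00MN508)
DAC_PASSIVE = 210  # Lenovo 5m passive SFP+ DAC cable (90Y9433)


def optical_cost(links: int) -> int:
    """Each optical link needs a transceiver at both ends plus one fiber cable."""
    return links * (2 * SFP_SR + OM3_5M)


def dac_cost(links: int) -> int:
    """A DAC has its SFP+ ends built in, so one cable per link."""
    return links * DAC_PASSIVE


if __name__ == "__main__":
    links = 4  # one server with 4 ports
    optical, dac = optical_cost(links), dac_cost(links)
    print(optical, dac, f"{dac / optical:.0%}", round(optical / dac, 1))
```

For 4 links this gives $5,264 versus $840, i.e. about 16% of the cost or 6.3 times cheaper, matching the numbers above.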

    The POWER8 servers (S822) have the EN0U PCIe2 4-Port (10Gb+1GbE) Copper SFP+RJ45 Adapter. According to the IBM Redbook, these adapters require Active Copper DACs.

    I actually used the same passive DACs as for the Lenovo servers, and the cables worked just fine. The AIX team configured 2 Virtual I/O Servers (VIOS) on each POWER8 system, and each POWER8 system had 4 of these adapters. We also configured LACP for each VIOS, so the total bandwidth available to each VIOS was 40 Gb.

    So even though the Redbook says that Active DACs are required, the passive ones work just fine. Also, the Redbook only lists active cables in 1 m, 3 m and 5 m lengths and doesn't mention passive cables at all.