Wednesday, May 20, 2009

DB2 Container Rebalancing: Choosing the right filesystems

Our ERP software's database spans multiple filesystems for better load balancing, but high-load and low-load database tables still end up mixed on the same filesystem. The filesystems are also grouped into volume groups that reside on different physical disks.

To better understand this, here's how our production SAN is laid out:

Array 2 is one of two database data arrays, and is created over 6 hard disks in RAID level 5.
Array 2 contains two Logical Drives: SAPappShared and SAPdataDrv2. These will be mapped to the operating system (OS) as physical disks.

[Storage Manager screenshot: Array 2 and its Logical Drives]
Here you can see that the 3rd array spans different disks. This is to optimize load times when data is requested from the database.
The two red/white disks are hot-spares that will immediately substitute for any failed disk.
You can also see that this array resides on a different controller (B), while the previous one is on controller (A). The controller handles I/O requests from the OS.

[Storage Manager screenshot: Array 3 on controller B, with the two hot-spare disks]
Database logs are written to a RAID level 1 array spanning 2 disks. Logs are being written constantly, so it's better to keep them away from the data disks so they don't hamper performance.

[Storage Manager screenshot: the RAID 1 log array]
From what you saw above, you can count 6 Logical Drives related to the database server. The database itself uses only 4 of them: Log, Data1-1, Data1-2 and Data2. The two Heartbeat drives are used by the cluster services of the OS.

Now, moving to the OS: IBM's AIX. We run AIX 5L version 5.3.

On AIX, you cannot create a filesystem on a physical disk directly. The physical disk has to belong to a Volume Group. A volume group can house multiple physical disks, whether they are internal disks or attached through a storage array or the network, and a volume group can contain multiple filesystems. This all falls under the topic of the Logical Volume Manager (LVM), which I will not cover here.
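Just to illustrate the idea, here's roughly how a disk ends up holding a filesystem on AIX (a minimal sketch; the volume group name, mount point and size are only examples, and our actual layout was created during the SAP installation):

# mkvg -y sapdatavg1 hdisk4
# crfs -v jfs2 -g sapdatavg1 -m /db2/EHP/sapdata3 -a size=40G
# mount /db2/EHP/sapdata3

mkvg creates the volume group on the disk, crfs creates a JFS2 filesystem (and its logical volume) inside that volume group, and mount makes it available. The exact size syntax may differ slightly between AIX levels.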

Back to our setup above, from AIX's point of view, the Logical Drives are shown as physical disks:
# lspv
hdisk0 00c3b3f0feb38826 rootvg active
hdisk1 00c3b3f0d832abbc rootvg active
hdisk2 00c3b3e0fbad39e6 hrtvg1
hdisk3 00c3b3e0fbb076c8 saplogsvg active
hdisk4 00c3b3e0fbef90aa sapdatavg1 active
hdisk5 00c3b3e0fbf2e9bf sapdatavg2 active
hdisk6 00c3b3f0168bedbb sapdatavg3 active

You can see from the list above that each Logical Drive shows up as a physical disk and, moreover, each has been put into a separate volume group.

These are the filesystems (logical volumes) spanning the volume group on hdisk4:
# lspv -l hdisk4
hdisk4:
LV NAME LPs PPs DISTRIBUTION MOUNT POINT
db2lv 2 2 00..02..00..00..00 /db2/EHP/db2ehp
db2dumplv 24 24 00..01..00..00..23 /db2/EHP/db2dump
sapdata6lv 280 280 123..00..00..135..22 /db2/EHP/sapdata6
sapdata3lv 340 340 13..132..135..00..60 /db2/EHP/sapdata3
loglv2 1 1 00..01..00..00..00 N/A
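Each of those logical volumes corresponds to a mounted filesystem (loglv2 is the JFS log device, which is why it has no mount point). When picking a filesystem for a new container, it's also worth checking how much room is left; on AIX, df with -g reports sizes in GB:

# df -g /db2/EHP/sapdata3 /db2/EHP/sapdata6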

Before continuing, let's talk about how the database is used, to justify the next moves.

Since the database tables have been split over multiple filesystems, some of the filesystems have a high growth rate. Our database (IBM's DB2) can span the contents of a table over multiple filesystems as a load balancing technique. It does this through its concept of rebalancing containers. (In DB2, tables reside in tablespaces, and a tablespace consists of one or more containers.)
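To see which containers a tablespace currently has, you can ask DB2 directly (standard CLP commands, run as the instance owner and connected to the database; our database is named after the SAP system ID, EHP):

$ db2 connect to EHP
$ db2 list tablespaces show detail
$ db2 list tablespace containers for 17 show detail

The first listing gives you each tablespace's ID and size; the second shows the container files behind a given tablespace ID (17 is our EHP#BTABD tablespace, which you'll meet again below).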

So, if we add a second container for a tablespace holding a high-growth table, DB2 will move roughly half of the existing data over to the new container, and from then on it will write to both containers in a load-balanced fashion.
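Adding the container is what kicks off the rebalance. A minimal sketch of the statement (the path and size here are made up, not our real values):

$ db2 "ALTER TABLESPACE \"EHP#BTABD\" ADD (FILE '/db2/EHP/sapdata6/EHP#BTABD.container001' 25 G)"

As soon as the statement completes, DB2 starts the rebalancer in the background and redistributes the existing extents across the old and new containers.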

Now, to achieve true load balancing, both in filesystem utilization and in database performance, these two filesystems (the ones housing the high-growth table) should ideally sit on different controllers, so that requests for data residing on both filesystems can be served in parallel by the controllers.

If you're doing the rebalancing after going live with the system (which is usually the case), then you need to know which physical disk on the OS maps to which controller.

Here, you can see that each Logical Drive has a unique identifier called a Logical Unit Number (LUN).

The LUN is represented as a hex value on AIX.

[Storage Manager screenshot: the Logical Drives and their LUNs]
From the shell, type "lspv" to list the physical volumes, then "lsattr -El <hdisk_name>" to list the attributes of one of the disks. The attribute "lun_id" is what you want. See the output below.

# lspv
hdisk0 00c3b3f0feb38826 rootvg active
hdisk1 00c3b3f0d832abbc rootvg active
hdisk2 00c3b3e0fbad39e6 hrtvg1
hdisk3 00c3b3e0fbb076c8 saplogsvg active
hdisk4 00c3b3e0fbef90aa sapdatavg1 active
hdisk5 00c3b3e0fbf2e9bf sapdatavg2 active
hdisk6 00c3b3f0168bedbb sapdatavg3 active

# lsattr -El hdisk4
PR_key_value none Persistant Reserve Key Value True
cache_method fast_write Write Caching method False
ieee_volname 600A0B80003260360000A92E472997CA IEEE Unique volume name False
lun_id 0x0005000000000000 Logical Unit Number False
max_transfer 0x100000 Maximum TRANSFER Size True
prefetch_mult 1 Multiple of blocks to prefetch on read False
pvid 00c3b3e0fbef90aa0000000000000000 Physical volume identifier False
q_type simple Queuing Type False
queue_depth 10 Queue Depth True
raid_level 5 RAID Level False
reassign_to 120 Reassign Timeout value True
reserve_policy single_path Reserve Policy True
rw_timeout 30 Read/Write Timeout value True
scsi_id 0x10400 SCSI ID False
size 173670 Size in Mbytes False
write_cache yes Write Caching enabled False

# lsattr -El hdisk6
PR_key_value none Persistant Reserve Key Value True
cache_method fast_write Write Caching method False
ieee_volname 600A0B80003260360000A9AA472EA49C IEEE Unique volume name False
lun_id 0x000c000000000000 Logical Unit Number False
max_transfer 0x100000 Maximum TRANSFER Size True
prefetch_mult 1 Multiple of blocks to prefetch on read False
pvid 00c3b3f0168bedbb0000000000000000 Physical volume identifier False
q_type simple Queuing Type False
queue_depth 10 Queue Depth True
raid_level 5 RAID Level False
reassign_to 120 Reassign Timeout value True
reserve_policy single_path Reserve Policy True
rw_timeout 30 Read/Write Timeout value True
scsi_id 0x10500 SCSI ID False
size 294225 Size in Mbytes False
write_cache yes Write Caching enabled False

Using the above commands, finding which disk belongs to which controller on a SAN on AIX becomes like baby steps ;)
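The leading bytes of lun_id should match the LUN number you see in Storage Manager (0x0005... is LUN 5, 0x000c... is LUN 12), which is how you tie an hdisk back to its Logical Drive and preferred controller. If you have several disks to check, a small ksh loop saves typing (adjust the hdisk list to your own system; -F value prints only the attribute's value):

# for d in hdisk2 hdisk3 hdisk4 hdisk5 hdisk6
> do
>   echo "$d: $(lsattr -El $d -a lun_id -F value)"
> done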

We did a rebalance on one of the containers today. The container's size was about 50GB and the rebalance took about an hour on an IBM DS4800 SAN, connected over fibre channel to the database node: an IBM pSeries 570 machine with 4 POWER5 processors allocated and 30GB of RAM.

I tried looking up time estimates for rebalancing containers on DB2 on Google, but didn't find anything useful. Luckily, things went smoothly.
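While the rebalance is running, you can at least watch its progress from the DB2 side. A tablespace snapshot reports the rebalancer mode and the number of extents moved so far (the exact field names vary between DB2 versions; grep -p is AIX's "print the whole paragraph" option):

$ db2 get snapshot for tablespaces on EHP | grep -p "EHP#BTABD"

Comparing the extents-moved counter between two snapshots gives you a rough rate, and from that a time estimate.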
The diagnostic file "db2diag.log" had the following with regards to the rebalancing:

2009-05-18-01.05.59.552039+180 E106271503A342     LEVEL: Warning
PID : 799110 TID : 1 PROC : db2rebal 0
INSTANCE: db2ehp NODE : 000
FUNCTION: DB2 UDB, buffer pool services, sqlb_rebalance, probe:2876
MESSAGE : ADM6062I Rebalance for table space "EHP#BTABD" (ID "17") has been
completed.

2009-05-18-01.05.59.552281+180 I106271846A295 LEVEL: Warning
PID : 799110 TID : 1 PROC : db2rebal 0
INSTANCE: db2ehp NODE : 000
FUNCTION: DB2 UDB, buffer pool services, sqlb_rebalance, probe:2876
MESSAGE : PoolID 17: Last extent moved was #1866949

During the rebalancing period, the container being worked on was not showing in the container list in the ERP software (SAP) and it threw some errors, although everything was working fine at the database level. After the process completed, SAP reported the new containers properly.

2 comments:

Bloggylife said...

this poor guy tried to explain to me the concept of zoning and I had this look on my face HUH WTH ?? Again ..

I couldn't get it ...

SAN storage, partitioning, creating & attaching LUNs, increasing space, RAIDs, that's about what I know .. it's a BIG thing and it gives me a headache to understand it and still I learn something new when I have to deal with it ...

So you're still using storage manager 9, 10 is out, our support sez we need to upgrade the firmware which may affect servers attached to SAN, I need to look more into this ...

MBH said...

BloggyLife,
I saw the firmware update last year and dodged it; it started manifesting many problems for many companies.

Here's the enterprise rule: If it works, don't update it.

Firmwares are there to solve problems; if you have none, you don't need the firmware.

Storage Manager 10 is required only for newer firmwares.