Tuesday, September 8, 2009

Software RAID vs. LVM: Quick Speed Test

Introduction

Currently, I have a fileserver that is set up this way:

Filesystem
      ^
Logical Volume Manager
      ^
Software RAID Arrays
      ^
Physical Disks

In my case, LVM is an extra layer that isn't buying me anything, since only one physical entity belongs to the Volume Group: a single RAID5 array.
You can put your filesystem on top of a Logical Volume, or directly on the RAID array device. It depends on how you want to manage your data and devices.
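For reference, this is roughly what the two layouts look like when they're created. It's only a sketch (the mount point and the choice of XFS are my assumptions), but the arrays/storage names match the logical volume used in the tests below:

# Filesystem directly on the RAID array:
mkfs.xfs /dev/md0
mount /dev/md0 /storage

# Filesystem on a Logical Volume layered on top of the array:
pvcreate /dev/md0                          # mark the array as an LVM physical volume
vgcreate arrays /dev/md0                   # one Volume Group containing only the array
lvcreate -l 100%FREE -n storage arrays     # one Logical Volume spanning the whole group
mkfs.xfs /dev/mapper/arrays-storage
mount /dev/mapper/arrays-storage /storage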

So, is this hampering performance? The tables below will do the talking, but first: the setup.

System Setup

Processor: Intel Pentium Dual CPU E2160 @ 1.80GHz
Motherboard: MSI (MS-7514) P43 Neo3-F
    North Bridge: Intel P43
    South Bridge: Intel ICH10
SATA Controller 1: JMicron 20360/20363 AHCI Controller
    AHCI Mode: Enabled
    Ports: 6-7
SATA Controller 2: 82801JI (ICH10 Family) SATA AHCI Controller
    Ports: 0-5
RAM: 1GB @ CL 5
Video Card: GeForce 7300 GS
Disk sda: WDC WD10EACS-00D6B1
Disk sdb: WDC WD10EACS-00D6B1
Disk sdc: WDC WD10EACS-00ZJB0
Disk sdd: WDC WD10EADS-65L5B1
Disk sde: WDC WD10EADS-65L5B1
Disk sdf: MAXTOR STM31000340AS
Disk sdg: WDC WD10EACS-00ZJB0
Disk sdh: WDC WD10EADS-00L5B1
Disk sdi: Hitachi HDS721680PLAT80 (OS)
Chunk size: 256kB
LVM Physical Extent Size: 1GB
LVM Read-ahead sectors: Auto (set to 256)

Speed Test Methods

A quick and easy way to run a speed test is to use two tools: hdparm and dd.
Note that neither takes filesystem performance into account, since they read directly from the device rather than from a file. That doesn't matter here, because I only want to compare the magnitude of the difference in speed, not produce very exact results ;)

hdparm

hdparm -tT /dev/xxx
-t: Perform timings of device reads for benchmark and comparison purposes.
Displays the speed of reading through the buffer cache to the disk without any prior caching of data.
This measurement is an indication of how fast the drive can sustain sequential data reads under Linux, without any filesystem overhead.

-T: Perform timings of cache reads for benchmark and comparison purposes.
This displays the speed of reading directly from the Linux buffer cache without disk access.
This measurement is essentially an indication of the throughput of the processor, cache, and memory of the system under test.

dd

dd if=/dev/xxx of=/dev/null bs=10M count=400
This reads from the device and dumps the data to the null device (just reading). Block size = 10 Megabytes (10 x 2^20 bytes).
The whole run reads 4GB of data; I picked 4GB to make sure it surpasses the RAM size.

Before running dd, I flushed the read cache by entering: hdparm -f /dev/sd[a-h], which flushes the cache of all RAID disks.
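Each test therefore looked something like this (a sketch only; the loop just mirrors the 3-run averaging mentioned below):

for run in 1 2 3; do
    hdparm -f /dev/sd[a-h]                         # flush the cache of all RAID disks
    dd if=/dev/md0 of=/dev/null bs=10M count=400   # read 4GB straight off the device
done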

Speed Test #1: RAID vs. LVM

LVM
root@Adam:~/mdadm-3.0# dd if=/dev/mapper/arrays-storage of=/dev/null bs=10M count=400
2097152000 bytes (2.1 GB) copied, 41.1147 s, 43.0 MB/s


root@Adam:~/mdadm-3.0# hdparm -tT /dev/mapper/arrays-storage
 Timing cached reads:   1926 MB in  2.00 seconds = 962.65 MB/sec
 Timing buffered disk reads:  146 MB in  3.00 seconds =  48.62 MB/sec
 
 
RAID
root@Adam:~/mdadm-3.0# dd if=/dev/md0 of=/dev/null bs=10M count=400
2097152000 bytes (2.1 GB) copied, 10.9341 s, 125 MB/s


root@Adam:~/mdadm-3.0# hdparm -tT /dev/md0
Timing cached reads:   1998 MB in  2.00 seconds = 998.73 MB/sec
Timing buffered disk reads:  538 MB in  3.01 seconds = 178.98 MB/sec

The above numbers are the average of 3 runs.

Speed Test #2: Disks Separately

root@Adam:~# for i in {a,b,c,d,e,f,g,h}; do dd if=/dev/sd"$i"1 of=/dev/null bs=10M count=400; done
root@Adam:~# for i in {a,b,c,d,e,f,g,h}; do hdparm -I /dev/sd"$i" | grep Firmware; done

Disk  Model                 Firmware  Speed Test Result
sda   WDC WD10EACS-00D6B1   01.01A01  46.3106 s, 90.6 MB/s
sdb   WDC WD10EACS-00D6B1   01.01A01  48.6391 s, 86.2 MB/s
sdc   WDC WD10EACS-00ZJB0   01.01B01  70.8184 s, 59.2 MB/s
sdd   WDC WD10EADS-65L5B1   01.01A01  46.9733 s, 89.3 MB/s
sde   WDC WD10EADS-65L5B1   01.01A01  44.2861 s, 94.7 MB/s
sdf   MAXTOR STM31000340AS  MX15      77.1797 s, 54.3 MB/s
sdg   WDC WD10EACS-00ZJB0   01.01B01  50.5498 s, 83.0 MB/s
sdh   WDC WD10EADS-00L5B1   01.01A01  46.747 s, 89.7 MB/s

As you can see, even though sdc & sdg have the same model and firmware, their speeds differ! I have no clue why. I searched Western Digital's website for firmware to download, but it leads nowhere near any firmware download link.

The Maxtor disk has a newer firmware release available. I'll check out its changelog before installing it. Also, as a precaution, I'll clone the Maxtor disk to sdg since it's not being used now; just in case the new firmware doesn't play nice!

Conclusion

From the above numbers, it's clear that LVM, in my setup, has crippled performance by a huge margin (reads are ~66% slower). So for my next setup, I'm going to skip LVM and slap the filesystem directly on top of the RAID5 array.

On one of my PCs (Adrenalin), I already have an XFS filesystem running directly on top of the RAID array, with no LVM. I got double the speed of the individual hard disks out of the array (140 MB/s) when I tested it last year with hdparm.

I don't claim that this is a typical problem with LVM. I did a quick search and didn't find any numbers, and I'm too lazy right now to dig further. But I have these numbers on that crappy MSI board (it caused me so many problems with the SATA ports), and I'll skip LVM on it. If I keep the board and don't smash it to smithereens, that is.

Irrelevant note: I'm loving posting to my blog through Google Docs.

Tuesday, September 1, 2009

Cheap Man's 40-Disk Storage Cluster


Introduction

This is a computer case design that fits 40 disks, 4 motherboards, 5 power supplies, a bunch of fans and a gigabit switch.

The main goal of this design is to use the cheapest parts with the least effort to assemble everything. So you could say this is also the Poor/Lazy Man's Storage Cluster!
The footprint is only 60x60x50 cm (WxDxH).

Keep in mind that this is a case, not a whole system. I've only factored in the price of the pieces used to put the case together.

Parts and Prices

Part                                  Price                 Source
Metal table with 2 net-like surfaces  7 KD                  IKEA
2x Wooden CD rack that fits 35 CDs    2x 3 KD               IKEA
Plastic drawer mat                    1.75 KD               IKEA
Rubber grooved floor mat (3mm thick)  1.5 KD for 0.5 meter  TrueValue
Nylon Cable Ties (203x3.2mm)          0.5 KD                Family Hardware Store
Total                                 16.75 KD

Tools

  • Hands
  • Foot
  • Long nose pliers
  • Scissors
  • Hammer (or anything that hammers)

Design Diagram


In this diagram you can see the measurements of each component and how they fit together. When we put the disks inside the rack, there was an empty gap of 4mm; we took care of that with the rubber mat, which is 3mm thick on each side (6mm total), so it holds the disks snugly and serves as a shock absorber.

The disk rack is made of wood and has rubber mats inside, so you'd expect heat to get trapped. Our workaround is this:
  • Choosing a rubber mat with grooves
  • Inserting the disks heads down, with the 2.5cm edges touching the rubber
  • Leaving space between disks
  • Pushing the disks down until they touch the metal table
  • Installing fans on the lower part of the table, blowing at the disks

Note that I used Layout #1. Layout #2 was too cluttered and I didn't really think it through properly, so I don't know if it's even possible. If you can squeeze more than 4 motherboards into that same table (or the same dimensions), let me know!

Assembly

  1. Assemble lower shelf of table
  2. Use foot to break back-panel of CD racks
  3. Hammer the metal pins of the CD racks inwards
  4. Point the side that has the metal pins towards table surface (keeps wood fragments away from you)
  5. Tie the rack to the table using the cable ties
    Note: We assembled the upper surface but worked on the lower one later and kept the upper free for future motherboards.
  6. Cut 11 lines (in a group) of the rubber mat
  7. Attach the rubber mat to a side and tie it down. Do the same for the other side
    We cut off the extra edges shown in the first picture to reduce trapped heat.
  8. If you want to have a separate power supply unit (PSU) for fans, attach it to the bottom of the lower surface
    Note: I'll tell you later how to turn on the PSU without a motherboard (jump-starting). If you're going to run anything at the bottom, now is the time to attach it. You won't be able to do it later on!
  9. Put the power supplies in place and tie them down (make sure they're touching the metal table)
  10. Cut the plastic mat to fit the rest of the table area and tie it to the table. Use a nail to punch holes
  11. Punch extra holes for the motherboards, and don't tie the motherboards down too hard!
  12. Tie fans below the disk rack(s) and connect them in series to a PSU
    Note: We made a mistake here and tied the fans facing the opposite direction from the motherboard, and were too lazy to reposition them.
  13. Slap in the hard disk drives (HDDs) and hook them to the motherboard(s)
  14. Powering on
    To power on a motherboard, you can either use a power switch (or make one), or enable Wake-on-LAN (WOL) in the BIOS (assuming your motherboard supports it). You'll need to know the MAC address of your LAN port; see the example after this list.
    We enabled WOL but it didn't work for some reason. Crappy MSI board.
  15. If your motherboard doesn't have a built-in video/graphics card, you'll need to bend the tip of the graphics card you're about to attach
  16. Almost done. Attach the upper surface and make sure that its rack faces the opposite side from the one on the lower surface, so that the fans don't hit the cables

    Make sure to double-check the HDD cables after attaching the upper surface.
  17. Jump-starting a PSU
    I stripped a cable wrapper that had a metal piece inside it and stuck it in the proper pins. This way, I control the fan PSU using the ON/OFF switch at the back; no need for a separate power switch. Unfortunately we didn't take pictures of that, but here are some references:
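About the Wake-on-LAN option in step 14: once WOL is enabled in the BIOS, a magic packet from any other machine on the LAN powers the board on. A minimal sketch (the interface name and MAC address below are placeholders):

# on the target board, note down the MAC address of the LAN port
ip link show eth0

# from any other machine on the same LAN, send the magic packet
wakeonlan 00:11:22:33:44:55          # or: etherwake -i eth0 00:11:22:33:44:55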

To Cluster Or Not To Cluster

Now that you've got all your motherboards hooked up, most likely to a gigabit switch, there are different ways to use all this storage capacity:
  1. Make them appear as a single storage unit
    This can be done with iSCSI, which lets you expose either a single hard disk or a whole RAID array as a storage device to another machine over Ethernet. This way you can combine all the disks/arrays under one machine, create an LV (Logical Volume, not Louis Vuitton) and put the filesystem on top of the LV (see the sketch after this list).
    I don't know how to do this on Windows. I can help you do this on Linux though. If you did this on Windows, drop me an email and I'll link your page.
  2. Use them separately
    Well, this is a no-brainer: just assign a different IP to each machine and expose each machine's storage through Samba (on Linux) or share the directories on Windows.
  3. A mix between the above two
    Using iSCSI puts the data at a lot of risk, because if one motherboard or multiple disks fail (when using RAID5), you lose all your data. For good. And since we have such a good history of growing and managing the storage smoothly (NOT!), we decided not to use iSCSI. Maybe if we had a better track record, we'd have gone with it.

    What we're going to do is keep the existing Samba share and move the Anime directory (2.4TB) to another machine. We then mount the other machine's share over the existing Anime directory using NFS. Mounting a directory over another is called shadowing.
    For this to work properly, you need to create the usernames on all systems with the same numeric IDs; otherwise you'll have a heck of a time with permissions (see the sketch after this list).

    Now the users still access the same old single IP and can still reach all the data, even though it's distributed across systems. If one machine's disks die, at least we won't lose everything.

    We don't yet have the 2nd motherboard, so I'll write about this in detail when we get it and do the setup.
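To give an idea of what options 1 and 3 involve on Linux, here are two rough sketches. All the IPs, device names, mount points and numeric IDs are made up for illustration, and the iSCSI part assumes open-iscsi on the machine that combines the storage:

# Option 1: pull in a remote array over iSCSI and fold it into one Logical Volume
iscsiadm -m discovery -t sendtargets -p 192.168.1.11   # find targets exported by the other box
iscsiadm -m node -p 192.168.1.11 --login               # the remote array appears as a new /dev/sdX
pvcreate /dev/md0 /dev/sdj                             # local array + remote iSCSI device
vgcreate bigpool /dev/md0 /dev/sdj
lvcreate -l 100%FREE -n storage bigpool
mkfs.xfs /dev/mapper/bigpool-storage

# Option 3: matching numeric IDs on every machine, then NFS-mount over the Anime directory
groupadd -g 1100 media
useradd -u 1100 -g 1100 media
mount -t nfs 192.168.1.12:/storage/Anime /storage/Anime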

Post Assembly

After assembly, the machine was put under heavy load; these are the temperature readings:
/dev/sda: WDC WD10EACS-00D6B1: 27°C
/dev/sdb: WDC WD10EACS-00D6B1: 29°C
/dev/sdc: WDC WD10EACS-00ZJB0: 29°C
/dev/sdd: WDC WD10EADS-65L5B1: 28°C
/dev/sde: WDC WD10EADS-65L5B1: 29°C
/dev/sdf: MAXTOR STM31000340AS: 29°C
/dev/sdg: WDC WD10EACS-00ZJB0: 27°C
/dev/sdh: WDC WD10EADS-00L5B1: 26°C

This is way better than before! They used to be in the 40s!

Last Words

Our baby is running fine now, and for the first time we haven't faced problems, thank God!

If you have any questions or comments, let us know. I suggest subscribing via email when commenting, or leaving a blank comment just to subscribe, so you stay posted on updates when we add the 2nd motherboard.

Good luck and don't blame us if you get electrocuted ^_^'