I have an HP MicroServer which I’m very satisfied with. I bought it right after HP announced it, before a retail price had even been set. Its only downside is that it has only a soft-RAID (or fake-RAID) controller built in. A proper HP Smart Array controller would have raised the price enough to make HP sell significantly fewer of this great entry-level server. I was a bit disappointed when I first put the new 2TB drives in: I configured the fake-RAID to RAID 0, and Linux still saw it as two individual disks without the dm package.
Anyway, recently I decided to use a more sophisticated setup in my box, so I started experimenting with ZFS. I have to say I love it. It’s very versatile, flexible and easy to manage. My current setup is:
[root@server storage]# zpool status
  scan: none requested
config:

        NAME                  STATE     READ WRITE CKSUM
        storage               ONLINE       0     0     0
          mirror-0            ONLINE       0     0     0
            sda               ONLINE       0     0     0
            sdb               ONLINE       0     0     0
        logs
          sysvm-zil           ONLINE       0     0     0
        cache
          sysvm-storageCache  ONLINE       0     0     0
Here sda and sdb are the 2TB disks. Sysvm is an LVM volume group containing one 120G Samsung SSD. Two LVs were created on it for the zpool: a 2G one for the ZIL (ZFS Intent Log) and a 32G one for the cache.
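For reference, a setup like this could be built along these lines. This is only a sketch: the LV names and sizes come from my description above, while the exact device paths are assumptions.

```shell
# Carve the log and cache LVs out of the SSD volume group
# (names and sizes as described above; paths are assumptions):
lvcreate -L 2G  -n zil          sysvm
lvcreate -L 32G -n storageCache sysvm

# Create the mirrored pool with the SSD-backed log and cache devices:
zpool create storage mirror sda sdb \
    log   /dev/sysvm/zil \
    cache /dev/sysvm/storageCache
```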
Note: the system itself was installed on the SSD, on separate LVs.
So I thought: I’m already using an enterprise storage filesystem, why shouldn’t I experiment with its features? I turned deduplication on. The performance hit was significant, as expected, but I thought the benefits would make up for the downsides. After running the pool for a month with almost 1TB of data, mostly photos and videos, the dedup ratio was still only 1.03: 3% of the storage space was spared, which in this case is 30G. Not much. Here I should emphasize one usually overlooked factor:
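Turning dedup on and checking the achieved ratio is a one-liner each (the pool name matches my setup above):

```shell
# Enable deduplication on the pool's root dataset
# (inherited by all child datasets):
zfs set dedup=on storage

# The achieved ratio is exposed as a read-only pool property:
zpool get dedupratio storage
# It also shows up in the DEDUP column of:
zpool list storage
```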
Storage space is cheap vs. storage performance is very expensive
So let’s see how it performs with and without dedup:
[root@server storage]# dd if=/dev/zero of=/storage/benchmark.out bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 4.88723 s, 21.5 MB/s
[root@server storage]# zfs set dedup=off storage
[root@server storage]# dd if=/dev/zero of=/storage/benchmark.out.2 bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.141401 s, 742 MB/s
21.5 MB/s against 742 MB/s: about 34 times faster without dedup.
If dedup is turned on, every written block has to be hashed and its checksum looked up in the dedup table. Even if the dedup table fits in memory, the hashing takes time. It’s a big overhead.
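That dedup table is also worth sizing before turning the feature on (`zdb -S <pool>` can simulate the real numbers on an existing pool). A commonly cited rule of thumb is roughly 320 bytes of in-core dedup table per unique block; a back-of-the-envelope estimate for about 1TB of data at the default 128K recordsize looks like this (the 320-byte figure is a rule of thumb, not an exact number):

```shell
#!/bin/sh
# Rough dedup-table (DDT) memory estimate.
DATA_BYTES=$(( 1024 * 1024 * 1024 * 1024 ))   # ~1 TiB of data
RECORDSIZE=$(( 128 * 1024 ))                  # default 128K recordsize
BYTES_PER_DDT_ENTRY=320                       # rule-of-thumb per unique block

BLOCKS=$(( DATA_BYTES / RECORDSIZE ))
DDT_BYTES=$(( BLOCKS * BYTES_PER_DDT_ENTRY ))

echo "blocks:    $BLOCKS"
echo "DDT bytes: $DDT_BYTES (~$(( DDT_BYTES / 1024 / 1024 / 1024 )) GiB)"
```

So even my modest pool would want a couple of gigabytes of RAM just for the dedup table before the L2ARC helps at all.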
Compared to that, the “plain” write benefits from the different caching layers (even more so if we turn on the write cache at the hardware level). The SSD kicks in and helps a lot here.
So, is it worth using dedup at all?
Yes, of course. As always, it depends (on the workload patterns).
I use the storage for my photos, shared via Samba with the desktop box where I work on them. It’s really annoying when writing a 15MB Nikon RAW file takes seconds. I need throughput and I need IOPS. In this case dedup doesn’t make much sense.
However, for backups, where dedup usually gives a ratio greater than 2x, it’s absolutely reasonable.
Hybrid solution: ZFS won’t re-duplicate your data if you turn dedup off; it stays deduplicated. So a possible solution I found is to turn dedup off while the write load is heavy, then turn it on and do an offline deduplication by copying the data in place. Of course this needs some spare space to work.
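A minimal sketch of that in-place copy step (the function name is mine; it rewrites only the regular files in one directory, non-recursively, and assumes dedup has just been re-enabled with `zfs set dedup=on storage`):

```shell
#!/bin/sh
# Rewriting a file forces ZFS to write its blocks again, this time
# through the dedup code path. Needs free space for the largest file.
rewrite_in_place() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        cp -p "$f" "$f.dedup.tmp" && mv "$f.dedup.tmp" "$f"
    done
}

# Example: rewrite_in_place /storage/photos
```

Once everything has been rewritten, dedup can be switched off again and the saved space remains.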
I really love ZFS. I’ve just ordered two more 2TB disks so I can fill the remaining slots in the MicroServer (the SSD is up in the 5.25″ bay). I’m going to leave dedup off, as I need the performance, and 30GB is not enough reason to suffer.