HP Microserver ZFS benchmark with dedup

I have an HP MicroServer which I'm very satisfied with. I bought one right away when HP announced it, before a retail price had even been set. Its only downside is that it has nothing but a soft-RAID (or fake-RAID) controller built in. A proper HP Smart Array controller would have raised the price enough to make HP sell significantly fewer of this great entry-level server. I was a bit disappointed when I first put the new 2TB drives in, configured the fake-RAID as RAID 0, and Linux still saw them as two individual disks without the dm package.

Anyway, recently I decided to use a more sophisticated setup in my box, so I experimented with ZFS. I have to say I love it. It's very versatile, flexible and easy to manage. My current setup is:

Where sda and sdb are the 2TB disks. Sysvm is an LVM volume group containing one 120G Samsung SSD, on which two LVs were created for the zpool: a 2G ZIL (ZFS Intent Log) and a 32G cache.
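The original listing of the setup is not shown above, but a pool like the one described could be built along these lines. This is only a sketch: the pool name `datapool` and the LV names `zil` and `cache` are assumptions, and whether the two disks are mirrored or striped is not stated in the post.

```shell
# Create the pool from the two 2TB disks (mirror assumed; use plain
# "zpool create datapool /dev/sda /dev/sdb" for a stripe instead)
zpool create datapool mirror /dev/sda /dev/sdb

# Attach the 2G LV from the SSD as a separate intent log (ZIL/SLOG)
zpool add datapool log /dev/sysvm/zil

# Attach the 32G LV from the SSD as an L2ARC read cache
zpool add datapool cache /dev/sysvm/cache

# Verify the layout
zpool status datapool
```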

Note: The system itself was installed on the SSD, on separate LVs.

Dedup

So I thought: I'm already using an enterprise storage filesystem, why shouldn't I experiment with the features it offers? I turned deduplication on. The hit on performance was significant, as expected, but I thought the benefits would make up for the downsides. After running the pool for a month with almost 1TB of data, mostly photos and videos, the dedup ratio was still 1.03: only 3% of storage space was saved, which in this case is 30G. Not much. Here I should emphasize one usually overlooked factor:
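Enabling dedup and checking the achieved ratio takes one command each. A minimal sketch, assuming the pool is named `datapool` (a hypothetical name):

```shell
# Turn deduplication on for the whole pool
zfs set dedup=on datapool

# The DEDUP column shows the pool-wide ratio (e.g. 1.03x)
zpool list datapool

# -D prints detailed dedup table (DDT) statistics
zpool status -D datapool
```

Note that dedup is applied only to data written after it is enabled; existing blocks are not retroactively deduplicated.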

Storage space is cheap; storage performance is very expensive.

So let's see how it performs with and without dedup:

21.5 MB/s versus 742 MB/s: 34 times faster without dedup.
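The benchmark commands are not shown above; a sequential write test like this one would produce comparable numbers. The target path and file size are illustrative:

```shell
# Sequential write: 1 GiB, flushed to disk before dd reports its rate
dd if=/dev/zero of=/datapool/testfile bs=1M count=1024 conv=fdatasync

# Caveat: zeros dedup (and compress) trivially, so with dedup enabled
# random data gives a more honest picture of the write path
dd if=/dev/urandom of=/datapool/testfile bs=1M count=1024 conv=fdatasync
```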

Why?

If dedup is turned on, every block being written has to be hashed and its checksum looked up in the dedup table. Even if the dedup table fits in memory, the hashing takes time. It's a big overhead.
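"Fits in memory" is a real constraint: a common rule of thumb is roughly 320 bytes of RAM per dedup table entry, one entry per unique block. A back-of-the-envelope estimate for the ~1TB of data mentioned above, assuming the default 128K recordsize:

```shell
# ~1 TiB of data divided into 128 KiB records
blocks=$(( 1024 * 1024 * 1024 * 1024 / (128 * 1024) ))

# ~320 bytes of DDT metadata per unique block (rule of thumb)
ddt_bytes=$(( blocks * 320 ))

echo "blocks: $blocks"
echo "DDT size: $(( ddt_bytes / 1024 / 1024 )) MiB"
```

That comes to roughly 2.5 GiB of RAM just for the dedup table, which on an 8-16 GB MicroServer competes directly with the ARC.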

Compared to a "plain" write, we get the full benefit of the various caching layers (even more so if we turn on the write cache at the hardware level). The SSD kicks in and helps a lot here.

So is it worth using dedup at all?

Yes, of course. As always, it depends (on the workload patterns).

I use the storage for my photos, shared via Samba with the desktop box where I work on them. It's really annoying when writing a 15MB Nikon RAW file takes seconds. I need throughput and I need IOPS. In this case dedup doesn't make much sense.

However, when using the storage for backups, where dedup usually gives a >2x ratio, it's absolutely reasonable.

Hybrid solution: ZFS won't un-dedup your data if you turn it off; it stays deduplicated. So a possible solution I found is to turn dedup off while the write load is heavy, then turn it on and do an offline deduplication by copying the data in place. Of course this needs some spare space.
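An offline pass could look like the following sketch. The pool name `datapool` and the `photos` directory are assumptions; the copy temporarily needs as much free space as the data being rewritten:

```shell
# Enable dedup for the duration of the offline pass
zfs set dedup=on datapool

# Rewrite the data so it flows through the dedup pipeline
cp -a /datapool/photos /datapool/photos.dedup
rm -rf /datapool/photos
mv /datapool/photos.dedup /datapool/photos

# Back to full write speed; already-written blocks stay deduplicated
zfs set dedup=off datapool
```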

Conclusion

I really love ZFS. I just ordered two more 2TB disks so I can fill the remaining slots in the MicroServer (the SSD lives up in the 5.25″ bay). I'm going to leave dedup off, as I need the performance and 30GB of savings is not enough reason to suffer.


About charlesnagy

I'm, among many things, mostly an automation expert, database specialist, system engineer and software architect with a passion for data: searching it, analyzing it, learning from it. I learn by experimenting, and this blog is the result of those experiments and some other random thoughts I have from time to time.