Sunday, January 4, 2009

zfs sparse provisioning with compression

As I was writing my last post, the possibility of zfs reclaiming/saving some space within the zvol by using compression kept nagging at me. In this post I've explored what happens when compression is enabled on a sparse, iscsi-shared zvol holding a VMFS volume.

I'm not so much interested in whether the real, used VM data blocks would compress well (I don't really expect there are many gains to be made there), but I did want to see if there are circumstances where compression would allow some unused or freed blocks above zfs to be reclaimed by the zpool, by way of compressing empty blocks down to almost nothing.
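
For reference, a sparse, compressed zvol like the one I'm testing could be created along these lines. The pool name Z matches the output further down, but the 50GB size is just an illustration rather than my exact setup, and the shareiscsi property (used by the older Solaris iscsi target) is only one way of exporting it:

#zfs create -s -o compression=on -V 50G Z/esxiscsi-compressed
#zfs set shareiscsi=on Z/esxiscsi-compressed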

Testing

I performed the following tests to see what would happen:
Virtual machine FS blocks - Write 10GB of data to the file system inside a VM, then delete it again. Check zvol usage before/after.
VMFS blocks - Delete the 10GB virtual disk from the iscsi VMFS LUN. Check zvol usage before/after.

I added a 10GB disk to a winXP VM, formatted it with NTFS and then ran an iometer benchmark on the volume. I got an interesting result while iometer was creating its initial 10GB scratch file on the disk. Iometer must write zeros or something else very compressible while populating the file, because I was getting fantastic compression: iometer finished creating its 10GB file, but ZFS was reporting only 78MB used on the whole zvol. That's over 100x compression!

With the scratch file fully created, the benchmark actually started, and at this point iometer must have been writing some real data across the disk, as the zfs usage worked its way back up to the full 10GB of disk usage that I would have expected.
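
If you want to watch that happen for yourself, simply polling the used property is enough (dataset name as above):

#while true; do zfs get -H -o value used Z/esxiscsi-compressed; sleep 10; done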

After the benchmark had finished the zvol looked like this:
#zfs get used,compressratio,referenced Z/esxiscsi-compressed
NAME                   PROPERTY       VALUE  SOURCE
Z/esxiscsi-compressed  used           10.9G  -
Z/esxiscsi-compressed  compressratio  1.00x  -
Z/esxiscsi-compressed  referenced     10.9G  -

I closed the benchmark, deleted the 10GB iobw.tst file and rebooted the VM to ensure that all NTFS/VM caches were flushed.
As I expected, the zvol used space remained at the full 10.9G despite the virtual disk being 'empty'.
The NTFS delete operation won't have zeroed the blocks; instead it will simply have updated its FS pointers/metadata to reflect that the blocks previously occupied by iobw.tst are now 'free'. There is no way for zfs to know that these blocks are suddenly unreferenced, so it's still faithfully storing them, unaware of what's happening further up the stack at the NTFS level. It was worth a shot!

Next I deleted the virtual disk from the VM using the VIC. Would VMFS free/zero the blocks?
No, it didn't, and while I don't know as much about VMFS semantics, I expect this comes down to the same kind of FS-level metadata shortcuts as in the NTFS test, just at the VMFS layer this time. The zvol usage at this point was still 10.9GB.

That's 0/2 for my compressibility tests.

Reusing blocks?

Next I wondered what would happen if I added and filled another 10GB disk on the now empty VMFS iscsi volume.
If VMFS were to reuse the same blocks that the previous 10GB virtual disk had used, then the zvol would simply update the contents of those blocks again, using no further disk space.

To my surprise this is exactly what happened. After creating and filling another 10GB disk and performing the same iometer procedure the zvol was still only using 10.9GB.
In actual fact it went from 10.9 down to around 100MB before climbing back to 10.9 again due to the iometer scratch file/zeroing behaviour that we observed earlier.
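
You can see the same overwrite behaviour at the zvol level without VMFS being involved at all. On a throwaway volume (not one backing a live VMFS datastore!), rewriting the same region twice shouldn't grow the space used. A rough sketch, assuming a scratch volume Z/scratch:

#zfs create -s -o compression=on -V 10G Z/scratch
#dd if=/dev/urandom of=/dev/zvol/rdsk/Z/scratch bs=1024k count=1024
#zfs get used Z/scratch    <- roughly 1G used
#dd if=/dev/urandom of=/dev/zvol/rdsk/Z/scratch bs=1024k count=1024
#zfs get used Z/scratch    <- still roughly 1G, the blocks were rewritten rather than added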


We now know that if guests write zeroed blocks to their file systems, zfs compression will reclaim the space, though I should point out that this would be pretty unusual in a real environment!
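
If you did want to force that reclaim, one approach (which I haven't tried here) would be to zero the free space from inside the guest so the deleted blocks become compressible again. In a Linux guest that's as simple as filling the disk with a zeroed file and deleting it again (the mount point is just an example):

#dd if=/dev/zero of=/mnt/data/zerofill bs=1024k; rm /mnt/data/zerofill

A Windows guest could do much the same with the free-space cleaning option of Sysinternals' sdelete.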


VMFS block allocation policies

This most recent result made me even more curious about how VMFS allocates blocks on disks/LUNs. If the block allocator works in a simple first-to-last LBA fashion, with a preference for reusing blocks at the start of a volume, then there may be some space savings to be had if you churn VMs a lot, as I will in my test setup here.
Modern filesystems have fragmentation and wear-levelling considerations to factor in, which typically results in blocks being allocated all over the volume. But since vmfs typically deals with a few very large files, rather than the millions of small files created by any modern OS, perhaps the engineers at VMware decided to fill their filesystem across a disk/LUN in much the same way as filling a glass from the bottom up. As files are deleted, new ones are created at the beginning again!

To test this I started afresh with an empty VMFS volume. I created a 10GB virtual disk, and filled it up. I then added a second 15GB disk and filled this too.
Here is the ZFS view of that:

[root@supernova WinXP-ESX]#zfs get used,compressratio,referenced Z/esxiscsi-compressed Z/esxiscsi
NAME                   PROPERTY       VALUE  SOURCE
Z/esxiscsi             used           27.3G  -
Z/esxiscsi             compressratio  1.00x  -
Z/esxiscsi             referenced     27.3G  -
Z/esxiscsi-compressed  used           27.2G  -
Z/esxiscsi-compressed  compressratio  1.00x  -
Z/esxiscsi-compressed  referenced     27.2G  -

Next I deleted both the 10, and 15GB disks, and created a new 20GB disk.
I then filled the 20GB disk, and checked the zfs usage again.

Sure enough, I was still using the previously allocated 25GB (27.3G as reported by zfs).

I could go on to test fragmentation and overlap, but I've found out all I wanted to know today.

Takeaways
Zvol sparse provisioning + iscsi + VMFS works out to be a very efficient and scalable storage system. The total disk usage will only ever be as much as your highest utilisation point, and with sparse provisioning you don't have to worry about extending your vmfs volume each time you need to add another VM.
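
As a rough illustration, comparing the provisioned size against the actual usage shows the savings, and if you ever do outgrow the volume the zvol itself can be grown in place (the VMFS volume on top would still need extending on the ESX side):

#zfs get volsize,used,available Z/esxiscsi-compressed
#zfs set volsize=100G Z/esxiscsi-compressed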

I should also point out that I've been taking a worst-case scenario of a completely full disk that has had every single block written to. In the real world, free space within the VMs won't initially take up space on the zvol until the OS's filesystem allocates those blocks with data. That will of course happen over time, but in the short term your disk savings will be even better than what I've tested here.

ZFS compression doesn't appear to have any obvious benefits in terms of storage savings, but I'll have a play with some real world VMs that don't contain synthetic data and post an update with my compression ratio results in the future.
