Saturday, February 22, 2014

Introduction to the Z File System (ZFS) - Linux

An Introduction to the Z File System (ZFS) for Linux

[Header photo]

ZFS is commonly used by data hoarders, NAS lovers, and other geeks who prefer to put their trust in a redundant storage system of their own rather than the cloud.  It’s a great file system for managing data spread across multiple disks, and it rivals many dedicated hardware RAID setups.

Photo by Kenny Louie.

What is ZFS and Why Should I Use it?

The Z file system is a combined file system and logical volume manager, originally built by Sun Microsystems for its Solaris operating system and released as free, open source software.  Some of its most appealing features include:

Endless scalability

Well, it’s not technically endless, but it’s a 128-bit file system that’s capable of managing zettabytes (one billion terabytes) of data.  No matter how much hard drive space you have, ZFS will be suitable for managing it.
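The unit conversion in that claim is easy to check.  This Python sketch assumes decimal units (1 TB = 10^12 bytes, 1 ZB = 10^21 bytes). It uses 2^128 bytes, the commonly cited theoretical pool-size limit for a 128-bit file system, only as an illustrative upper bound.

```python
# Decimal (SI) units, the way drive manufacturers count.
TB = 10**12  # bytes in one terabyte
ZB = 10**21  # bytes in one zettabyte


def zettabyte_in_terabytes() -> int:
    """Number of terabytes in one zettabyte."""
    return ZB // TB


def max_pool_in_zettabytes() -> float:
    """The theoretical 2**128-byte ceiling, expressed in zettabytes."""
    return 2**128 / ZB


if __name__ == "__main__":
    print(f"1 ZB = {zettabyte_in_terabytes():,} TB")
    print(f"2**128 bytes = {max_pool_in_zettabytes():.1e} ZB")
```

It reports one billion terabytes per zettabyte and roughly 3.4 × 10^17 zettabytes for the 2^128-byte ceiling, far beyond any hard drive collection.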

Maximum integrity

Every block of data ZFS stores is checksummed, and the checksum is verified whenever the block is read.  Silent data corruption therefore gets detected instead of being quietly passed along to you.  When the pool has redundancy (a mirror or RAID-Z), ZFS goes a step further and automatically repairs a bad block from a good copy anytime it can.  You can also check every block in a pool on demand with zpool scrub.
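To make the detect-and-repair idea concrete, here is a toy Python model of a two-way mirror.  It is a simplified sketch, not ZFS’s actual on-disk format.  The MirroredBlock class is hypothetical.  It uses SHA-256, which is one of the checksum algorithms ZFS offers, although the default is fletcher4.  It also keeps the checksum apart from the data copies, loosely echoing how ZFS stores each block’s checksum in the block pointer that references it.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> bytes:
    """SHA-256 digest, standing in for the block checksum ZFS computes."""
    return hashlib.sha256(data).digest()


@dataclass
class MirroredBlock:
    """Hypothetical block written to a two-disk mirror.

    The checksum is stored apart from the data copies, so a damaged
    copy cannot vouch for itself.
    """
    copies: list
    expected: bytes = field(init=False)
    repairs: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        # Checksum computed once, at write time, from the data being written.
        self.expected = checksum(self.copies[0])

    def read(self) -> bytes:
        """Return verified data, rewriting any copy that fails its checksum."""
        good = next((c for c in self.copies if checksum(c) == self.expected), None)
        if good is None:
            raise OSError("every copy is corrupt; the block cannot be repaired")
        for i, copy in enumerate(self.copies):
            if checksum(copy) != self.expected:
                self.copies[i] = good  # self-heal from the verified copy
                self.repairs += 1
        return good


if __name__ == "__main__":
    block = MirroredBlock([b"family photos", b"family photos"])
    block.copies[0] = b"family phot0s"  # simulate silent corruption on disk 1
    print(block.read(), block.repairs)  # good data returned, one repair made
```

Without the second copy, the sketch can only detect the corruption (and raise an error), not fix it.  That is why self-healing depends on the pool having redundancy.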

Drive pooling

The creators of ZFS want you to think of it as being similar to the way your computer uses RAM.  When you need more memory in your computer, you put in another stick and you’re done.  Similarly with ZFS, when you need more hard drive space, you put in another hard drive and you’re done.  No need to spend time partitioning, formatting, initializing, or doing anything else to your disks – when you need a bigger storage “pool,” just add disks.

RAID

ZFS can stripe, mirror, and use its own parity-based RAID-Z levels (single, double, and triple parity), all in software.  Its performance can be comparable to that of a hardware RAID controller.  This saves money, makes setup easier, and gives you access to RAID levels that improve on traditional RAID 5 and RAID 6.

Installing ZFS

Since we’re only covering the basics in this guide, we’re not going to install ZFS as a root file system.  This section assumes that you’re using ext4 or some other file system and would like to use ZFS for some secondary hard drives.  Here are the commands for installing ZFS on some of the most popular Linux distributions.

Solaris and FreeBSD should already come with ZFS installed and ready to use.

Ubuntu:

$ sudo add-apt-repository ppa:zfs-native/stable
$ sudo apt-get update
$ sudo apt-get install ubuntu-zfs

Debian:

$ su -
# wget http://archive.zfsonlinux.org/debian/pool/main/z/zfsonlinux/zfsonlinux_2%7Ewheezy_all.deb
# dpkg -i zfsonlinux_2~wheezy_all.deb
# apt-get update
# apt-get install debian-zfs

RHEL / CentOS:

$ sudo yum localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release-1-3.el6.noarch.rpm
$ sudo yum install zfs

If you have some other distribution, check out zfsonlinux.org and click on your distribution under the “Packages” list for instructions on how to install ZFS.

As we continue with this guide, we’re going to use Ubuntu because that seems to be the #1 choice for Linux geeks.  You should still be able to follow along no matter what, as the ZFS commands won’t change across different distributions.

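Installation takes quite a while, because the ZFS kernel module is compiled as part of it.  Once it’s finished, run the following command to make sure it’s installed correctly:

$ sudo zfs list

Since there are no pools yet, it should report that no datasets are available.  You should get output like this: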

[Screenshot: output of sudo zfs list]

We’re using a fresh installation of Ubuntu Server right now, with only one hard drive.


Configuring ZFS

Now, let’s say we put six more hard drives (3 TB each) into our computer.

Running the following command will show us the six hard drives we just installed:

$ sudo fdisk -l | grep Error

They don’t contain any kind of partition table yet, so in their current state a traditional file system couldn’t use them.

[Screenshot: the six new hard drives, none of which has a partition table]

As we mentioned earlier, one of the nice things about ZFS is that we don’t need to bother with partitions (although you can if you want to).  Let’s start by taking three of our hard disks and putting them in a storage pool by running the following command:

$ sudo zpool create -f geek1 /dev/sdb /dev/sdc /dev/sdd

In this command, zpool create is the command used to create a new storage pool.  The -f option forces ZFS to use the disks even if they appear to be in use or already contain data, so be careful: anything on them will be overwritten.  geek1 is the name of the storage pool, and /dev/sdb /dev/sdc /dev/sdd are the hard drives we put in the pool.
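If you ever script pool creation, the argument order of zpool create is worth getting right.  The Python helper below is hypothetical: its name and parameters are ours, not part of ZFS.  It only builds the argument list, including an optional RAID type such as raidz, which goes right after the pool name.  Actually running the result, for example with subprocess.run, needs root and will wipe the listed disks.

```python
from typing import List, Optional


def zpool_create_args(pool: str, disks: List[str],
                      vdev_type: Optional[str] = None,
                      force: bool = False) -> List[str]:
    """Build: zpool create [-f] POOL [VDEV_TYPE] DISK...  (hypothetical helper)."""
    args = ["zpool", "create"]
    if force:
        # Use the disks even if they appear to be in use; existing data is lost.
        args.append("-f")
    args.append(pool)
    if vdev_type is not None:
        args.append(vdev_type)  # e.g. "raidz"; omit for a plain stripe
    args.extend(disks)
    return args


if __name__ == "__main__":
    stripe_cmd = zpool_create_args("geek1", ["/dev/sdb", "/dev/sdc", "/dev/sdd"], force=True)
    print(" ".join(stripe_cmd))  # zpool create -f geek1 /dev/sdb /dev/sdc /dev/sdd
```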

After you’ve created your pool, you should be able to see it with the df command or sudo zfs list:

[Screenshot: df and sudo zfs list showing the new geek1 pool]

As you can see, /geek1 has already been mounted and is ready to use.

If you want to see which three disks you selected for your pool, you can run sudo zpool status:

[Screenshot: sudo zpool status listing sdb, sdc, and sdd in geek1]

What we’ve done so far is create a 9 TB dynamic stripe pool (effectively, RAID 0).  In case you’re not familiar with what that means, imagine, as a simplified illustration, that we created a 3 KB file on /geek1.  One KB would automatically go to sdb, one KB to sdc, and one KB to sdd.  Then when we read the 3 KB file back, each hard drive would present its 1 KB to us, combining the speed of the three drives.  This makes writing and reading data fast, but it also means every drive is a single point of failure.  If just one hard drive fails, we lose our 3 KB file, and in fact the entire pool.
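The 3 KB illustration can be sketched as a toy round-robin stripe in Python.  The chunk size and disk names come from the example above; everything else is an assumption for illustration.  Real ZFS writes whole records (up to 128 KB by default) and picks a vdev for each one based partly on free space, so a file this small would typically sit on a single disk.

```python
CHUNK = 1024  # 1 KB per chunk, matching the example in the text


def stripe(data: bytes, disks: list[str]) -> dict[str, list[bytes]]:
    """Deal fixed-size chunks round-robin across the disks (RAID 0 style)."""
    layout: dict[str, list[bytes]] = {disk: [] for disk in disks}
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    for n, chunk in enumerate(chunks):
        layout[disks[n % len(disks)]].append(chunk)
    return layout


def read_back(layout: dict[str, list[bytes]], disks: list[str]) -> bytes:
    """Reassemble the file; with no redundancy, one missing disk loses it."""
    if any(disk not in layout for disk in disks):
        raise OSError("a striped disk is missing, so the data is lost")
    total = sum(len(layout[disk]) for disk in disks)
    # Chunk n lives on disk n % k, at position n // k on that disk.
    return b"".join(layout[disks[n % len(disks)]][n // len(disks)] for n in range(total))


if __name__ == "__main__":
    disks = ["sdb", "sdc", "sdd"]
    data = b"A" * CHUNK + b"B" * CHUNK + b"C" * CHUNK  # a 3 KB file
    layout = stripe(data, disks)
    print(read_back(layout, disks) == data)  # True: all three disks present
    del layout["sdc"]  # sdc dies
    try:
        read_back(layout, disks)
    except OSError as err:
        print(err)  # the file cannot be reassembled without sdc
```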

Assuming that protecting your data is more important than accessing it quickly, let’s take a look at other popular setups.  First, we’ll delete the zpool we’ve created so we can use these disks in a more redundant setup:

$ sudo zpool destroy geek1

Bam, our zpool is gone.  This time, let’s use our three disks to create a RAID-Z pool.  RAID-Z is basically an improved version of RAID 5.  A crash or power loss in the middle of a write can leave RAID 5’s data and parity out of step (the “write hole”), and because ZFS uses copy-on-write, RAID-Z avoids that problem.  RAID-Z is meant to be used with at least three hard drives, and is sort of a compromise between RAID 0 and RAID 1.  In a RAID-Z pool, you’ll still get the speed of block-level striping but will also have distributed parity.  If a single disk in your pool dies, simply replace that disk (with zpool replace) and ZFS will automatically rebuild its data from the data and parity on the other disks.

To lose the information in your storage pool, two disks would have to die.  To make things even more redundant, you can use RAID-Z2, ZFS’s counterpart to RAID 6, which keeps double parity and can survive two failed disks.
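Here is the single-parity principle in a few lines of Python, shown for one stripe of two data chunks plus parity across our three disks.  It is a simplified sketch.  It uses plain XOR over equal-size chunks, the RAID 5 style of parity, which RAID-Z’s single parity also uses.  It leaves out RAID-Z’s variable stripe width, the way it spreads parity across disks, and the second parity that RAID-Z2 adds.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(chunks: list[bytes]) -> bytes:
    """Single parity: the XOR of every data chunk in the stripe."""
    return reduce(xor_bytes, chunks)


def rebuild(survivors: list[bytes]) -> bytes:
    """Recover the one missing chunk by XOR-ing all survivors, parity included."""
    return reduce(xor_bytes, survivors)


if __name__ == "__main__":
    on_sdb, on_sdc = b"half one", b"half two"  # data chunks
    on_sdd = parity([on_sdb, on_sdc])          # parity chunk
    # sdc dies: XOR what is left to get its contents back.
    print(rebuild([on_sdb, on_sdd]))           # b'half two'
```

Because XOR is its own inverse, any one chunk in the stripe can be rebuilt from the others.  Two missing chunks cannot be rebuilt this way, which is why single parity survives exactly one failed disk.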

To create our RAID-Z pool, we can use the same zpool create command as before, but specify raidz after the name of the pool:

$ sudo zpool create -f geek1 raidz /dev/sdb /dev/sdc /dev/sdd

[Screenshot: df -h and sudo zpool status for the RAID-Z pool]

As you can see, df -h shows that our 9 TB pool has now been reduced to 6 TB, since the equivalent of one disk (3 TB) is being used to hold parity information.  With the zpool status command, we see that our pool is mostly the same as before, but is using RAID-Z now.
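The drop from 9 TB to 6 TB follows the rule of thumb given in the zpool documentation: a RAID-Z group of N disks of size X with P parity disks holds approximately (N - P) × X.  A minimal Python version is below.  It assumes equal-size disks and ignores metadata overhead.  Because df -h reports binary units (TiB), expect the figures it shows to come out somewhat lower.

```python
def usable_tb(disks: int, disk_tb: float, parity: int = 0) -> float:
    """Approximate usable space of one vdev: (N - P) * X.

    parity=0 models a plain stripe, 1 RAID-Z, 2 RAID-Z2, 3 RAID-Z3.
    Assumes equal-size disks and ignores metadata and padding overhead.
    """
    if disks <= parity:
        raise ValueError("need at least one more disk than parity disks")
    return (disks - parity) * disk_tb


if __name__ == "__main__":
    print(usable_tb(3, 3.0))            # 9.0: stripe of three 3 TB disks
    print(usable_tb(3, 3.0, parity=1))  # 6.0: RAID-Z of the same disks
```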

To show how easy it is to add more disks to our storage pool, let’s add the other three disks (another 9 TB) to our geek1 storage pool as another RAID-Z configuration:

$ sudo zpool add -f geek1 raidz /dev/sde /dev/sdf /dev/sdg

We end up with:

[Screenshot: the geek1 pool with both RAID-Z groups]
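Adding the second RAID-Z group grows the pool to roughly 6 + 6 = 12 TB of usable space, and ZFS stripes data across both groups.  Each group protects only itself, though.  The pool survives one failed disk in each group, but two failures in the same group take the whole pool down.  This Python sketch models that rule.  The group layout mirrors the two commands above, while POOL and pool_survives are hypothetical names used for illustration.

```python
# Each vdev: (member disks, how many of them it can lose).
POOL = [
    ({"sdb", "sdc", "sdd"}, 1),  # raidz group from zpool create
    ({"sde", "sdf", "sdg"}, 1),  # raidz group from zpool add
]


def pool_survives(pool: list[tuple[set[str], int]], failed: set[str]) -> bool:
    """True if no vdev has lost more disks than its parity can cover."""
    return all(len(members & failed) <= parity for members, parity in pool)


if __name__ == "__main__":
    print(pool_survives(POOL, {"sdb", "sde"}))  # True: one failure per group
    print(pool_survives(POOL, {"sdb", "sdc"}))  # False: two failures in one group
```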

The Saga Continues…

We’ve barely scratched the surface of ZFS and its capabilities, but using what you’ve learned in this article you should now be able to create redundant storage pools for your data.  Check back with us for future articles about ZFS, see the man pages, and search around for the endless niche guides and YouTube videos covering ZFS functions.

Taken From: http://www.howtogeek.com/175159/an-introduction-to-the-z-file-system-zfs-for-linux/
