ZFS Maintenance

By Christopher Stone
Published Dec 10, 2010
CC By-SA Licensed
ZFS's regular everyday tasks are really quite easy. Simple and straightforward administration was a key goal in developing the file system and the tools used to maintain it. To that end there are just two utilities that you will need for almost any task.

ZFS is organized into two logical administrative units: the storage pool where all the data is kept and the file system that keeps track of file locations, permissions, and other meta-data. On Solaris systems the file system also keeps track of sharing and a few additional features; since I'm running FreeBSD, those do not apply (at the time of writing).

This is more than just a reference page. It contains quite a bit of commentary and best practices. If you have questions or suggestions, please do contact me.
Many of the examples below show the tank zpool I created in the ZFS - Brief Intro article.
Some of the details have been modified for simplicity and security.

The zpool utility

The zfs utility

Commands related to properties, snapshots, and clones have been purposely omitted as they cover complex topics and will be the focus of upcoming articles.

zpool

The zpool utility manages the storage pool itself. It is not directly aware of the file systems you create in the pool, so all pool arguments take the pool name only (usually just tank or pool0). The following are the most commonly issued zpool commands. Typically they are only used when creating a storage pool, fixing something that has gone wrong, or during routine maintenance.

list

Show a list of all the zpools the system is aware of. ZFS automatically looks for certain markers on all drives connected to the system, so you never have to modify a configuration file (like fstab). Even if you plug in a foreign disk (such as a USB drive with ZFS from another computer), it should appear in this list, though foreign disks are not automatically mounted.

Example:
root@aislynn# zpool list
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
tank   300G    30G   270G    10%  ONLINE  -

status

The list command shows some brief information regarding each zpool in the system, but for more details you need to issue the status command. It takes one argument, the name of the zpool you want the status of (if you omit this argument, you'll get the status of all zpools). Status returns the operational state of the storage pool, including all of the vdevs and their backing devices if they apply. The scrub and error statuses are also included.

Example:
root@aislynn# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad5     ONLINE       0     0     0
            ad6     ONLINE       0     0     0

errors: No known data errors

history

The history command simply shows what zpool commands have been issued against the pool named in the argument.

History of the tank1 pool from the attach/detach examples below:
root@aislynn# zpool history tank1
History for 'tank1':
2010-12-09.19:04:03 zpool create tank1 /dev/ad1
2010-12-09.19:06:32 zpool attach tank1 /dev/ad1 /dev/ad2
2010-12-09.19:06:49 zpool detach tank1 /dev/ad2

scrub

Perhaps the most important regular maintenance operation to perform on a zpool is a scrub. A scrub reads all the data in the pool and verifies it against its checksums, repairing damaged blocks from redundant copies where possible. This operation will detect unreadable sectors and consistency problems; run regularly, it ensures you become aware of problems as they develop.

Syntax: zpool scrub [-s] [pool_name]

The -s option stops a scrub already in process on a pool.

Example:
root@aislynn# zpool scrub tank
When the command is run, nothing is returned (unless an error occurs); the scrub is simply started. To see how the scrub is doing, run the status command. While the scrub is running, its progress (as a percentage) and estimated time to completion are shown; once it finishes, the results and the time of completion are shown instead. During the scrub ZFS will use all "spare" disk throughput, which in practice causes noticeable disk performance degradation, so you should normally run scrubs during off-hours when users will not notice.
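While a scrub is in progress, the scrub line of the status output looks something like the following (the numbers and exact wording here are illustrative and vary between ZFS versions):
root@aislynn# zpool status tank | grep scrub
 scrub: scrub in progress for 0h8m, 21.52% done, 0h29m to go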

Scheduling a periodic disk scrub is easy using cron: just open /etc/crontab and add an entry something like the following:
0	2	*	*	0	root	zpool scrub tank
This schedules a scrub of zpool tank at 2am every Sunday morning. Whatever time you choose, be sure the scrub will not interfere with other tasks running at the same time.
Alternatively, you can add a scrub task to the periodic system by creating the file /usr/local/etc/periodic/weekly/404.zfs-scrub and putting the following in it:
#!/bin/sh -
zpool scrub tank
This will add a task to the periodic schedule (weekly in this example) that runs the same pool scrub command from above.
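Note that periodic(8) will only run the script if it is executable, so remember to set the execute bit after creating the file:
root@aislynn# chmod +x /usr/local/etc/periodic/weekly/404.zfs-scrub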

replace

The replace command is most commonly used when you have an array without a hot spare. If a drive fails in the array, you can issue this command to tell zpool to use a different drive as a replacement. If the replacement drive is going onto the same connection, so that it will show up under the same device name, you omit the new device argument of the command.

Syntax: zpool replace [pool_name] [old_vdev] [new_vdev]

Example replacing a failed disk with a new disk in the same location:
root@aislynn# zpool replace tank0 /dev/ad4
This functionality can be automated by setting autoreplace to on.
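If you want that behavior, the autoreplace property is set with zpool set; for example:
root@aislynn# zpool set autoreplace=on tank0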
Example replacing a failed disk with a new disk in a different location:
root@aislynn# zpool replace tank0 /dev/ad4 /dev/ad6

create

I previously covered creating a zpool in the Brief Intro article a few weeks ago. There's not much to the command: simply follow create with the name of the pool and the vdevs that will back it.

A simple example creating tank0 with a single SATA disk:
root@aislynn# zpool create tank0 /dev/ad0
Creating tank0 with a two disk mirror:
root@aislynn# zpool create tank0 mirror /dev/ad0 /dev/ad1
Creating tank0 with three disks in a raidz1 (single parity redundancy):
root@aislynn# zpool create tank0 raidz1 /dev/ad0 /dev/ad1 /dev/ad2
Creating tank0 with four disks in a raidz2 (double parity redundancy):
root@aislynn# zpool create tank0 raidz2 ad0 ad1 ad2 ad3
In addition to the standard vdevs (disks or redundancy layers backed by disks), there are three other types of vdevs (sometimes called pseudo-vdevs). They are spare, log, and cache. The names of these vdevs are pretty descriptive. Like any other vdev these special ones are shared by the whole pool and can be added (or removed) after a pool is created.

A spare is only useful alongside a redundancy vdev: when a disk in an array fails, the zpool will select a spare that is at least as big as the failed disk and replace the failed disk with it. Spares are shared by the whole pool; they are not assigned to a specific array.

Creating tank0 with a mirror and a spare disk:
root@aislynn# zpool create tank0 mirror ad0 ad1 spare ad2
A log vdev is used to locate the Intent Log on different storage than the main storage for the pool. By default, when a synchronous write is committed to disk it is first written to the Intent Log; after that operation completes, the data is written to its actual location. There are plenty of sources discussing the benefits of journaling, but moving the Intent Log to a small, high-speed disk (or array) allows these operations to complete faster and therefore lets synchronous writes return sooner. Locating the Intent Log on high-speed storage only accelerates synchronous writes; asynchronous writes will return just as quickly either way.

You can use more than one disk as an Intent Log, and log devices can be mirrored. A log can even be backed by a file, including a file on an NFS share (though that is not recommended); raidz, however, is not supported for the intent log.

Creating tank0 with an Intent Log on a different disk (e.g., an EFD or SSD):
root@aislynn# zpool create tank0 mirror /dev/ad0 /dev/ad1 log /dev/ad5
A cache vdev is used to extend the working-set cache onto disk. The working set is the file cache located in main memory (RAM). You can create a tiered caching system by adding high-speed disks (EFDs and SSDs work nicely here too) to a storage pool. This has the largest effect on file systems that are very read heavy or read-write heavy (file systems which are write heavy may benefit as well, though usually not as much). If the file system is lightly used, especially with infrequent and small reads and writes, you will see no benefit from adding a caching disk.

Unlike all other vdevs, caches can not be file backed. They can only be backed by a whole disk, or a disk slice (partition).
Creating tank0 with a cache disk:
root@aislynn# zpool create tank0 /dev/ad0 cache /dev/ad5

destroy

Destroy is a simple command with a simple syntax. The only option it takes is -f to force the destruction of the pool even if it is in use. The only argument is the name of the pool. Destroyed pools can not be recovered (at least not easily), so double check your command before issuing it.
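Destroying the tank0 pool from the create examples would look like this (add -f only if the pool is busy and you are certain):
root@aislynn# zpool destroy tank0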

add

Since you are already familiar with the create command, the add command should be very easy: it is nearly identical. Simply provide the pool name and the vdevs you want to add.

Adding a mirror to tank0:
root@aislynn# zpool add tank0 mirror /dev/ad5 /dev/ad6
Adding a spare and cache disks to tank0:
root@aislynn# zpool add tank0 spare /dev/ad3 cache /dev/ad8

remove

The remove command might be a bit misleading, as it only removes spare and cache devices. If you want to remove a disk from a mirror you have to use the detach command, and non-redundant and raidz vdevs can not be removed from a pool at all. The command takes the pool name and the device you want to remove.
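Removing the spare and cache disks that were added in the add example above:
root@aislynn# zpool remove tank0 /dev/ad3
root@aislynn# zpool remove tank0 /dev/ad8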

attach

The attach command differs from the add command in that you attach a vdev to a mirror, or attach one to create a mirror. If the device you are attaching to is just a plain disk, this command will create a mirror of that disk and the new vdev. If the device you are attaching to is already part of a mirror, the new device will be added to the mirror (creating a three-way or larger mirror). The array will begin resilvering immediately.

An example, creating a new pool with a single disk, then attaching a disk to produce a mirror:
root@aislynn# zpool create tank1 /dev/ad1
root@aislynn# zpool attach tank1 /dev/ad1 /dev/ad2
root@aislynn# zpool status tank1
  pool: tank1
 state: ONLINE
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank1          ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            ad1        ONLINE       0     0     0
            ad2        ONLINE       0     0     0

errors: No known data errors
At this time you can not attach disks to a raidz vdev; you can only attach to a mirror (or attach to a single disk to create a mirror). When ZFS was conceived there were no plans to ever add such a feature because of the technical difficulty. However, some people have been working on the problem and have conceptualized a way of attaching disks to an existing raidz, though I have not yet seen working code. I expect this feature will be worked into ZFS someday, but not in the near future.

detach

The obvious opposite of attach is detach. It takes the pool and a mirror device from that pool as arguments.

Breaking the mirror I created in the attach example (above):
root@aislynn# zpool detach tank1 /dev/ad2
root@aislynn# zpool status tank1
  pool: tank1
 state: ONLINE
 scrub: none requested
config:

        NAME   STATE     READ WRITE CKSUM
        tank1  ONLINE       0     0     0
          ad1  ONLINE       0     0     0

errors: No known data errors

export

The export command marks a pool as exported. Once the pool is marked exported, it can be physically detached from the system (normal hardware requirements for detachment apply; e.g., remove power first if the drives are not hot-swappable), then attached to another system and imported there. The export command takes the pool name as an argument. You can also use the -f switch to force the export of a pool that is in use.

This command is most useful for exporting pools from live systems with hot-swap compatible hardware. You do not need to export a pool if the system is going to be shut down when it is disconnected. Exporting the pool before shutting down similarly has no negative impact (except that if you change your mind, the disks will remain exported).

Exporting a pool created earlier:
root@aislynn# zpool export tank0

import

After a pool has been exported, disconnected, and connected to a new system, you can use the import command to bring it online there. Run with no arguments, the import command lists any pools that are available to be imported.
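Listing importable pools and then importing the tank0 pool exported above (the first command prints a description of each pool it finds; output omitted here):
root@aislynn# zpool import
root@aislynn# zpool import tank0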

upgrade

Each zpool stores an internal version number. This sets which features are available for use on the pool. An older pool can be used on a newer system without issue; the system will not gain any new functionality from it, but the pool will not be corrupted. A pool formatted with a newer version than the system supports, on the other hand, generally cannot be imported at all.

If the system supports a newer version than the pool, the pool can be brought up to date using the upgrade command. In any case you will only be able to use the features of the lesser of the two version numbers (system and pool). The upgrade command takes a single argument, the name of the pool to upgrade.
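A quick sketch: list the versions the system supports, then upgrade the pool to the newest of them (remember that an upgraded pool can no longer be opened by older software):
root@aislynn# zpool upgrade -v
root@aislynn# zpool upgrade tank0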

zfs

The zfs utility manages the file systems on a zpool. The root file system is named the same as the pool itself; in most of the examples above that would be tank or tank1. The zfs utility has many functions which will not be covered here (many will be the subject of upcoming articles, as the topics can be very complex).

list

The list command is very similar to zpool's list command; it lists all known zfs file systems on the computer. Additional details include the amount of space available, the amount used, and how much space is referenced by that specific file system. The "used" number includes all child file systems; these count against the parent file system's quota and so are included here. The "refer" number shows how much is actually used by files in that file system itself. The "avail" number reflects the maximum space the file system might be allowed to use. Keep in mind that file systems share pool space, so the total of all file systems' "avail" space may exceed the storage pool's available space.

List output after my Intro article:
root@aislynn# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
tank            50G   150G    26K  /tank
tank/home      400M  19.6G   380M  /usr/home
tank/storage    40G   150G    40G  /storage

create

The create command very simply creates a new child file system within an existing zfs file system. The command takes one argument: the path where you want the file system created. One requirement is that file systems have to be directly nested. For instance, I cannot create a plain folder in /usr/home - named family for example - and then create file systems for each person's home folder within the family folder.

File systems must be directly nested; here is the error if you try otherwise:
root@aislynn# mkdir /usr/home/family
root@aislynn# zfs create tank/home/family/dad
cannot create 'tank/home/family/dad': parent does not exist
You can, however, create an intermediary file system named family and nest within it:
root@aislynn# zfs create tank/home/family
root@aislynn# zfs create tank/home/family/dad
To the user these all look like ordinary folders; they are unaware of the details.
When creating file systems you must use the file system path, not the mount point path. Using the mount point path will cause an error complaining about a "leading slash in name".
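For illustration, attempting to use the mount point path (the exact message may differ slightly between versions):
root@aislynn# zfs create /usr/home/family
cannot create '/usr/home/family': leading slash in name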

destroy

The destroy command works very much like rm, but on zfs file systems (it also works on snapshots and clones). There are three options to be aware of when using destroy; an example follows the list.

-f forces any file systems that are in use to be unmounted and destroyed.
-r recursively destroys any child file systems.
-R destroys any dependents (encountered when using snapshots and clones).
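Removing the family file systems created in the create section above, children and all:
root@aislynn# zfs destroy -r tank/home/family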

More to come in my FreeBSD » ZFS category.