ZFS is organized into two logical administrative units: the storage pool where all the data is kept and the file system that keeps track of file locations, permissions, and other meta-data. On Solaris systems the file system also keeps track of sharing and a few additional features; since I'm running FreeBSD, those do not apply (at the time of writing).
This is more than just a reference page. It contains quite a bit of commentary and best practices. If you have questions or suggestions, please do contact me.
Many of the examples below show the tank zpool I created in the ZFS - Brief Intro article.
Some of the details have been modified for simplicity and security.
The zpool utility
The zfs utility
Commands related to properties, snapshots, and clones have been purposely omitted as they cover complex topics and will be the focus of upcoming articles.
zpool
The zpool utility manages the storage pool itself. It is not directly aware of the file systems you create in the pool, so all pool arguments take the pool name only (usually just tank or pool0). The following commands are the ones most commonly issued to the zpool utility. Typically these are only used when creating a storage pool, fixing something that has gone wrong, or during routine maintenance.
root@aislynn# zpool list
NAME   SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
tank   300G    30G   270G   10%  ONLINE  -
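For routine maintenance it can help to turn the CAP column into a simple check. The following is a minimal sketch, not a finished monitoring script: pools_over_capacity is a name made up for this example, the 80 percent default is arbitrary, and the function only filters text in the tab-separated shape that zpool list -H -o name,capacity prints.

```shell
#!/bin/sh
# Print pools whose used capacity is at or above a threshold (percent).
# Input on stdin: "name<TAB>capacity" lines, the shape produced by
#   zpool list -H -o name,capacity
# The function name and the default threshold of 80 are illustrative only.
pools_over_capacity() {
    limit="${1:-80}"
    awk -F '\t' -v limit="$limit" '
        {
            cap = $2
            sub(/%/, "", cap)            # "85%" becomes "85"
            if (cap + 0 >= limit + 0)
                print $1 " " $2
        }'
}

# On a live system (not run here):
#   zpool list -H -o name,capacity | pools_over_capacity 80
```

Because the function reads standard input, it can be tried against saved output before pointing it at a real pool.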
root@aislynn# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad5     ONLINE       0     0     0
            ad6     ONLINE       0     0     0

errors: No known data errors
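The full status output is meant for people. For a script, the health column of zpool list is easier to work with. Here is a small sketch under that assumption: unhealthy_pools is a hypothetical helper name, and it only filters "name<TAB>health" lines such as those printed by zpool list -H -o name,health.

```shell
#!/bin/sh
# Print "pool: STATE" for every pool that is not ONLINE.
# Input on stdin: "name<TAB>health" lines, as printed by
#   zpool list -H -o name,health
# unhealthy_pools is a made-up name for this example.
unhealthy_pools() {
    awk -F '\t' '$2 != "ONLINE" { print $1 ": " $2 }'
}

# On a live system (not run here):
#   zpool list -H -o name,health | unhealthy_pools
```

No output means every listed pool reported ONLINE.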
History of the pool from the attach/detach section:
root@aislynn# zpool history tank1
History for 'tank1':
2010-12-09.19:04:03 zpool create tank1 /dev/ad1
2010-12-09.19:06:32 zpool attach tank1 /dev/ad1 /dev/ad2
2010-12-09.19:06:49 zpool detach tank1 /dev/ad2
Syntax: zpool scrub [-s] [pool_name]
The -s option stops a scrub already in progress on a pool.
root@aislynn# zpool scrub tank
When the command is run, no information is returned (unless an error occurs); the scrub is simply started. To see the status of the scrub, run the status command. During the scrub operation, the progress (as a percent) and estimated time to completion will be displayed. After the scrub completes, the results and time of completion will be displayed. During the scrub process, ZFS will utilize all "spare" disk throughput. In practice this results in noticeable disk performance degradation, so you should normally run scrubs during off-hours when users will not notice.
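Since the scrub command itself prints nothing, checking on a scrub means picking one line out of the status output. The sketch below assumes the line is labelled "scrub:" as in the listings in this article; it also accepts "scan:", which newer ZFS versions use. scrub_status_line is a hypothetical helper name.

```shell
#!/bin/sh
# Extract the scrub summary from `zpool status` output given on stdin.
# Matches the "scrub:" label shown in this article and the "scan:" label
# used by newer ZFS versions. The function name is illustrative only.
scrub_status_line() {
    awk '$1 == "scrub:" || $1 == "scan:" {
        $1 = ""              # drop the label
        sub(/^ +/, "")       # drop the space left behind
        print
    }'
}

# On a live system (not run here):
#   zpool status tank | scrub_status_line
```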
Scheduling a periodic disk scrub is easy using cron. Open /etc/crontab and add an entry like the following:
0 2 * * 0 root zpool scrub tank
This schedules a scrub of zpool tank at 2am every Sunday morning. Whatever time you choose, be sure the scrub will not interfere with another task running at the same time.
Alternately, you can add a scrub task to the periodic system by creating the file /usr/local/etc/periodic/weekly/404.zfs-scrub and putting the following in it:
#!/bin/sh -
zpool scrub tank
This will add a task to the periodic schedule (weekly in this example) that runs the same pool scrub command from above.
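If a machine has more than one pool, the same script can loop over whatever zpool list reports instead of naming tank directly. This is only a sketch: scrub_all_pools is a name invented here, and it assumes nothing beyond zpool list -H -o name and zpool scrub.

```shell
#!/bin/sh
# Start a scrub on every imported pool instead of naming a single pool.
# scrub_all_pools is a hypothetical helper; it relies only on
# `zpool list -H -o name` and `zpool scrub`.
scrub_all_pools() {
    zpool list -H -o name | while read -r pool; do
        echo "starting scrub of $pool"
        zpool scrub "$pool"
    done
}

# In a periodic script the last line would simply be:
#   scrub_all_pools
```

Scrubbing several pools at once multiplies the disk load described earlier, so staggering them may be the better choice on a busy system.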
Syntax: zpool replace [pool_name] [old_vdev] [new_vdev]
Example replacing a failed disk with a new disk in the same location:
root@aislynn# zpool replace tank0 /dev/ad4
This functionality can be automated by setting autoreplace to on.
Example replacing a failed disk with a new disk in a different location:
root@aislynn# zpool replace tank0 /dev/ad4 /dev/ad6
A simple example creating tank0 with a single SATA disk:
root@aislynn# zpool create tank0 /dev/ad0
Creating tank0 with a two disk mirror:
root@aislynn# zpool create tank0 mirror /dev/ad0 /dev/ad1
Creating tank0 with three disks in a raidz1 (single parity redundancy):
root@aislynn# zpool create tank0 raidz1 /dev/ad0 /dev/ad1 /dev/ad2
Creating tank0 with four disks in a raidz2 (double parity redundancy):
root@aislynn# zpool create tank0 raidz2 ad0 ad1 ad2 ad3
In addition to the standard vdevs (disks or redundancy layers backed by disks), there are three other types of vdevs (sometimes called pseudo-vdevs). They are spare, log, and cache. The names of these vdevs are pretty descriptive. Like any other vdev these special ones are shared by the whole pool and can be added (or removed) after a pool is created.
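As a side note on the redundancy layouts created above, the parity level decides how much of the raw disk space ends up usable. The function below is a rough back-of-the-envelope sketch, assuming equal-size disks and ignoring metadata and allocation overhead. usable_gb and the "stripe" label (for a pool of plain disks) are names made up for this example, not zpool keywords.

```shell
#!/bin/sh
# Rough usable capacity (in GB) of one vdev built from equal-size disks.
# Arguments: layout (stripe|mirror|raidz1|raidz2), disk count, GB per disk.
# "stripe" is this example's own label for plain disks with no redundancy.
# Overhead is ignored, so a real pool will report somewhat less.
usable_gb() {
    layout="$1"; disks="$2"; size="$3"
    case "$layout" in
        stripe) echo $(( disks * size )) ;;          # no redundancy
        mirror) echo "$size" ;;                      # every disk holds a full copy
        raidz1) echo $(( (disks - 1) * size )) ;;    # one disk worth of parity
        raidz2) echo $(( (disks - 2) * size )) ;;    # two disks worth of parity
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

usable_gb raidz1 3 100   # prints 200
usable_gb raidz2 4 100   # prints 200
usable_gb mirror 2 100   # prints 100
```

Assuming three 100G disks, the raidz1 figure lines up with the listings in this article: zpool list reports 300G of raw space for tank, while zfs list shows 50G used plus 150G available.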
A vdev added as a spare must be attached to a redundancy vdev. When a disk in the array fails, the zpool will select a spare that is at least as big as the failed disk and replace the failed disk with the spare. Spares are shared by the whole pool; they are not assigned to a specific array.
Creating tank0 with a mirror and a spare disk:
root@aislynn# zpool create tank0 mirror ad0 ad1 spare ad2
A log vdev is used to locate the Intent Log on different storage than the main storage for the pool. By default when a synchronous write is committed to disk, it is first written to the Intent Log; after that operation completes the data is written to the actual file. There are plenty of sources discussing the benefits of journaling, but moving the Intent Log to a small high speed disk (or array) allows these operations to complete faster and therefore return from the synchronous write sooner. Locating the Intent Log on high speed storage will only accelerate synchronous writes; asynchronous writes will return the same in any case.
You can use more than one disk as an Intent Log. Like other vdevs, you can also use a mirror, raidz, or a file, even a file on an NFS share (though none of these are recommended).
Creating tank0 with an Intent Log on a different disk (e.g., EFD or SSD):
root@aislynn# zpool create tank0 mirror /dev/ad0 /dev/ad1 log /dev/ad5
A cache vdev is used to expand the working set cache onto disk. The working set is the file cache located in main memory (RAM). You can create a tiered caching system by adding high speed disks (EFDs and SSDs work nicely too) to a storage pool. This will have the largest effect on file systems that are very read heavy, or read-write heavy (file systems which are write heavy may benefit as well, though usually not as much). If the file system is lightly used, especially with infrequent and small reads and writes, you will see no benefit from adding a caching disk.
Unlike all other vdevs, caches can not be file backed. They can only be backed by a whole disk, or a disk slice (partition).
Creating tank0 with a cache disk:
root@aislynn# zpool create tank0 /dev/ad0 cache /dev/ad5
With zpool destroy, use -f to force the destruction of the pool even if it is in use. The only argument is the name of the pool. Destroyed pools can not be recovered (at least not easily), so double check your command before issuing it.
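One way to build the "double check" into a habit is a wrapper that wants the pool name twice. The sketch below is hypothetical (guarded_destroy is not part of ZFS) and deliberately only prints the command it would run rather than calling zpool.

```shell
#!/bin/sh
# Require the pool name to be given twice before printing a destroy command.
# guarded_destroy is a hypothetical wrapper; it only echoes the command
# and never runs zpool itself.
guarded_destroy() {
    pool="$1"; confirm="$2"
    if [ -z "$pool" ] || [ "$pool" != "$confirm" ]; then
        echo "refusing: confirmation does not match pool name" >&2
        return 1
    fi
    echo "zpool destroy $pool"
}

# Example calls:
#   guarded_destroy tank0 tank0    prints: zpool destroy tank0
#   guarded_destroy tank0 tank     refuses and returns non-zero
```

Printing instead of executing keeps the final step in human hands, which suits a command that can not easily be undone.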
Adding a mirror to tank0:
root@aislynn# zpool add tank0 mirror /dev/ad5 /dev/ad6
Adding a spare and cache disks to tank0:
root@aislynn# zpool add tank0 spare /dev/ad3 cache /dev/ad8
detach command. Non-redundant and raidz vdevs can not be removed from a pool. The only argument this command takes is the device you want to remove.
An example, creating a new pool with a single disk, then attaching a disk to produce a mirror:
root@aislynn# zpool create tank1 /dev/ad1
root@aislynn# zpool attach tank1 /dev/ad1 /dev/ad2
root@aislynn# zpool status tank1
  pool: tank1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad2     ONLINE       0     0     0

errors: No known data errors
At this time you can not attach disks to a raidz vdev. You can only attach to a mirror (or to create a mirror). When ZFS was conceived it was not planned to ever add such a feature because of the technical difficulty. However, some people have been working on the problem and have conceptualized a way of attaching disks to an existing raidz. At this time I have not seen working code. I expect someday this feature will be worked into ZFS, but not in the near future.
Breaking the mirror I created in the attach example (above):
root@aislynn# zpool detach tank1 /dev/ad2
root@aislynn# zpool status tank1
  pool: tank1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          ad1       ONLINE       0     0     0

errors: No known data errors
Use the -f switch to force the export of a pool that is in use.
This command is most useful for exporting pools from live systems with hot-swap compatible hardware. You do not need to export a pool if the system is going to be shut down when it is disconnected. Exporting the pool before shutting down similarly has no negative impact (except that if you change your mind, the disks will remain exported).
Exporting a pool created earlier:
root@aislynn# zpool export tank0
If the system is newer however, the pool can be upgraded using the upgrade command. In any case you will be able to use the features of the lesser of the two numbers (system and pool versions). The upgrade command takes a single argument, the name of the pool to upgrade.
zfs
The zfs utility manages the file systems on a zpool. The root file system is named the same as the pool itself; in most of the examples above that would be tank or tank1. The zfs utility has many functions which will not be covered here (many will be the subject of upcoming articles as the topics can be very complex).
List output after my Intro article:
root@aislynn# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
tank            50G   150G    26K  /tank
tank/home      400M  19.6G   380M  /usr/home
tank/storage    40G   150G    40G  /storage
File systems must be nested directly inside one another. Here is an example of the error you get if you try otherwise:
root@aislynn# mkdir /usr/home/family
root@aislynn# zfs create tank/home/family/dad
cannot create 'tank/home/family/dad': parent does not exist
You can however nest an intermediary file system named family in there.
root@aislynn# zfs create tank/home/family
root@aislynn# zfs create tank/home/family/dad
To the user, these all look like folders; they are unaware of the details.
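To see ahead of time which intermediary file systems a name depends on, the path can be taken apart with plain shell string handling. This is a sketch only: dataset_parents is a made-up helper that works on the name and never talks to zfs, so it can not tell which parents already exist.

```shell
#!/bin/sh
# List the ancestor file systems a dataset name depends on, outermost first.
# For tank/home/family/dad that is tank/home and tank/home/family.
# The pool itself (tank) is not listed because it always exists.
# Names starting with a slash are rejected, since zfs wants the file system
# path (tank/...) rather than the mount point path.
dataset_parents() {
    case "$1" in
        /*) echo "leading slash in name: $1" >&2; return 1 ;;
    esac
    case "${1%/*}" in
        # The parent still contains a slash, so it is a file system below
        # the pool: list its own ancestors first, then the parent itself.
        */*) dataset_parents "${1%/*}"; echo "${1%/*}" ;;
    esac
}

dataset_parents tank/home/family/dad
# prints:
#   tank/home
#   tank/home/family
```

As far as I know, zfs create also accepts a -p flag that creates missing parents in one step; check the manual page for your version before relying on it.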
When creating file systems you must use the file system path, not the mount point path. Using the mount point path will cause an error complaining about a "leading slash in name".
The destroy command works like rm, but on zfs file systems (it also works on snapshots and clones). There are three arguments to be aware of when using destroy.
-f forces any file systems that are in use to be unmounted and destroyed.
-r recursively destroys any child file systems.
-R destroys any dependents (encountered when using snapshots and clones).
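Before using -r it is worth seeing which file systems sit underneath the target. The sketch below filters a list of names such as the one printed by zfs list -H -o name. destroy_preview is a hypothetical helper, and it only looks at the names it is given, so snapshots and clones are outside its scope unless they appear in the input.

```shell
#!/bin/sh
# Preview which file systems a recursive destroy of <target> would cover.
# Input on stdin: one dataset name per line, as printed by
#   zfs list -H -o name
# destroy_preview is a made-up name; it only filters text.
destroy_preview() {
    target="$1"
    # Keep the target itself and anything below "target/".
    awk -v t="$target" '$0 == t || index($0, t "/") == 1 { print }'
}

# On a live system (not run here):
#   zfs list -H -o name | destroy_preview tank/home
```

Matching on "target/" rather than just the prefix keeps a sibling such as tank/homework out of a preview for tank/home.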
More to come in my FreeBSD » ZFS category.