ZFS Admin

Overview of the Zettabyte File System (ZFS)

  • ZFS first appeared in 2005 as part of OpenSolaris; a native Linux port appeared in 2010.
  • Physical disks are aggregated and organized into "zpools" that can be configured and exported as mount points (see the example after this list).
  • zpools are similar to "volumes" in traditional filesystems.
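
For illustration, a pool might be assembled from raw disks and exposed as a mount point along these lines (the pool, filesystem, and device names here are made up):

zpool create tank mirror sdb sdc             # aggregate two disks into a mirrored pool
zfs create tank/projects                     # create a filesystem inside the pool
zfs set mountpoint=/projects tank/projects   # expose it at /projects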

Pros:

  • High reliability owing to configurable software RAID.
  • Large capacity: can support up to 1 million petabytes (1 billion terabytes).
  • Fast reads due to COW (copy-on-write).

Cons:

  • Non-parallel: writes are relatively slow.
  • Does not support all Linux system calls, e.g. fallocate() (see the example after this list).
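
For instance, preallocating space with the fallocate utility may not work on a ZFS filesystem (the path below is only illustrative):

fallocate -l 1G /zfs1/7/somegroup/testfile   # may fail with "Operation not supported" on ZFS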

ZFS@SAM: hardware

  • Two storage servers
  • Each server has 36 3.5” 5.45T (6,001,175,126,016-byte) drives and is attached to a JBOD (Just a Bunch of Disks) of 60 similar drives, for a total of 96 disks per server.
  • Each server provides 523.96T, making a grand total of 1.02 petabytes (1047.93T) of storage space.
  • Each server has sixteen 2.10 GHz Intel Xeon cores, 64G of memory, and two 256G solid state drives (SSDs).
  • Each server is equipped with a 10GbE and a 1GbE network interface card.
  • Each server and its JBOD chassis are connected via two 12Gbps externally attached SAS (Serial Attached SCSI) cables.

Configuration

  • Server chassis contains processing and network apparatus and three "zpools".
  • Each zpool consists of twelve 5.45T drives in a 10+2 RAIDZ-2 configuration (equivalent to RAID 6), created as sketched after this list.
  • Each zpool has 48T of space for user data (out of a total of 54.5T).
  • Each JBOD contains five zpools.
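
A 12-drive RAIDZ-2 pool of this kind would be created roughly as follows (the pool name matches the tutorial below; the device names are illustrative):

zpool create jbodp4 raidz2 sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm   # 10 data + 2 parity drives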

See the diagram below for a depiction of the configuration:

[zfs.png: diagram of the ZFS configuration]

Quick Tutorial

zfs list                              # list all ZFS filesystems and their mount points
zfs set compression=lz4 <zpoolname>   # enable lz4 compression
zfs create jbodp4/a/filesystem        # create a filesystem within a pool
zfs destroy jbodp4/a/filesystem       # destroy a filesystem
zfs get <prop> <zpoolname>            # get the value of a property
zfs set <prop>=<val> <zpoolname>      # set the value of a property
zfs get all                           # get all properties of all filesystems
zpool status                          # get the status of all zpools
zpool status -x                       # check whether all zpools are healthy
zpool scrub jbodp4                    # scrub a pool
zpool iostat                          # get I/O statistics
zpool replace -f jbodp4 <disk-name>   # replace a disk (triggers resilvering)
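
Putting a few of these together, a new compressed filesystem with a quota could be set up like this (the pool and filesystem names are only an example):

zfs create jbodp4/example                  # new filesystem in an existing pool
zfs set compression=lz4 jbodp4/example     # enable lz4 compression
zfs set quota=1T jbodp4/example            # cap its size at 1T
zfs get compression,quota jbodp4/example   # verify both properties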

Step by step example: Create a new user group

  1. Log in to the Frank/HTC/MPI cluster.
  2. Log in to zfs1-stage.sam.pitt.edu or zfs2-stage.sam.pitt.edu as root.
  3. Create the filesystem and set its quota:
     zfs create jbodp7/xiaosongwang
     zfs set quota=5120G jbodp7/xiaosongwang
  4. Add this line to /etc/exports:
     /zfs/7/xiaosongwang *.sam.pitt.edu(rw,anongid=16350,no_root_squash)
  5. Reload the NFS service: service nfs reload
  6. Mount the export: mount zfs1-stage.sam.pitt.edu:/zfs /zfs1
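
The result can be verified on the ZFS server (names follow the example above):

zfs get quota jbodp7/xiaosongwang   # confirm the quota was applied
zfs list jbodp7/xiaosongwang        # confirm the filesystem exists and is mounted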

For users

Each group filesystem is mounted as /zfs/n/usergroup on the MPI, HTC, and Frank login nodes, where n = 1 to 16 identifies the zpool and usergroup is the group name.

e.g. /zfs1/1/sam, /zfs1/2/kjordan, /zfs1/2/kjohnson.

Each group has a quota of 5T (subject to change).
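
From a login node, a group's usage against its quota can be checked with standard tools, e.g. (the path is illustrative):

df -h /zfs1/1/sam   # the reported size reflects the group quota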

ZFS over NFS

NFS server side (which is also ZFS server):

  1. Enable sharenfs and start the NFS server (see the example after this list)
  2. Create filesystems, set quota
  3. Add filesystems to /etc/exports
  4. Reload NFS service: service nfs reload
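
The sharenfs property mentioned in step 1 lets ZFS manage the export itself, as an alternative to editing /etc/exports (the filesystem name is illustrative):

zfs set sharenfs=on jbodp4/example   # share the filesystem over NFS
zfs get sharenfs jbodp4/example      # confirm the share setting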

NFS client side:

  1. Mount the exported filesystem
  2. Adjust /etc/idmapd.conf (a one-time activity; see the example after this list)
  3. Run /etc/init.d/rpcidmapd restart (one-time activity)
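
The idmapd.conf adjustment typically just sets the NFSv4 domain so that user and group names map consistently between client and server (the domain value here is an assumption):

[General]
Domain = sam.pitt.edu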

The /etc/fstab entry on the client looks like this:

zfs1-stage.sam.pitt.edu:/zfs1/7/xiaosongwang   /zfs1/7/xiaosongwang nfs  vers=4   0 0
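
With that entry in place, the filesystem can be mounted by path alone:

mount /zfs1/7/xiaosongwang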

ACLs

Access Control Lists (ACLs) provide fine-grained access control to files in Linux. They allow users to be selectively granted access to particular locations across a filesystem. Details coming soon.
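
As a brief illustration with the standard POSIX ACL tools (the user name and path are hypothetical):

setfacl -m u:alice:rx /zfs1/7/xiaosongwang   # grant user alice read and execute access
getfacl /zfs1/7/xiaosongwang                 # inspect the resulting ACL

Note that ZFS on Linux may require the acltype=posixacl property to be set on the filesystem before POSIX ACLs work (zfs set acltype=posixacl <filesystem>).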

References and Resources

  • zfs.png (82.75 KB): diagram of the ZFS configuration