=== SET UP ZFS ROOT USING FAI ===

==== Task: ====
Set up a ZFS root pool (rpool) and an additional data pool (export)
* using FAI and Debian Stretch
* on a machine with 3 SSDs (sd[amn]) and 11 HDDs (sd[b-l])
* with legacy BIOS (i.e., no UEFI)
* using a class <tt>ZFS_SERVER</tt>; another class script sets <tt>DISKS_14</tt> (only if there are exactly 14 disks), which is used as a safeguard (see the sketch after this list)
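A minimal sketch of such a safeguard class script (the script name and the device-name patterns are my illustration, not taken from the original setup): it counts whole-disk entries and prints a class named after the result, so the destructive hooks below only fire on the expected hardware.
<pre>
# cat config/class/40-disk-count    (hypothetical example)
#!/bin/bash
# count whole disks (partitions excluded) and print e.g. DISKS_14;
# FAI adds whatever a class script prints on stdout to the class list
n=$(ls /dev/disk/by-id/{ata,scsi}-* 2>/dev/null | grep -vc -- -part)
echo "DISKS_$n"
</pre>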

==== Used documentation: ====
* https://github.com/zfsonlinux/zfs/wiki/Debian-Stretch-Root-on-ZFS
* http://www.thecrosseroads.net/2016/02/booting-a-zfs-root-via-uefi-on-debian
* https://www.funtoo.org/ZFS_Install_Guide
(and some more, which I discarded in favour of the three above)

==== Considerations: ====
Since there is always one more way to reach a certain goal, the sequence of steps and file contents presented here
are not the only way to achieve the intended setup. Nevertheless, they worked for me, and I am not in a position to
run further tests and modifications, since the machines entered their production life weeks ago.

Due to the way FAI works, installing <tt>spl-dkms</tt> and <tt>zfs-dkms</tt> may result in partially broken dependencies
(usually because <tt>linux-headers</tt> haven't been installed yet). I addressed this by adding a few hooks; of course,
the very same hooks could also be used to fully install those (and related) packages instead of adding them to the
package list.

My setup may be somewhat atypical, due to its particular history:
I got two SSDs plus a spare for the operating system, and 11 HDDs for an 8+2+1 RAID-6-with-spare data store,
but somehow the hardware RAID controller was dropped from the specs at some point.
Rather than falling back to mdraid, I wanted to get rid of such a controller altogether and make use of the extra features ZFS provides.
Thus, all three SSDs were combined into a single three-way mirror (think of it as RAID-1 whose spare is already, and constantly, resilvered),
and the 11 spinning disks were used to form a single RAIDZ3 (there is no comparable "RAID-7" level; we can now
withstand two disk failures and still have "RAID-5 type" redundancy - without any resilvering/rebuilding delays).
Basically, this is similar to a "hot spares" setup, with the extra disks indeed "hot" but not exactly "spares" waiting
to be rebuilt. I consider this an extra gain in robustness (and the ZFS checksum consistency checks are another plus).
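
(In terms of capacity, the two layouts come out equal: 11 x 10 TB in RAIDZ3 dedicates three disks' worth to parity, leaving (11 - 3) x 10 TB = 80 TB of raw space before overheads; the originally planned 8+2 RAID-6 plus spare would likewise have offered 8 x 10 TB = 80 TB, with strictly less redundancy.)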

I left the option open to either set <tt>sharenfs=on</tt> or let the kernel NFS server (package <tt>nfs-kernel-server</tt>) handle NFS requests.
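For illustration, switching a dataset over later is a one-liner; shown here for the <tt>export/data</tt> dataset created further below:
<pre>
# let ZFS manage the NFS export itself ...
zfs set sharenfs=on export/data
# ... or keep it off and manage the export via /etc/exports instead
zfs set sharenfs=off export/data
</pre>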

I did not succeed in enabling UEFI. One has to be extra careful when assigning EFI partitions, and when making sure they
(all!) get the necessary software installed.
(Could this be done by faking a /boot/efi tree, not necessarily a partition, and copying its contents to multiple filesystems?
How would a running system with multiple disks, e.g. a rootfs RAID, handle multiple EFI partitions?)
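
One conceivable answer to the first question - untested, and purely a sketch of the idea, with made-up partition names - would be to keep /boot/efi as a plain staging directory on the root filesystem and replicate it onto every real EFI system partition after each bootloader update:
<pre>
#!/bin/bash
# sketch: sync a staging /boot/efi tree onto all EFI system partitions
# (hypothetical partition names - adjust to your disks)
for esp in /dev/disk/by-id/ata-SSD{0,1,2}-part1
do
    mount $esp /mnt
    rsync -a --delete /boot/efi/ /mnt/
    umount /mnt
done
</pre>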

Caveat: '' The following instructions had to be recovered from a printout using OCR. Some recognition errors may have survived. ''


==== Steps: ====

===== (1) Prepare your NFSROOT. =====

'' I've found that the dkms packages would be installed at a random time, ''
'' and their configure step would fail. Fix this in a (post-create) hook. ''

<pre>
# diff /etc/fai/NFSROOT.orig /etc/fai/NFSROOT
linux-headers-amd64   # or whatever your arch is
spl-dkms
zfs-dkms
zfsutils-linux
zfs-dracut
</pre>

<pre>
# cat /etc/fai/nfsroot-hooks/90-zfs
#!/bin/bash

export DEBIAN_FRONTEND=noninteractive
$ROOTCMD dpkg-reconfigure -fnoninteractive spl-dkms
$ROOTCMD dpkg-reconfigure -fnoninteractive zfs-dkms
</pre>

'' Note: one could probably leave the NFSROOT package list unchanged and ''
'' perform the additions using <tt>apt-get install</tt>, in the right order, in the hook. ''
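
Such a hook could look roughly like this (an untested sketch; it replaces the package-list additions above with explicit installs in dependency order):
<pre>
# cat /etc/fai/nfsroot-hooks/90-zfs    (alternative version)
#!/bin/bash

export DEBIAN_FRONTEND=noninteractive
# headers first, so the dkms module builds can succeed immediately
$ROOTCMD apt-get -y install linux-headers-amd64
$ROOTCMD apt-get -y install spl-dkms zfs-dkms
$ROOTCMD apt-get -y install zfsutils-linux zfs-dracut
</pre>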


===== (2) Build your NFSROOT (as usual). =====
'' Check in the verbose output that the modules have actually been built. ''
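
For example (assuming the default NFSROOT location /srv/fai/nfsroot - adjust to your setup), the freshly built kernel modules should show up like this:
<pre>
# find /srv/fai/nfsroot/lib/modules \( -name 'spl.ko' -o -name 'zfs.ko' \) -ls
</pre>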


===== (3) Make some additions to your FAI config, as below. =====
'' DO NOT COPY AND PASTE. THERE MAY BE DRAGONS. READ, UNDERSTAND, AND ADJUST TO YOUR PLAN. ''

For sysinfo, first add a class/* script, e.g. to find existing zpools.
This may also prove useful if you break something later.

During the install,
* disk zapping is done in a "partition" hook,
* zpool and zfs creation take place in a "mountdisks" hook,
* modules get fixed in a "configure" hook,
* and the final zpool export is done in a "savelog" hook.

In the scripts/GRUB_PC directory,
* I modified the 10-setup script of GRUB_PC (unnecessarily?)
* and added a 09-zfs script to get the initial ramdisk refreshed.

The <tt>/target</tt> tree may already be populated, and importing existing zpools may cause confusion:
<pre>
# cat config/class/39-zfs
#!/bin/bash

# redirect output to stderr so the class list isn't clobbered
(
modprobe spl
modprobe zfs

# the install target may already exist; get it out of the way & recreate it
if [ -d $target ]
then
    ls -l $target
    mv $target $target.000
fi
mkdir -p $target

# are there any existing pools? (the -f option may be useful but dangerous)
zpool import -a -d /dev/disk/by-id -R $target
zpool list
zfs list -t all -o name,type,mountpoint,compress,exec,setuid,atime,relatime
# properly export all zpools again
zpool export -a
zpool list

# restore the original state of the install target
if [ -d $target.000 ]
then
    mv $target $target.001
    mv $target.000 $target
fi
) >&2
</pre>

Before "partitioning", zap the disks' partition tables (MBR and GPT):
<pre>
# cat config/hooks/partition.ZFS_SERVER
#!/bin/bash

ifclass ZFS_SERVER && {
    ifclass DISKS_14 && {
        # clear partition tables - adjust the disk selection to your hardware!
        for disk in \
            /dev/disk/by-id/ata-* \
            /dev/disk/by-id/scsi-*
        do
            case $disk in
            *-part*)
                continue
                ;;
            esac
            echo zapping disk $disk
            sgdisk --zap-all $disk 2>/dev/null || echo problem zapping $disk
        done
        # DO NOT prepare boot partitions (yet)
    }
}
</pre>

Before "mounting", set up the pools and filesystems:
<pre>
# cat config/hooks/mountdisks.ZFS_SERVER
#!/bin/bash

ifclass ZFS_SERVER && {
    ifclass DISKS_14 && {
        # we've got 14 disks: 3*128GB SSDs for rpool, 11*10TB HDDs for export/data
        modprobe spl
        modprobe zfs
        # extract disk IDs, sorted by current "sd" name - this matches my device tree, yours may differ!
        ssds=`ls -l /dev/disk/by-id/ata-* | grep -v -- -part | awk '{print $NF, $(NF-2)}' | sort | cut -d" " -f2`
        hdds=`ls -l /dev/disk/by-id/scsi-* | grep -v -- -part | awk '{print $NF, $(NF-2)}' | sort | cut -d" " -f2`

        # create the root pool, set altroot to /target but don't mount yet
        {
            zpool create \
                -f \
                -o ashift=12 \
                -o autoreplace=on \
                -R $target \
                -O mountpoint=none \
                -O atime=off -O relatime=on \
                -O compression=lz4 \
                -O normalization=formD \
                -O xattr=sa -O acltype=posixacl \
                rpool \
                mirror \
                $ssds
            # install GRUB on all disks of the pool
            echo "BOOT_DEVICE=\"$ssds\"" >> $LOGDIR/disk_var.sh
            # main boot-environment dataset
            zfs create \
                -o mountpoint=none \
                rpool/ROOT
            # this is the one we're going to use
            zfs create \
                -o mountpoint=/ \
                rpool/ROOT/debian
            zfs set \
                mountpoint=/rpool \
                rpool
            zpool set \
                bootfs=rpool/ROOT/debian \
                rpool
            # current state
            zfs mount
            zpool get all rpool
            zfs list -t all -o name,type,mountpoint,compress,exec,setuid,atime,relatime
            zpool export rpool
        }

        # create the data pool
        {
            zpool create \
                -f \
                -o ashift=12 \
                -o autoreplace=on \
                -R $target \
                -O mountpoint=/export \
                -O atime=off -O relatime=on \
                -O compression=lz4 \
                -O normalization=formD \
                -O xattr=sa -O acltype=posixacl \
                -O recordsize=1M \
                export \
                raidz3 \
                $hdds
            zfs create \
                -o setuid=off \
                -o mountpoint=/export/data \
                -o sharenfs=off \
                export/data
            # [...]
            # current state
            zfs mount
            zpool get all export
            zfs list -t all -o name,type,mountpoint,compress,exec,setuid,atime,relatime
            zpool export export
        }

        zpool list
        # /target *should* be empty now. But you never know...
        echo check $target:
        ls -lR $target
        echo clean $target:
        mv $target $target.000
        mkdir $target
        # now re-import the pools
        zpool import -d /dev/disk/by-id -R $target rpool
        zpool import -d /dev/disk/by-id -R $target export
        # and save the state
        mkdir -p $target/etc/zfs
        zpool set cachefile=$target/etc/zfs/zpool.cache rpool
        zpool set cachefile=$target/etc/zfs/zpool.cache export

        # prepare for grub: create a BIOS boot partition on each SSD
        ifclass GRUB_PC && {
            for ssd in $ssds
            do
                echo Preparing $ssd for GRUB_PC:
                partx --show $ssd
                /sbin/sgdisk -a1 -n2:48:2047 -t2:EF02 -c2:"BIOS boot partition" \
                    $ssd
                partx --update $ssd
                partx --show $ssd
            done
        }
    }
}
</pre>

Before configuring, make sure the modules are there. The same reasoning applies as for the NFSROOT,
i.e. this might be the place to install the SPL and ZFS packages in the right order:
<pre>
# cat config/hooks/configure.ZFS_SERVER
#!/bin/bash

export DEBIAN_FRONTEND=noninteractive
$ROOTCMD dpkg-reconfigure -fnoninteractive spl-dkms
$ROOTCMD dpkg-reconfigure -fnoninteractive zfs-dkms

# of course, we have to load those modules inside the chroot
$ROOTCMD modprobe spl
$ROOTCMD modprobe zfs
</pre>

Before ending the install, properly export all pools:
<pre>
# cat config/hooks/savelog.ZFS_SERVER
#!/bin/bash

ifclass ZFS_SERVER && {
    zpool export -a
}
</pre>

Config files. We don't need any disk structure:
<pre>
# cat config/disk_config/ZFS_SERVER
# no disk config at all
</pre>

and the following might be (at least in part) replaced by <tt>apt-get install</tt> in a hook:
<pre>
# cat config/package_config/ZFS_SERVER
PACKAGES install

# [...]

# openzfs (if not installed by a hook)
spl-dkms
zfs-dkms
zfsutils-linux
libzfslinux-dev
libzfs2linux
libzpool2linux
zfs-zed
zfs-initramfs
</pre>

The following script ensures and confirms that everything (including the initrd) is ready for GRUB installation:
<pre>
# cat config/scripts/GRUB_PC/09-zfs
#!/bin/bash

ifclass ZFS_SERVER && {
    echo reinstalling grub-pc ...
    $ROOTCMD apt-get -y install --reinstall grub-pc \
        && echo ... reinstalled grub-pc \
        || echo ... grub-pc reinstall problem
    echo searching for zfs modules ...
    $ROOTCMD find /lib/modules \( -name spl\* -o -name zfs\* \) -ls
    echo searching for kernel and initrd
    $ROOTCMD find /boot \( -name initrd\* -o -name vmlinuz\* \) -ls

    echo check whether grub recognizes the rootfs
    $ROOTCMD grub-probe /
    echo rebuilding initramfs \(may not be necessary any longer\) ...
    $ROOTCMD update-initramfs -u -v -k all \
        && echo ... initramfs rebuilt \
        || echo ... initramfs rebuild problem
    echo changing grub defaults
    sed -i \
        -e 's~.*\(GRUB_TERMINAL=console.*\)~\1~' \
        -e 's~\(^GRUB_CMDLINE_LINUX_DEFAULT=.*\)quiet\(.*$\)~\1\2~' \
        $target/etc/default/grub
    grep GRUB_ $target/etc/default/grub
}
</pre>

I'm not really sure the changes to GRUB_PC/10-setup are necessary, but they
were the result of a long trial-and-error phase which I didn't want to jeopardize:
<pre>
# diff -u config/scripts/GRUB_PC/10-setup{.orig,}
--- 10-setup.orig	2018-05-30 10:50:42.000000000 +0200
+++ 10-setup	2018-12-07 14:17:14.564558118 +0100
@@ -26,5 +26,7 @@
 fi

 GROOT=$($ROOTCMD grub-probe -tdrive -d $BOOT_DEVICE)
+echo Using GROOT=\"${GROOT}\" for BOOT_DEVICE=\"${BOOT_DEVICE}\"

 # handle /boot in lvm-on-md
 _bdev=$(readlink -f $BOOT_DEVICE)
@@ -42,10 +43,25 @@
     $ROOTCMD grub-install --no-floppy "/dev/$device"
   done
 else
-  $ROOTCMD grub-install --no-floppy "$GROOT"
+#  $ROOTCMD grub-install --no-floppy "$GROOT"
+#  if [ $? -eq 0 ]; then
+#    echo "Grub installed on $BOOT_DEVICE = $GROOT"
+#  fi
+  for GROOTITEM in $GROOT
+  do
+    # strip parentheses
+    GROOTITEM=${GROOTITEM#(}
+    GROOTITEM=${GROOTITEM%)}
+    echo Now GROOTITEM=\"${GROOTITEM}\"
+    # strip the hostdisk/ prefix
+    GROOTITEM=$(echo $GROOTITEM | sed 's~hostdisk/~~')
+    echo Using GROOTITEM=\"${GROOTITEM}\"
+    echo Install grub on $GROOTITEM:
+    $ROOTCMD grub-install --no-floppy "$GROOTITEM"
     if [ $? -eq 0 ]; then
-      echo "Grub installed on $BOOT_DEVICE = $GROOT"
+      echo "Grub installed on $BOOT_DEVICE = $GROOTITEM"
     fi
+  done
 fi
 $ROOTCMD update-grub
</pre>


===== (4) Run sysinfo first =====
to get to know your hardware, and to check whether the classes are OK.
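
For example (assuming PXE boot and the usual log upload to the install server; paths and symlink names may differ with your LOGUSER setup): have the client boot with <tt>FAI_ACTION=sysinfo</tt>, then look for the class list in the uploaded log:
<pre>
# after the sysinfo run, on the install server:
grep -i "list of all classes" /var/log/fai/<client>/last/fai.log
</pre>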


===== (5) If you feel like it, run your first FAI install ;) =====

That's it...

'' Steffen Grunewald <steffen.grunewald@aei.mpg.de> 2018-2019 ''