Upgrading Lustre

It’s been close to a year since I updated our cluster; I was going to do it over Christmas, but never quite got around to it. The period of social distancing (and procrastinating on my research) is a great time, right? The cluster is running CentOS 7, and the biggest issue with upgrading it is the Lustre file system. These are all my notes on the upgrade process. I’m hoping that by writing them down here, my life will be somewhat easier the next time I need to do this. Learning how Lustre works all over again every time I do an update is an involved process!

Lustre is very picky about the version of the Linux kernel. This means we can’t just do a blanket “sudo yum update” on the system. We need to upgrade to the specific kernel version that is required by the new version of Lustre we will be installing.

On wyeast, the Lustre server is installed across three different nodes: wyeast-lustre01, wyeast-lustre02, and wyeast-lustre03. The metadata server is on the first node, and the object storage targets are stored on lustre02 and lustre03.

First, update the list of updates that yum knows about:

sudo yum makecache

Next, look at the lustre-server repo and find the current version of the Lustre server and the Linux kernel it uses.

sudo yum repo-pkgs lustre-server list

From this, I found that the current Lustre server version is 2.12.4. I checked the changelog on lustre.org to determine the kernel version needed:

http://wiki.lustre.org/Lustre_2.12.4_Changelog

The Linux kernel needed is actually available in the lustre-server repo:

kernel-3.10.0-1062.9.1.el7_lustre

So I needed to make sure to install that particular version and not the most up-to-date kernel.

sudo yum repo-pkgs lustre-server update kernel-3.10.0-1062.9.1.el7_lustre kernel-devel-3.10.0-1062.9.1.el7_lustre kernel-headers-3.10.0-1062.9.1.el7_lustre
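Since the same version string appears three times in that command, a small helper can generate the package names from a single version. This is only a sketch; the function name kernel_pkgs is made up for illustration and is not part of yum or Lustre.

```shell
# Print the kernel, kernel-devel, and kernel-headers package names
# for one version string, so the version is typed only once.
# kernel_pkgs is a hypothetical helper name.
kernel_pkgs() {
    printf 'kernel-%s kernel-devel-%s kernel-headers-%s\n' "$1" "$1" "$1"
}
```

It could then be used as: sudo yum repo-pkgs lustre-server update $(kernel_pkgs 3.10.0-1062.9.1.el7_lustre)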

After that, I checked the current list of other updates available in the Lustre server repository.

sudo yum repo-pkgs lustre-server list

Next, I updated all the Lustre packages that were already installed:

sudo yum repo-pkgs lustre-server update kmod-lustre.x86_64 kmod-lustre-osd-ldiskfs.x86_64 libnvpair1.x86_64 libuutil1.x86_64 libzfs2.x86_64 libzpool2.x86_64 lustre.x86_64 lustre-osd-ldiskfs-mount.x86_64 lustre-osd-zfs-mount.x86_64 lustre-resource-agents.x86_64 lustre-zfs-dkms.noarch spl.x86_64 spl-dkms.noarch zfs.x86_64 zfs-dkms.noarch

Finally, I’ll update all the other system software, carefully excluding the Linux kernel packages:

sudo yum -x kernel,kernel-headers,kernel-debug-devel,kernel-tools,kernel-tools-libs,kmod-lustre.x86_64,kmod-lustre-osd-ldiskfs.x86_64,libnvpair1.x86_64,libuutil1.x86_64,libzfs2.x86_64,libzpool2.x86_64,lustre.x86_64,lustre-osd-ldiskfs-mount.x86_64,lustre-osd-zfs-mount.x86_64,lustre-resource-agents.x86_64,lustre-zfs-dkms.noarch,spl.x86_64,spl-dkms.noarch,zfs.x86_64,zfs-dkms.noarch,kernel-devel update
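A long exclusion list like this is easy to get wrong by hand. One possible approach, sketched below, is to keep the package names one per line in a file and join them with commas. The function name exclude_list and the file name excludes.txt are assumptions made for this example.

```shell
# Read package names (one per line) on stdin and print them as a
# comma-separated list suitable for "yum -x".
# Blank lines and repeated names are dropped; order is preserved.
# exclude_list is a hypothetical helper name.
exclude_list() {
    awk 'NF && !seen[$0]++' | paste -sd, -
}
```

With the names stored in a hypothetical excludes.txt, the update would look like: sudo yum -x "$(exclude_list < excludes.txt)" update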

That completes all the software upgrades. The same process needs to be done on wyeast-lustre02 and wyeast-lustre03. I probably should have umounted Lustre mounts before this process, but I didn’t. So after the reboot, Lustre wasn’t quite working. I had to fix it.
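A quick check before starting the upgrade can show whether anything of type lustre is still mounted. The sketch below reads /proc/mounts, where the second field is the mount point and the third is the file system type. The function name lustre_mounts is invented for this example.

```shell
# Print the mount point of every mounted file system of type "lustre".
# Reads /proc/mounts by default; another file can be passed for testing.
# lustre_mounts is a hypothetical helper name.
lustre_mounts() {
    awk '$3 == "lustre" { print $2 }' "${1:-/proc/mounts}"
}
```

If lustre_mounts prints anything, those mount points are candidates for a umount before the packages are touched.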

First, I had to fix the firewall again on the Lustre machines:

sudo iptables -F

Next, zfs (the file system used by Lustre) was messed up on wyeast-lustre01 and wyeast-lustre02.

The command:

zfs list

wasn’t working. It showed that zfs wasn’t loaded. So the first step is to do:

modprobe zfs

This loaded zfs. However, our zfs pools were still missing. Running this command with no arguments scans for them:

zpool import

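This finds the zpools and lists the ones that are available to import. Each pool is then imported by name; zpool import takes the pool name, which is the part before the slash in the dataset names used below. On wyeast-lustre02:

zpool import lustre-ost0

On wyeast-lustre01:

zpool import lustre-mgsmdt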

This loads the zfs pools, but I still need to remount the Lustre file system. This needs to be done on the object storage targets first (lustre02 and lustre03) before it is done on the metadata server (lustre01).

sudo mount -t lustre lustre-ost0/ost0 /lustre-ost0/ost0

sudo mount -t lustre lustre-ost1/ost1 /lustre-ost1/ost1

Lustre actually automounted correctly on lustre03, so I didn’t have to fix anything there. With the targets working, it was time to fix lustre01:

mount -t lustre lustre-mgsmdt/mgsmdt /lustre-mgsmdt/mgsmdt

Mounting the Lustre file system starts the Lustre service and we are off to the races.
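The ordering matters here (object storage targets first, then the metadata server), so it may help to write it down as a dry run. The sketch below only prints commands and runs nothing. Reaching each node over ssh, the assignment of ost0 to wyeast-lustre02 and ost1 to wyeast-lustre03, and the function name lustre_mount_plan are all assumptions made for this example.

```shell
# Print (do not run) the mount commands in the required order:
# object storage targets first, metadata server last.
# The host-to-dataset mapping and the use of ssh are assumptions.
lustre_mount_plan() {
    while read -r host dataset; do
        printf 'ssh %s sudo mount -t lustre %s /%s\n' "$host" "$dataset" "$dataset"
    done <<'EOF'
wyeast-lustre02 lustre-ost0/ost0
wyeast-lustre03 lustre-ost1/ost1
wyeast-lustre01 lustre-mgsmdt/mgsmdt
EOF
}
```

Running lustre_mount_plan prints three lines that can be reviewed and then executed by hand, one at a time.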

Back on the compute side of the cluster, the head node wasn’t finding the Lustre mount. So I had to unmount and then remount Lustre there.

First, when I tried to unmount Lustre, the file system was reported as busy. So I ran the following command to find the guilty processes:

sudo lsof +f -- /lustre

This gives me a list of processes that I was then able to kill off. After that:

sudo umount /lustre

Followed by:

sudo mount -t lustre 192.168.1.11@tcp:/lustre /lustre

Which worked! Although I hadn’t yet updated the Lustre client, it was still able to handle the updated Lustre server. The other nodes that didn’t have active shells attached to them didn’t have any trouble with the change; I didn’t even have to remount them; the file system just showed up without any trouble.
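For the busy-unmount step above, the process IDs can be pulled out of the lsof listing rather than read off by eye. This is a sketch that assumes lsof's default table layout, with a header line and the PID in the second column. The function name lsof_pids is made up for this example.

```shell
# Read lsof's default table output on stdin and print each PID once,
# in ascending numeric order. The first line (the header) is skipped.
# lsof_pids is a hypothetical helper name.
lsof_pids() {
    awk 'NR > 1 { print $2 }' | sort -un
}
```

It would be used as: sudo lsof +f -- /lustre | lsof_pids. Alternatively, lsof has a -t option that prints only PIDs with no header.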

The next step is to update the software on the compute nodes. The process is similar but somewhat easier, since we don’t have to deal with zfs. I still want to limit the install to the particular Linux kernel and the “lustre-client” repo. In this case, I had to download the kernel RPMs from rpmfind:

https://rpmfind.net/linux/rpm2html/search.php?query=kernel%28x86-64%29&submit=Search+…&system=&arch=

I downloaded RPMs for kernel, kernel-debug-devel, kernel-headers, kernel-tools, and kernel-tools-libs. This time, I remembered to unmount /lustre first. Then I installed the new kernel packages:

sudo yum localinstall kernel-3.10.0-1062.9.1.el7.x86_64.rpm kernel-debug-devel-3.10.0-1062.9.1.el7.x86_64.rpm kernel-headers-3.10.0-1062.9.1.el7.x86_64.rpm kernel-tools-3.10.0-1062.9.1.el7.x86_64.rpm kernel-tools-libs-3.10.0-1062.9.1.el7.x86_64.rpm

Next, update the Lustre client:

sudo yum repo-pkgs lustre-client update kmod-lustre-client.x86_64 lustre-client.x86_64

Then update everything else, excluding the kernel stuff:

sudo yum update -x kernel,kernel-debug-devel,kernel-headers,kernel-tools,kernel-tools-libs

Finally, reboot and then remount Lustre:

sudo mount -t lustre 192.168.1.11@tcp:/lustre /lustre

Unlike with the Lustre server, I didn’t encounter any trouble with the reboot. The Lustre partition survived the update just fine, and I was able to successfully update all the rest of the installed software on the system.
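After the reboot, it can be worth confirming that the node actually booted into the kernel that was installed, since Lustre is tied to that exact version. The sketch below compares an RPM file name against the running kernel release. The function name kernel_matches_rpm is an assumption, and it relies on the RPM being named kernel-<release>.rpm, as the files above are.

```shell
# Succeed (exit status 0) if the running kernel matches a kernel RPM.
# $1: RPM file name, e.g. kernel-3.10.0-1062.9.1.el7.x86_64.rpm
# $2: kernel release to compare against (defaults to `uname -r`)
# kernel_matches_rpm is a hypothetical helper name.
kernel_matches_rpm() {
    expected=${1#kernel-}        # strip the leading "kernel-"
    expected=${expected%.rpm}    # strip the trailing ".rpm"
    running=${2:-$(uname -r)}
    [ "$expected" = "$running" ]
}
```

For example: kernel_matches_rpm kernel-3.10.0-1062.9.1.el7.x86_64.rpm || echo "unexpected kernel: $(uname -r)"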
