!sBfrWMVsLoSyFTCkNv:nixos.org

OfBorg

167 Members
Number of builds and evals in queue: <TBD>
62 Servers

Sender | Message | Time
2 Nov 2021
@oliver:matrix.nrp-nautilus.io oliver | left the room. | 19:24:23
3 Nov 2021
@cole-h:matrix.org cole-h | I'm currently redeploying all the evaluators to run on ZFS, which should hopefully prevent the recent space issues (inode related) from reappearing. If you see any problems, please ping me directly either here or in any related issues / PRs. | 17:41:06
@piegames:matrix.org piegames | set a profile picture. | 22:01:14
@cole-h:matrix.org cole-h

Just following up: it is done! All the evaluators are running off of ZFS now!
Here's a little more backstory on the issue and why we decided to move the
machines to ZFS:

Originally, the machines were run off of EXT4. For years, it seems like this has
been fine, but recently, we have been running into issues where the machines
would complain about a lack of free disk space. When we went to go check,
however, it wasn't disk space that was the problem, but available inodes!
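
For anyone wanting to run the same check on their own box, here is a minimal sketch that reads both free bytes and free inodes for a mount point via `os.statvfs`. The names `FsUsage`, `fs_usage`, and `scarcest` are made up for illustration and are not part of ofborg's tooling.

```python
import os
from dataclasses import dataclass


@dataclass
class FsUsage:
    bytes_total: int
    bytes_free: int
    inodes_total: int
    inodes_free: int

    @property
    def bytes_used_fraction(self) -> float:
        return 1 - self.bytes_free / self.bytes_total if self.bytes_total else 0.0

    @property
    def inodes_used_fraction(self) -> float:
        # Filesystems without a fixed inode table may report 0 total inodes.
        return 1 - self.inodes_free / self.inodes_total if self.inodes_total else 0.0


def fs_usage(path: str) -> FsUsage:
    """Read block and inode counters for the filesystem containing `path`."""
    st = os.statvfs(path)
    return FsUsage(
        bytes_total=st.f_blocks * st.f_frsize,
        bytes_free=st.f_bfree * st.f_frsize,
        inodes_total=st.f_files,
        inodes_free=st.f_ffree,
    )


def scarcest(usage: FsUsage) -> str:
    """Name whichever resource is closer to exhaustion."""
    if usage.inodes_used_fraction > usage.bytes_used_fraction:
        return "inodes"
    return "bytes"


if __name__ == "__main__":
    u = fs_usage("/")
    print(f"bytes used:  {u.bytes_used_fraction:.1%}")
    print(f"inodes used: {u.inodes_used_fraction:.1%}")
    print(f"scarcest:    {scarcest(u)}")
```

A disk in the state described here would show a modest bytes-used figure next to an inodes-used figure at or near 100%.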

EXT4 has a limited number of inodes (I believe it's somewhere around 4
billion?), and while derivations (e.g. the .drv files themselves) are small,
they each take up an inode (at least). Although the garbage collector does know
how to "free X amount of data", it doesn't know how to make sure it frees "X
amount of inodes". This led to the disk having plenty of space, but absolutely
no inodes available.
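
To make the "bytes freed" versus "inodes freed" gap concrete, here is a rough sketch that tallies both for a directory tree. It is not how the Nix garbage collector works; `tally` and `demo` are hypothetical helpers, and the demo writes 1000 one-byte stand-in files to a temporary directory rather than touching a real store.

```python
import os
import stat
import tempfile


def tally(root):
    """Return (regular_file_bytes, inodes) for everything under root, root included."""
    seen = set()
    file_bytes = 0
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    for path in paths:
        st = os.lstat(path)  # lstat: count the symlink itself, not its target
        key = (st.st_dev, st.st_ino)
        if key in seen:  # hard links share a single inode
            continue
        seen.add(key)
        if stat.S_ISREG(st.st_mode):
            file_bytes += st.st_size
    return file_bytes, len(seen)


def demo():
    """Create 1000 one-byte files and tally the directory holding them."""
    with tempfile.TemporaryDirectory() as root:
        for i in range(1000):
            with open(os.path.join(root, f"{i}.drv"), "w") as f:
                f.write("x")
        return tally(root)


if __name__ == "__main__":
    print(demo())  # (1000, 1001): 1000 bytes of data, 1001 inodes with the directory
```

Deleting that tree would free about a kilobyte of file data but over a thousand inodes, which is the kind of imbalance a size-based collection target can't see.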

Enter ZFS: ZFS gives you unlimited inodes (in that it doesn't have a fixed
number of them available) so long as you have the disk space to support it.
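
For contrast, some back-of-the-envelope arithmetic on a fixed inode table. This is a sketch under assumptions: 16384 bytes per inode is the usual mke2fs default ratio, not a value confirmed for these machines, and one 4 KiB block per small file is a simplification that ignores directories and metadata overhead. Both function names are hypothetical.

```python
def ext4_inode_budget(fs_bytes: int, bytes_per_inode: int = 16384) -> int:
    """Approximate inode count fixed at mkfs time: one inode per `bytes_per_inode`."""
    return fs_bytes // bytes_per_inode


def fill_fraction_at_exhaustion(per_file_bytes: int, bytes_per_inode: int = 16384) -> float:
    """Fraction of the disk in use at the moment the last inode is handed out,
    assuming every file occupies `per_file_bytes` on disk."""
    return min(1.0, per_file_bytes / bytes_per_inode)


if __name__ == "__main__":
    one_tib = 2**40
    print(ext4_inode_budget(one_tib))         # 67108864 inodes on a 1 TiB filesystem
    print(fill_fraction_at_exhaustion(4096))  # 0.25: inodes run out with the disk 25% full
```

With no fixed table, the second number stops mattering: the only ceiling left is disk space.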

Then, in order to actually get the machines running ZFS, I had to do a few
things:

  • Use Equinix Metal's new NixOS support to deploy the instances
  • Set up customdata in order to set up the ZFS pool using the disks exposed to
    the instance
  • Deploy the instances one-by-one in order to verify they worked properly
    and wouldn't fall over if left alone

For the moment, things seem to be working just fine! Fingers crossed, no more
running-out-of-space alerts until we're actually running out of space... That
said, I will still keep my eye on it. Once again, if you notice anything out of
the ordinary, don't be a stranger: ping me (here or on the related issue / PR)!

22:10:27
@cole-h:matrix.org cole-h | ...that's a long message | 22:11:00
@r-burns:matrix.org Ryan Burns | joined the room. | 22:18:25
@jb:vk3.wtf jbedo | joined the room. | 22:50:40
@ryantm:matrix.org ryantm | joined the room. | 23:08:55
@noah:matrix.chatsubo.cafe Church | joined the room. | 23:14:00
@janne.hess:helsinki-systems.de Janne Heß | joined the room. | 23:32:38
@janne.hess:helsinki-systems.de Janne Heß | left the room. | 23:32:57
4 Nov 2021
@illustris:matrix.org illustris | joined the room. | 02:36:26
@illustris:matrix.org illustris | left the room. | 02:37:18
5 Nov 2021
@r-burns:matrix.org Ryan Burns | It looks like ofborg-eval-lib-tests fails on some PRs which modify stdenv setup.sh, is this a known issue? https://github.com/NixOS/nixpkgs/pull/144749#issuecomment-962029150 | 20:22:36
6 Nov 2021
@cole-h:matrix.org cole-h | Hm, that kinda looks like https://github.com/NixOS/nixpkgs/issues/64459. And it makes sense, seeing as we recently switched the evaluators / build machines to ZFS...... | 00:08:00
@cole-h:matrix.org cole-h | Rebuilding coreutils on my ZFS system to see if it appears there as well. | 00:10:41
@r-burns:matrix.org Ryan Burns | Hmm I'd be surprised if that's the cause, my zfs box does stdenv rebuilds for breakfast | 00:11:30
@cole-h:matrix.org cole-h | Could be a configuration thing. | 00:11:40
@cole-h:matrix.org cole-h | Yeah, rebuilt on my system and nada. | 00:15:48
@cole-h:matrix.org cole-h | (nada being no failures) | 00:16:03
@cole-h:matrix.org cole-h | Attempting to build on that machine myself to see if it is, somehow, a transient failure. | 00:24:54
@cole-h:matrix.org cole-h | ... | 00:28:02
@cole-h:matrix.org cole-h | Well, coreutils built fine. Now to build all of lib-tests. | 00:32:53
@cole-h:matrix.org cole-h | Hm, maybe vcunat is right: https://github.com/NixOS/nixpkgs/pull/142602#issuecomment-961787154 | 00:37:48
@r-burns:matrix.org Ryan Burns | But this wasn't happening before, right? Or were we just not aware of it | 00:56:14
@cole-h:matrix.org cole-h | It's the first I'm hearing of it, at least. | 00:59:22
@cole-h:matrix.org cole-h | And indeed, it succeeded building on that ofborg machine... | 01:06:05
