!atvIbxHoEqNcAIxYpN:nixos.org

NixOS AWS

64 Members
16 Servers

Sender Message Time
19 Nov 2024
@commiterate:matrix.org commiterate Unfortunately I'm not all that familiar with more systems-level stuff in Linux, so I'm not sure. 17:43:40
@commiterate:matrix.org commiterate I assume systemd udev tags and systemctl start/stop don't mutate /etc like systemctl enable/disable does. 17:53:03
@commiterate:matrix.org commiterate Also, it looks like unit removal on device hot-detach works as well. You just need to tag the remove udev rule with the systemd tag. https://github.com/systemd/systemd/issues/7587#issuecomment-605497465 18:12:48
@arianvp:matrix.org Arian Yeah, and stopping the systemd unit can be done with BindsTo=blah.device, if I recall correctly. 18:26:01
@arianvp:matrix.org Arian

i.e. idiomatic would be:

SUBSYSTEM=="net", ACTION=="add", ENV{ID_NET_DRIVER}=="vif|ena|ixgbevf", ENV{SYSTEMD_WANTS}+="policy-routes@.service refresh-policy-routes@.timer"

And then add

[Unit]
Description=Set up policy routes for %I
BindsTo=%i.device
After=%i.device

The script would then get the full sysfs path as an argument (e.g. /sys/devices/pci0000:00/0000:00:05.0/net/ens5) instead of just the device name, so the scripts need to be adjusted slightly.

18:47:27
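The script adjustment mentioned above could look roughly like the sketch below. It assumes the unit passes the sysfs path as the first argument. The helper name `iface_from_sysfs` is hypothetical and is not part of amazon-ec2-net-utils. It keeps only the last path component, so it works whether or not the path has a leading or trailing slash.

```shell
#!/bin/sh
# Hypothetical helper (not from amazon-ec2-net-utils): derive the interface
# name from the sysfs path that the templated unit would receive.
iface_from_sysfs() {
    # Drop a trailing slash, if any, then keep only the last path component.
    sysfs_path="${1%/}"
    printf '%s\n' "${sysfs_path##*/}"
}

# Example with the path from the message above; prints "ens5".
iface_from_sysfs "/sys/devices/pci0000:00/0000:00:05.0/net/ens5"
```

An existing script that expects a bare device name could call this once at the top and leave the rest of its logic unchanged.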
@arianvp:matrix.org Arian BUT by the way 18:48:26
@arianvp:matrix.org Arian systemctl enable --now works fine on NixOS if the units have no WantedBy in the [Install] section 18:48:40
@arianvp:matrix.org Arian it is just an alias for systemctl start in that case 18:48:47
@arianvp:matrix.org Arian so I actually think these scripts (even though they're ugly) will just work 18:49:16
@arianvp:matrix.org Arian There is nothing to patch 18:49:21
@commiterate:matrix.org commiterate I see, I'll swap it out of draft then and put out a call for a maintainer. 19:13:35
@arianvp:matrix.org Arian Need to double check it. But I think it's the case 19:24:41
@commiterate:matrix.org commiterate Opened an issue upstream to request a change to systemd device units. https://github.com/amazonlinux/amazon-ec2-net-utils/issues/112 19:37:45
21 Nov 2024
@commiterate:matrix.org commiterate CloudWatch Agent is finally in Nixpkgs: https://github.com/NixOS/nixpkgs/pull/337212 Tracker: https://nixpk.gs/pr-tracker.html?pr=337212 17:32:28
@arianvp:matrix.org Arian
In reply to @commiterate:matrix.org

CloudWatch Agent is finally in Nixpkgs: https://github.com/NixOS/nixpkgs/pull/337212

Tracker: https://nixpk.gs/pr-tracker.html?pr=337212

Nice. We can backport this to 24.11, given it's a new package.
18:23:52
23 Nov 2024
@commiterate:matrix.org commiterate Hmm, I might need to update it to let people specify paths to the configuration files. That way people can write their own systemd oneshots which dynamically generate a file at runtime during boot (e.g. getting information from IMDS, SSM Parameter Store, Secrets Manager) instead of having to make one VM image per configuration permutation (especially since each VM image is several GBs). 19:08:09
@commiterate:matrix.org commiterate PR: https://github.com/NixOS/nixpkgs/pull/358559 20:18:23
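A minimal sketch of what such a oneshot's script might do, assuming IMDSv2 is reachable from the instance and `curl` is on the unit's PATH. The function names, the log file path, the log group name, and the output path in the trailing comment are all hypothetical. Only the IMDSv2 token flow and the `logs_collected.files.collect_list` layout come from the AWS documentation.

```shell
#!/bin/sh
# Sketch only: function names, file paths, and the log group name below are
# assumptions made for illustration.

IMDS="http://169.254.169.254/latest"

# Fetch one metadata value using a short-lived IMDSv2 session token.
imds_get() {
    token="$(curl -fsS -X PUT "$IMDS/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 60")"
    curl -fsS -H "X-aws-ec2-metadata-token: $token" "$IMDS/meta-data/$1"
}

# Render a minimal agent configuration, using the given instance ID as the
# log stream name.
render_config() {
    cat <<EOF
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/example.log",
            "log_group_name": "example",
            "log_stream_name": "$1"
          }
        ]
      }
    }
  }
}
EOF
}

# Intended use from the oneshot's ExecStart (hypothetical output path):
#   render_config "$(imds_get instance-id)" > /run/cloudwatch-agent/config.json
```

The agent service would then need to be ordered after the oneshot and pointed at the generated file, which is what a configurable path option would allow.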
24 Nov 2024
@commiterate:matrix.org commiterate Hmm, it actually has a bug, since I can't extract the desired run-as user at build time. 19:26:40
@commiterate:matrix.org commiterate Fixed, though it means agent.run_as_user in the configuration file is no longer respected (i.e. you can't change the user at runtime with a CW config file change), which is fine IMO. 20:47:27
25 Nov 2024
@commiterate:matrix.org commiterate

Arian Any concerns with this Fluent Bit module before I try adding it to Nixpkgs?

https://github.com/commiterate/nix-fluent-bit

I'm probably going to use it despite the CW Agent work, due to the native systemd-journald support and better processing features. I'm also a bit hesitant about the CW Agent now that I've seen the spaghetti under the hood.

06:11:20
1 Dec 2024
@sielicki:matrix.org @sielicki:matrix.org

FYI, I'm working on a handful of changes related to AWS and ML:

  1. Adding the efa kernel module: https://github.com/NixOS/nixpkgs/pull/360347

  2. Adding efa-nv-peermem: https://github.com/NixOS/nixpkgs/pull/360375

  3. Adding an updateScript for the out-of-tree ena build and a package bump: https://github.com/NixOS/nixpkgs/pull/360326

I expect a few others before the weekend is over:

  • modifying the libfabric drv to support building with efa and HMEM_CUDA

  • adding and building libnccl-ofi, plus extending nccl so that it uses it

With all of these in place (minus the ENA part, which is independent), it should be possible to support multinode ML training on AWS with NixOS.

02:31:59
@sielicki:matrix.org @sielicki:matrix.org

Arian: any ideas on how to expose this in a module and enable it?

EFA-supported instance types are listed here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types

efa-nv-peermem and the nccl/libfabric stuff are only really needed on p3/p4.*/p5.*/p5e.*/p5en.*

02:35:40
@sielicki:matrix.org @sielicki:matrix.org There's a separate discussion worth having about neuron kmods and software support. 02:36:26
@arianvp:matrix.org Arian Given it's a kernel module, do we need an option? Can't we just add it to the image and have udev load it when needed? 07:47:06
4 Dec 2024
@arianvp:matrix.org Arian Nah, this looks pretty good. We could perhaps add more structured module types 10:26:18
@arianvp:matrix.org Arian https://github.com/arianvp/nixos-village/blob/main/nix/modules/fluent-bit.nix 10:26:43
@arianvp:matrix.org Arian by using freeformType = 10:26:50

Room Version: 10