!kFJOpVCFYFzxqjpJxm:nixos.org

Nix HPC

79 Members
Nix for High Performance Computing clusters
19 Servers

Sender   Message   Time
4 Oct 2023
@pederbs:pvv.ntnu.no pbsds changed their profile picture. 22:20:31
8 Oct 2023
@realnyte:matrix.org realnyte joined the room. 11:53:04
11 Oct 2023
@madouura:matrix.org Madoura changed their profile picture. 05:35:17
23 Oct 2023
@ss:someonex.net SomeoneSerge (hash-versioned python modules when) changed their display name from Someone (UTC+3) to SomeoneSerge (UTC+1). 09:09:52
25 Oct 2023
@federicodschonborn:matrix.org Federico Damián Schonborn changed their profile picture. 00:13:14
27 Oct 2023
@federicodschonborn:matrix.org Federico Damián Schonborn changed their profile picture. 01:24:47
29 Oct 2023
@ss:someonex.net SomeoneSerge (hash-versioned python modules when) changed their display name from SomeoneSerge (UTC+1) to SomeoneSerge (UTC+2). 22:42:17
3 Nov 2023
@fizihcyst:matrix.org fizihcyst joined the room. 11:50:01
5 Nov 2023
@kotatsuyaki:matrix.kotatsu.dev kotatsuyaki joined the room. 03:52:10
9 Nov 2023
@bootstrapper:matrix.org Ido Samuelson changed their display name from snick to Ido Samuelson. 06:33:43
10 Nov 2023
@globin:toznenetl.chat globin joined the room. 00:49:30
15 Nov 2023
@grahamc:nixos.org @grahamc:nixos.org changed room power levels. 16:08:35
@grahamc:nixos.org @grahamc:nixos.org left the room. 16:08:36
@mjolnir:nixos.org NixOS Moderation Bot changed room power levels. 18:12:37
@mjolnir:nixos.org NixOS Moderation Bot changed room power levels. 18:12:37
19 Nov 2023
@pederbs:pvv.ntnu.no pbsds changed their display name from pbsds to pbsds (federation borken, may not see reply). 03:36:14
@zxgu:matrix.org ZXGU joined the room. 10:59:25
@pederbs:pvv.ntnu.no pbsds changed their display name from pbsds (federation borken, may not see reply) to pbsds. 20:39:13
21 Nov 2023
@hdzki:hdzki.kozow.com hdzki ⚡️ joined the room. 18:23:55
29 Nov 2023
@jcie74:matrix.org pie_ I'm slightly procrastinating. What's up in HPC these days? 01:34:33
3 Dec 2023
@ss:someonex.net SomeoneSerge (hash-versioned python modules when) ShamrockLee (Yueh-Shun Li), jbedo: do you know of any downsides to this VM-free approach to assembling singularity images? https://github.com/NixOS/nixpkgs/issues/177908#issuecomment-1495625986 14:16:57
@ss:someonex.net SomeoneSerge (hash-versioned python modules when)

Finally (lol) got around to trying Nix-built singularity images on the cluster:

❯ nom build .#pkgsCuda.some-pkgs-py.edm.image -L
❯ du -hs $(readlink ./result)
3.4G    /nix/store/axhpdc96qgzk720yciwfach93f3xrqby-singularity-image-edm.img
❯ du -hs /scratch/cs/graphics/singularity-images/images/edm.sif  # Baseline based off NGC
7.6G    /scratch/cs/graphics/singularity-images/images/edm.sif
❯ rsync -LP ./result triton:
❯ ssh triton srun --mem=8G --time=0:05:00 --gres=gpu:a100:1 singularity exec -B /m:/m -B /scratch:/scratch -B /l:/l --nv ./result nixglhost -- python -m edm.example
...
Saving image grid to "imagenet-64x64.png"...
Done
17:33:47
@ss:someonex.net SomeoneSerge (hash-versioned python modules when) (nccl and mpi tests on the way) 17:33:53
4 Dec 2023
@ss:someonex.net SomeoneSerge (hash-versioned python modules when) ShamrockLee (Yueh-Shun Li) btw the squashfs compression is behaving oddly: I went over some threshold maybe and now a 5.7GiB buildEnv maps into an 11GiB sif 10:10:34
@ss:someonex.net SomeoneSerge (hash-versioned python modules when) AJAJAJA all you need is ram: I built the same image with memSize = 20 * 1024 instead of 4 * 1024, and now the final result is like 2.8G... 11:08:56
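
For readers who want to try the same knob: a minimal sketch of where `memSize` goes, assuming the image is built with nixpkgs' `singularity-tools.buildImage` (the messages here do not show the actual expression). The `edm-env` package set and the `diskSize` value are hypothetical placeholders; only `memSize = 20 * 1024` comes from the message above.

```nix
# Sketch only: not the expression used in the chat.
{ pkgs ? import <nixpkgs> { } }:

let
  # Hypothetical stand-in for the real environment that goes into the image.
  edm-env = pkgs.python3.withPackages (ps: [ ps.numpy ]);
in
pkgs.singularity-tools.buildImage {
  name = "edm";
  contents = [ edm-env ];

  # Size of the scratch disk for the build VM, in MiB (assumed value).
  diskSize = 16 * 1024;

  # RAM for the build VM, in MiB. The message reports a much smaller
  # image with 20 * 1024 than with 4 * 1024.
  memSize = 20 * 1024;
}
```

Whether the smaller image is really down to squashfs having more memory to work with is the poster's observation, not something this sketch demonstrates.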
5 Dec 2023
@federicodschonborn:matrix.org Federico Damián Schonborn changed their profile picture. 00:38:43
@jb:vk3.wtf @jb:vk3.wtf
In reply to @ss:someonex.net
ShamrockLee (Yueh-Shun Li), jbedo: do you know of any downsides to this VM-free approach to assembling singularity images? https://github.com/NixOS/nixpkgs/issues/177908#issuecomment-1495625986
not aware of any downsides, it's a pretty nice approach
01:02:55
11 Dec 2023
@markuskowa:matrix.org markuskowa SomeoneSerge (UTC+2): I'm pretty busy at the moment. I will look at the slurm PR on the weekend. 08:48:56
13 Dec 2023
@ss:someonex.net SomeoneSerge (hash-versioned python modules when)

https://gist.github.com/SomeoneSerge/3f894ffb5f97e55a0a5cfc10dfbc66e1#file-slurm-nix-L45-L61

Does it make sense that this consistently blocks after the following message?

vm-test-run-slurm> submit # WARNING: Open MPI accepted a TCP connection from what appears to be a
vm-test-run-slurm> submit # another Open MPI process but cannot find a corresponding process
vm-test-run-slurm> submit # entry for that peer.
vm-test-run-slurm> submit # 
vm-test-run-slurm> submit # This attempted connection will be ignored; your MPI job may or may not
vm-test-run-slurm> submit # continue properly.
vm-test-run-slurm> submit # 
vm-test-run-slurm> submit #   Local host: node1
vm-test-run-slurm> submit #   PID:        831
01:12:13
@ss:someonex.net SomeoneSerge (hash-versioned python modules when) *

Also the same test with final: prev: { mpi = final.mpich; } reports the wrong world size of 1

02:27:48
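
A sketch of how the overlay quoted above might be applied when evaluating nixpkgs, assuming a plain `import <nixpkgs>` rather than whatever wiring the gist actually uses. The name `useMpich` is a hypothetical label for illustration.

```nix
# Sketch only: how the gist passes the overlay into the slurm VM test is not shown here.
let
  # The overlay from the message: make `mpi` point at MPICH
  # instead of the nixpkgs default (Open MPI).
  useMpich = final: prev: { mpi = final.mpich; };

  pkgs = import <nixpkgs> { overlays = [ useMpich ]; };
in
# Packages that take `mpi` as a callPackage argument now receive MPICH.
pkgs.mpi
```

Because the override goes through `final`, anything in the package set that depends on `mpi` is rebuilt against MPICH as well, which is what makes the swap relevant to the test.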

Room Version: 9