| 18 Nov 2024 |
connor (he/him) | In reply to @ss:someonex.net True. I'm still yet to read up on how SLURM and friends do this. Shameless plug: https://github.com/sinanmohd/evanix (slides) Woah! Thanks for the links, I wasn't aware of these | 20:17:47 |
| 19 Nov 2024 |
hexa | python-updates with numpy 2.1 has landed in staging | 00:31:36 |
hexa | sowwy | 00:31:40 |
connor (he/him) | In reply to @ss:someonex.net Should just work, what is the error? Curl threw connection refused or something similar; I’ll try to get the log tomorrow | 06:34:11 |
| 20 Nov 2024 |
| Conroy joined the room. | 04:47:44 |
connor (he/him) | I did not get a chance; rip | 07:22:37 |
| Daniel joined the room. | 18:53:01 |
| 22 Nov 2024 |
| deng23fdsafgea joined the room. | 06:27:37 |
| Morgan (@numinit) joined the room. | 17:52:10 |
| 24 Nov 2024 |
sielicki | https://negativo17.org/nvidia-driver/ pretty good read | 21:49:05 |
sielicki | most of this is stuff that nixos gets right, but it's a nice collection of gotchas and solutions | 22:01:49 |
sielicki | anyone have strong opinions on moving nccl and nccl-tests out of cudaModules? Rationale on moving them out: neither one is distributed as a part of the cuda toolkit and they release on an entirely separate cadence, so there's no real reason for it to be in there. It's no different than eg: torch in terms of the cuda dependency. | 22:16:05 |
SomeoneSerge (back on matrix) | In reply to @sielicki:matrix.org anyone have strong opinions on moving nccl and nccl-tests out of cudaModules? Rationale on moving them out: neither one is distributed as a part of the cuda toolkit and they release on an entirely separate cadence, so there's no real reason for it to be in there. It's no different than eg: torch in terms of the cuda dependency. iirc we put it in there because if you set tensorflow = ...callPackage ... { cudaPackages = cudaPackages_XX_y; } you'll need to also pass a compatible nccl | 22:17:33 |
SomeoneSerge (back on matrix) | so it's just easier to instantiate each cudaPackages variant with its own nccl and pass it along | 22:17:55 |
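For readers following along, the pattern described above (each cudaPackages variant carries its own nccl, and a consumer is handed the whole set) might look roughly like the sketch below. It is illustrative only: the specific set `cudaPackages_12_4`, the choice of `torch` as the consumer, and the output attribute names `nccl` and `torchWithPinnedCuda` are assumptions made for the example, not something stated in the conversation.

```nix
# Sketch only. Assumes a nixpkgs checkout that provides cudaPackages_12_4 and
# a torch derivation accepting `cudaPackages` / `cudaSupport` arguments.
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:

let
  # Pick one CUDA package set. nccl is an attribute of the set itself,
  # so it is built against the same CUDA version as everything else in it.
  cudaPackages = pkgs.cudaPackages_12_4;
in
{
  # The nccl that matches the chosen set.
  inherit (cudaPackages) nccl;

  # Passing the whole set means the consumer picks up the matching nccl
  # from it, rather than needing a separately pinned nccl argument.
  torchWithPinnedCuda = pkgs.python3Packages.torch.override {
    inherit cudaPackages;
    cudaSupport = true;
  };
}
```

If nccl lived outside the set, the override above would presumably need a second, manually matched `nccl` argument, which is the inconvenience the message describes.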
sielicki | I guess that's fair, and there is a pretty strong coupling of cuda versions and nccl versions... eg: https://github.com/pytorch/pytorch/pull/133593 has been stalled for some time due to nvidia dropping the pypi cu11 package for nccl, so there's reason to keep them consistent even if they technically release separately. | 22:20:12 |
SomeoneSerge (back on matrix) | In reply to @sielicki:matrix.org https://negativo17.org/nvidia-driver/ pretty good read Any highlights, what we might be missing? | 22:22:09 |
sielicki | honestly I am not sure there's anything, I just like the thought that went into it | 22:27:21 |
sielicki | the special softdep for nvidia-uvm etc | 22:27:48 |
SomeoneSerge (back on matrix) | In reply to @sielicki:matrix.org the special softdep for nvidia-uvm etc yeah we have that, and iirc a special-case for the datacenter driver where it's not a softdep anymore | 22:28:24 |
SomeoneSerge (back on matrix) | In reply to @sielicki:matrix.org the special softdep for nvidia-uvm etc * yeah we have that, and iirc a special-case for the datacenter driver where it's not a softdep anymore (not sure what the exact situation is) | 22:29:12 |
| 25 Nov 2024 |
sielicki | is this useful? https://gist.github.com/sielicki/2601de3ad8d8c732af80b12e36d326aa | 04:31:08 |
sielicki | example of its output: https://gist.github.com/sielicki/2601de3ad8d8c732af80b12e36d326aa/24c08bb29f1397c7d006b01f7afddd5cb06e90a5 | 04:31:38 |
connor (he/him) | You can see what I eventually hope to move in-tree here: https://github.com/ConnorBaker/cuda-packages
Here’s the update script I’ve made for the different redists: https://github.com/ConnorBaker/cuda-packages/tree/main/scripts/cuda-redist | 07:01:12 |
connor (he/him) | Ugh we should write an update for the post Tom made on discourse (https://discourse.nixos.org/t/community-team-updates/56458)
@someoneserge anything we should mention in particular?
I think I started a draft for an update earlier this year so I’ll see if I can find it :/ | 07:03:39 |
SomeoneSerge (back on matrix) | In reply to @connorbaker:matrix.org Ugh we should write an update for the post Tom made on discourse (https://discourse.nixos.org/t/community-team-updates/56458)
@someoneserge anything we should mention in particular?
I think I started a draft for an update earlier this year so I’ll see if I can find it :/ Let's make a shared pad for the draft? | 14:18:46 |
SomeoneSerge (back on matrix) | Also maybe we've already reached the point where a room-wide voice call could be a better way to list the "challenges" | 14:41:51 |
hexa | is anyone here aware why tensorflow 2.13.0 on 24.11 now requires AVX CPU instructions and the same version on 24.05 did not? https://github.com/NixOS/nixpkgs/issues/358973 | 17:49:22 |
hexa | yes, the answer is dependencies, likely protobuf from the call trace | 17:49:39 |
hexa | * the likely answer is dependencies, should be protobuf from the call trace | 17:49:58 |
SomeoneSerge (back on matrix) | I was about to post "you sure it's not the pypi garbage" and then thought "I'm surely just being biased" | 18:07:06 |