
NixOS CUDA

CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



Sender | Message | Time
13 Feb 2025
@connorbaker:matrix.org connor (burnt/out) (UTC-8) | Just pasting the last of them now | 15:30:46
@ss:someonex.net SomeoneSerge (matrix works sometimes)

Regarding scheduling the future meetings,

  • we should probably aim to meet in 2-4 weeks to follow up on the patchelf exception and for a report on the ephemeral builders situation;
  • we can probably first bring up the alignment questions with nix-community just in their chat, without video because async is faster;
  • additionally, I think I should have hours this and next week to sort the backlog as mentioned in the notes; I think it'd still be useful, for onboarding new people, to do that with the audio and the screenshare, but it's not worth synchronizing people's schedules for this; maybe it'll be just a pop-in format?
15:38:04
@ss:someonex.net SomeoneSerge (matrix works sometimes) | (jaja, maybe we do this in Gaetan's twitch?) | 15:38:26
@glepage:matrix.org Gaétan Lepage | Sure haha | 15:46:34
14 Feb 2025
@connorbaker:matrix.org connor (burnt/out) (UTC-8) | As of a few days ago Onnxruntime requires CUDA separable compilation… so I guess I gotta fix that now 🙃 | 01:50:24
@ss:someonex.net SomeoneSerge (matrix works sometimes)

RE: CI infra/yesterday's meeting
CC connor (he/him) (UTC-8):

By the way, while on my side I'm advertising both options for provisioning hardware (spot instances and owned hardware), I think we might want to incentivize companies to commit to supporting the latter path. While it's obviously more work, both organisational and engineering, it is a much better long-term promise for the community. With rented hardware, if two or three companies simultaneously decide to withdraw, we basically have to scale down the CI immediately. If we buy hardware for a non-profit and a few years later some companies decide they're not interested anymore, we maybe lose a retainer covering the maintenance work. With our own hardware we can also be more flexible and maybe dedicate some machines to be used as community builders/devboxes for ad hoc experimentation.

11:15:16
@zopieux:matrix.zopi.eu zopieux

It's me again :-) This time I have a genuinely surprising behavior from the community cache (the substituters are correctly configured): nccl was successfully built (derivation mv02…), the narinfo is available, but upon nix-shell -p cudaPackages_12.nccl I get

this derivation will be built:
  /nix/store/mv02rgvrhw9n1682dw7vs8w3pssc24lr-nccl-2.21.5-1.drv
(lots of compiling)

Others, like cudaPackages.cudnn, are successfully retrieved from the cache.

17:58:45
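
One way to double-check a report like the one above is to compute the narinfo URL by hand. Below is a minimal sketch in Python, assuming the usual binary-cache layout where the narinfo is served at `<cache URL>/<hash>.narinfo` and the hash is the 32-character prefix of the output path's name (substituters are queried by output path, so the hash in a `.drv` path is not the one to look up). The cache URL `https://cache.example.org` and the function name `narinfo_url` are placeholders for illustration, not taken from the conversation.

```python
NIX_STORE_PREFIX = "/nix/store/"
HASH_LENGTH = 32  # length of the hash part at the start of a store path name


def narinfo_url(cache_url: str, store_path: str) -> str:
    """Return the URL at which a binary cache would serve the narinfo
    for the given *output* store path."""
    if not store_path.startswith(NIX_STORE_PREFIX):
        raise ValueError(f"not a store path: {store_path!r}")

    # Keep only the top-level store entry, e.g. "<hash>-nccl-2.21.5-1",
    # dropping any sub-path such as "/lib/libnccl.so".
    name = store_path[len(NIX_STORE_PREFIX):].split("/", 1)[0]

    if name.endswith(".drv"):
        # A .drv path names the build recipe, not the build result.
        raise ValueError("expected an output path, got a derivation path")

    hash_part, sep, _rest = name.partition("-")
    if not sep or len(hash_part) != HASH_LENGTH:
        raise ValueError(f"unexpected store path name: {name!r}")

    return f"{cache_url.rstrip('/')}/{hash_part}.narinfo"


if __name__ == "__main__":
    # Hypothetical output path; the hash is made up for the example.
    example = NIX_STORE_PREFIX + "a" * HASH_LENGTH + "-nccl-2.21.5-1"
    print(narinfo_url("https://cache.example.org", example))
```

The output paths of a derivation can be listed with `nix-store --query --outputs <path-to-drv>`; passing one of them to `narinfo_url` gives a URL that can be fetched with any HTTP client to see whether the cache actually has that output.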
@ruroruro:matrix.org ruro

So, uh... I just noticed that CUDA versions prior to 11.4 don't have the individual redistributables (for example, there is no cudaPackages_11_3.cuda_cudart).

Unfortunately, I only noticed this after refactoring cuda-samples to use the individual packages instead of cudatoolkit. sigh

21:12:48
