
NixOS CUDA

318 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



19 Apr 2024
@trexd:matrix.org trexd
In reply to @ss:someonex.net
With podman and with docker>=25 this, together with the host configuration, should be enough. Disclaimer: the option will be renamed before the release after all....

Hmm for some reason I can't get it to work. Here's how I'm testing it.

$ sudo docker run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L
[sudo] password for collin: 
docker: Error response from daemon: could not select device driver "cdi" with capabilities: [].

Here are the relevant bits of my config

virtualisation.docker.package = pkgs.docker_25;
virtualisation.containers.cdi.dynamic.nvidia.enable = true;
13:32:52
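For what it's worth, Docker 25 ships its CDI support as an experimental feature, so it may additionally need to be switched on in the daemon settings. A hedged sketch of the full NixOS config; the `features.cdi` daemon flag is my guess at what's missing here, not something confirmed in this thread:

```nix
# Sketch, untested: Docker >= 25 treats CDI as experimental, so the
# daemon may need the feature flag on top of the generated CDI spec.
{
  virtualisation.docker.package = pkgs.docker_25;
  virtualisation.docker.daemon.settings.features.cdi = true; # assumption
  virtualisation.containers.cdi.dynamic.nvidia.enable = true;
}
```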
@ss:someonex.net SomeoneSerge (matrix works sometimes)
I only tested with podman. Could you test if that still works?
13:34:08
@trexd:matrix.org trexd
In reply to @ss:someonex.net
I only tested with podman. Could you test if that still works?
Ok that worked
13:37:26
@ss:someonex.net SomeoneSerge (matrix works sometimes)
Ok at least the problem is somewhat localized. I think it is worth opening a github issue specifically about docker_25 + CDI. Also please ping me and ereslibre
13:38:58
@trexd:matrix.org trexd
In reply to @ss:someonex.net
Ok at least the problem is somewhat localized. I think it is worth opening a github issue specifically about docker_25 + CDI. Also please ping me and ereslibre
Issue is up.
13:52:07
@tanja-6584:matrix.org Tanja (Old; I'm now @tanja:catgirl.cloud) changed their display name from Tanja to Tanja (Old).
14:21:09
@yannham:matrix.org yannham
Hi folks, we have a Jetson Orin AGX which is sitting idle at work with an additional dedicated external SSD storage. I would like to repurpose it as a CI for Nixpkgs CUDA stuff. I'll have to think a bit about the security side of things, as it's currently connected to our office's network, but what do you think would be the easiest and the most useful stuff to test there? What do you think would make a good setup? cc connor (he/him) (UTC-5)
15:02:05
@ss:someonex.net SomeoneSerge (matrix works sometimes)
In reply to @yannham:matrix.org
Hi folks, we have a Jetson Orin AGX which is sitting idle at work with an additional dedicated external SSD storage. I would like to repurpose it as a CI for Nixpkgs CUDA stuff. I'll have to think a bit about the security side of things, as it's currently connected to our office's network, but what do you think would be the easiest and the most useful stuff to test there? What do you think would make a good setup? cc connor (he/him) (UTC-5)
GPU-in-the-sandbox tests!
15:05:48
@ss:someonex.net SomeoneSerge (matrix works sometimes)
screw it, maybe I should just give up on everything this weekend and work on tests and the ci
15:06:49
@connorbaker:matrix.org connor (he/him)
I mean, do we pay for electricity / bandwidth usage at the office? Electricity shouldn't be much given it's so smol, but it would be nice to have a dedicated tester for Orin
15:10:04
@connorbaker:matrix.org connor (he/him)
Easiest thing would be to just add it to our Hercules CI cluster, I guess... though we'd probably want a way to only run things on it which have requiredFeatures = [ "cuda-blah" ]; as part of the derivation
15:10:53
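For reference, the stock Nix mechanism for this is `requiredSystemFeatures` on the derivation, matched against `system-features` in the builder's nix.conf. A minimal sketch; the feature name `cuda` is illustrative, not an agreed-upon name:

```nix
# Sketch: a derivation that only gets scheduled on builders
# advertising the (hypothetical) "cuda" system feature.
stdenv.mkDerivation {
  pname = "gpu-smoke-test";
  version = "0.1";
  src = ./.;
  requiredSystemFeatures = [ "cuda" ];
}
```

The Orin builder would then advertise it via something like `system-features = nixos-test benchmark big-parallel cuda` in nix.conf (or the equivalent `nix.settings.system-features` NixOS option).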
@yannham:matrix.org yannham
I'll probably have to ask that formally, but this is probably negligible. The security part is probably more concerning, which is why I prefer to come with a concrete plan
15:11:37
@yannham:matrix.org yannham
(regarding the electricity/bandwidth cost)
15:11:50
@ss:someonex.net SomeoneSerge (matrix works sometimes)

though we'd probably want a way to only run things on it which have requiredFeatures = [ "cuda-blah" ]; as part of the derivation

one option is to take ofborg's outPaths expression and make it skip everything that is not a passthru test with requiredSystemFeatures

15:11:58
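A rough sketch of what that filter could look like, assuming a package exposing `passthru.tests`; untested, and the names are illustrative:

```nix
# Keep only the passthru tests that declare some
# requiredSystemFeatures, i.e. the ones a GPU-capable
# builder should pick up.
{ lib, pkg }:
lib.filterAttrs
  (_: drv:
    lib.isDerivation drv
    && (drv.requiredSystemFeatures or [ ]) != [ ])
  (pkg.passthru.tests or { })
```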
@connorbaker:matrix.org connor (he/him)
In reply to @ss:someonex.net
GPU-in-the-sandbox tests!
speaking of which, any updates on https://github.com/NixOS/nixpkgs/pull/256230? Or thoughts on declaring the cuda capability in some structured way? I mean, the capability a binary targets effectively changes what we would think of as a platform because it won't run on other devices
15:13:07
@ss:someonex.net SomeoneSerge (matrix works sometimes)

Hercules CI

Thinking about giving up on that entirely because of the "effect queue" thing

15:13:10
@ss:someonex.net SomeoneSerge (matrix works sometimes)

speaking of which, any updates on https://github.com/NixOS/nixpkgs/pull/256230?

Yes, I just need to give up a weekend to implement the stuff from the last comment 🥲

15:13:55
@ss:someonex.net SomeoneSerge (matrix works sometimes)
It's superembarrassing it's still unmerged
15:14:19
@connorbaker:matrix.org connor (he/him)
Unrelated: is there a difference in the performance we get building with PTX vs targeting an architecture specifically -- or alternatively, any recommended readings about PTX if that question doesn't make sense?
15:14:21
@connorbaker:matrix.org connor (he/him)
In reply to @ss:someonex.net
It's superembarrassing it's still unmerged
It's a really big feature! And something other accelerators will undoubtedly look to for inspiration in the future :)
15:14:51
@ss:someonex.net SomeoneSerge (matrix works sometimes)
1. I'd expect no difference if you've built for the target arch as well
2. I'd expect a jit/aot/whatever it's called expense the first time you run the kernels when you run on a new gpu
15:16:18
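For context on the PTX question: an nvcc fatbinary can carry both precompiled SASS for specific architectures and PTX, which the driver JIT-compiles on first launch for devices without matching SASS. A sketch of the relevant flags; sm_87 is, to my understanding, the Jetson Orin architecture, and the file names are made up:

```
# code=sm_87 embeds native SASS for Orin; code=compute_87 also embeds
# PTX so newer GPUs can JIT it at a one-time startup cost (usually
# amortized by the driver's JIT cache on subsequent runs).
nvcc -gencode arch=compute_87,code=sm_87 \
     -gencode arch=compute_87,code=compute_87 \
     saxpy.cu -o saxpy
```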
@justinrestivo:matrix.org justinrestivo joined the room.
15:18:28
@ss:someonex.net SomeoneSerge (matrix works sometimes)
Lol apptainer.tests.image-saxpy needs more diskSize again. We should fix getOutput asap (the main offender is static cublas obviously)
15:24:56
@ss:someonex.net SomeoneSerge (matrix works sometimes)
connor (he/him) (UTC-5) yannham what about hosting a buildbot somewhere instead
15:25:26
@ss:someonex.net SomeoneSerge (matrix works sometimes)
At the end of the day the SaaS nature of Hercules is... constraining
15:26:21
@connorbaker:matrix.org connor (he/him)
In reply to @ss:someonex.net
Lol apptainer.tests.image-saxpy needs more diskSize again. We should fix getOutput asap (the main offender is static cublas obviously)
Since I'm mostly done with https://github.com/ConnorBaker/cuda-redist-find-features/tree/feat/recursive-nar-hash-for-FOD and I'll start trying to integrate it into Nixpkgs (allowing us newer manifest versions, among other things), do you want me to take a look at making different derivations for each output, without relying on the multiple-output setup hook?
Actually, do you have a link to the issue for that handy? I think I misplaced it and I can't find it :l
15:36:41
@ss:someonex.net SomeoneSerge (matrix works sometimes)

me to take a look at making different derivations for each output, without relying on the multiple-output setup hook?

That's an interesting idea: reducing the number of inputs surely means fewer rebuilds, at least in principle

16:09:04


