| 19 Apr 2024 |
trexd | In reply to @ss:someonex.net With podman and with docker>=25 this, together with the host configuration, should be enough. Disclaimer: the option will be renamed before the release after all.... Hmm, for some reason I can't get it to work. Here's how I'm testing it:
$ sudo docker run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L
[sudo] password for collin:
docker: Error response from daemon: could not select device driver "cdi" with capabilities: [].
Here are the relevant bits of my config
virtualisation.docker.package = pkgs.docker_25;
virtualisation.containers.cdi.dynamic.nvidia.enable = true;
| 13:32:52 |
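Note: a minimal sketch of one thing that could be checked for the error above, not a confirmed fix from this conversation. It assumes the cause is that Docker 25 only honours CDI device names once the daemon's CDI feature flag is enabled. The first two options are from the message above. The third line is an assumption added here, written through the freeform `virtualisation.docker.daemon.settings` option.

```nix
{ pkgs, ... }:
{
  # From the message above.
  virtualisation.docker.package = pkgs.docker_25;
  virtualisation.containers.cdi.dynamic.nvidia.enable = true;

  # Assumption, not stated in the chat: turn on the daemon's CDI feature
  # flag, which ends up in daemon.json as { "features": { "cdi": true } }.
  virtualisation.docker.daemon.settings.features.cdi = true;
}
```

If the flag is what was missing, the same `docker run --device nvidia.com/gpu=all ...` command should then get past the "could not select device driver" error. The thread itself only establishes that podman works and docker_25 does not.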
SomeoneSerge (matrix works sometimes) | I only tested with podman. Could you test if that still works? | 13:34:08 |
trexd | In reply to @ss:someonex.net I only tested with podman. Could you test if that still works? Ok that worked | 13:37:26 |
SomeoneSerge (matrix works sometimes) | Ok at least the problem is somewhat localized. I think it worth opening a github issue specifically about docker_25 + CDI. Also please ping me and ereslibre | 13:38:46 |
SomeoneSerge (matrix works sometimes) | * Ok at least the problem is somewhat localized. I think it is worth opening a github issue specifically about docker_25 + CDI. Also please ping me and ereslibre | 13:38:58 |
trexd | In reply to @ss:someonex.net Ok at least the problem is somewhat localized. I think it is worth opening a github issue specifically about docker_25 + CDI. Also please ping me and ereslibre Issue is up. | 13:52:07 |
| Tanja (Old; I'm now @tanja:catgirl.cloud) changed their display name from Tanja to Tanja (Old). | 14:21:09 |
yannham | Hi folks, we have a Jetson Orin AGX which is sitting idle at work with an additional dedicated external SSD storage. I would like to repurpose it as a CI for Nixpkgs CUDA stuff. I'll have to think a bit about the security side of things, as it's currently connected to our office's network, but what do you think would be the easiest and the most useful stuff to test there? What do you think would make a good setup? cc connor (he/him) (UTC-5) | 15:02:05 |
SomeoneSerge (matrix works sometimes) | In reply to @yannham:matrix.org Hi folks, we have a Jetson Orin AGX which is sitting idle at work with an additional dedicated external SSD storage. I would like to repurpose it as a CI for Nixpkgs CUDA stuff. I'll have to think a bit about the security side of things, as it's currently connected to our office's network, but what do you think would be the easiest and the most useful stuff to test there? What do you think would make a good setup? cc connor (he/him) (UTC-5) GPU-in-the-sandbox tests! | 15:05:48 |
SomeoneSerge (matrix works sometimes) | screw it, maybe I should just give up on everything this weekend and work on tests and the ci | 15:06:49 |
connor (he/him) | I mean, do we pay for electricity / bandwidth usage at the office? Electricity shouldn't be much given it's so smol, but it would be nice to have a dedicated tester for Orin | 15:10:04 |
connor (he/him) | Easiest thing would be to just add it to our Hercules CI cluster, I guess... though we'd probably want a way to only run things on it which have requiredFeatures = [ "cuda-blah" ]; as part of the derivation | 15:10:53 |
yannham | I'll probably have to ask that formally, but this is probably negligible. The security part is probably more concerning, this is why I prefer to come with a concrete plan | 15:11:37 |
SomeoneSerge (matrix works sometimes) |
though we'd probably want a way to only run things on it which have requiredFeatures = [ "cuda-blah" ]; as part of the derivation
one option is to take the ofborg's outPaths expression and make it skip everything that is not a passthru test with requiredSystemFeatures | 15:11:47 |
yannham | (regarding the electricity/bandwidth cost) | 15:11:50 |
SomeoneSerge (matrix works sometimes) | *
though we'd probably want a way to only run things on it which have requiredFeatures = [ "cuda-blah" ]; as part of the derivation
one option is to take ofborg's outPaths expression and make it skip everything that is not a passthru test with requiredSystemFeatures | 15:11:58 |
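Note: a rough sketch of the filtering idea in the message above, not ofborg's actual outPaths expression. The derivation attribute in Nix is `requiredSystemFeatures` (the quoted line abbreviates it). The `tests` argument and the `"cuda"` feature name are hypothetical placeholders chosen for illustration.

```nix
# Keep only the derivations that declare a given system feature.
# `tests` is a hypothetical attrset of derivations (e.g. passthru tests);
# `feature` defaults to "cuda", which is an assumed name.
{ lib, tests, feature ? "cuda" }:

lib.filterAttrs
  (_name: drv:
    # Derivations without the attribute are treated as requiring nothing.
    builtins.elem feature (drv.requiredSystemFeatures or [ ]))
  tests
```

A builder would only pick such derivations up if it advertises the same feature in its `system-features` setting, so the Orin machine would need that entry as well.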
SomeoneSerge (matrix works sometimes) |
Hercules CI
Thinking about giving up on that entirely | 15:12:21 |
connor (he/him) | In reply to @ss:someonex.net GPU-in-the-sandbox tests! speaking of which, any updates on https://github.com/NixOS/nixpkgs/pull/256230? Or thoughts on declaring the cuda capability in some structured way? I mean, the capability a binary targets effectively changes what we would think of as a platform because it won't run on other devices | 15:13:07 |
SomeoneSerge (matrix works sometimes) | *
Hercules CI
Thinking about giving up on that entirely because of the "effect queue" thing | 15:13:10 |
SomeoneSerge (matrix works sometimes) |
speaking of which, any updates on https://github.com/NixOS/nixpkgs/pull/256230?
Yes, I just need to give up a weekend to implement the stuff from the last comment 🥲 | 15:13:55 |
SomeoneSerge (matrix works sometimes) | It's super embarrassing it's still unmerged | 15:14:19 |
connor (he/him) | Unrelated: is there a difference in the performance we get building with PTX vs targeting an architecture specifically -- or alternatively, any recommended readings about PTX if that question doesn't make sense? | 15:14:21 |
connor (he/him) | In reply to @ss:someonex.net It's super embarrassing it's still unmerged It's a really big feature! And something other accelerators will undoubtedly look to for inspiration in the future :) | 15:14:51 |
SomeoneSerge (matrix works sometimes) | 1. I'd expect no difference if you've built for the target arch as well
2. I'd expect a JIT compilation cost (or whatever it's called) the first time you run the kernels on a new GPU | 15:16:18 |
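Note: a small sketch of how the two cases in the answer above map onto Nixpkgs configuration, assuming the `cudaCapabilities` and `cudaForwardCompat` config options. The capability "8.7" is the value for the Jetson AGX Orin mentioned earlier and is filled in here as an example, not taken from the chat.

```nix
# Build for a specific architecture and also embed PTX for forward compat.
import <nixpkgs> {
  config = {
    allowUnfree = true;           # CUDA packages are unfree
    cudaSupport = true;
    cudaCapabilities = [ "8.7" ]; # native code for the target arch (case 1)
    cudaForwardCompat = true;     # also ship PTX, JIT-compiled on newer GPUs (case 2)
  };
}
```

With native code for the device present, the driver has no reason to compile the PTX at load time. The first-run cost in point 2 would only show up on a GPU that has no matching native code in the binary.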
| justinrestivo joined the room. | 15:18:28 |
SomeoneSerge (matrix works sometimes) | Lol apptainer.tests.image-saxpy needs more diskSize again. We should fix getOutput asap (the main offender is static cublas obviously) | 15:24:56 |
SomeoneSerge (matrix works sometimes) | connor (he/him) (UTC-5) yannham what about hosting a buildbot somewhere instead | 15:25:26 |
SomeoneSerge (matrix works sometimes) | At the end of the day the saas nature of hercules is... constraining | 15:26:21 |
connor (he/him) | In reply to @ss:someonex.net Lol apptainer.tests.image-saxpy needs more diskSize again. We should fix getOutput asap (the main offender is static cublas obviously) Since I'm mostly done with https://github.com/ConnorBaker/cuda-redist-find-features/tree/feat/recursive-nar-hash-for-FOD and I'll start trying to integrate it into Nixpkgs (allowing us newer manifest versions, among other things), do you want me to take a look at making different derivations for each output, without relying on the multiple-output setup hook? Actually, do you have a link to the issue for that handy? I think I misplaced it and I can't find it :l | 15:36:41 |
SomeoneSerge (matrix works sometimes) |
me to take a look at making different derivations for each output, without relying on the multiple-output setup hook?
That's an interesting idea: reducing the number of inputs surely means fewer rebuilds, at least in principle | 16:09:04 |