
NixOS CUDA

318 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



19 Apr 2024
@trexd:matrix.org trexd
In reply to @ss:someonex.net
With podman and with docker>=25 this, together with the host configuration, should be enough. Disclaimer: the option will be renamed before the release after all....

Hmm for some reason I can't get it to work. Here's how I'm testing it.

$ sudo docker run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L
[sudo] password for collin: 
docker: Error response from daemon: could not select device driver "cdi" with capabilities: [].

Here are the relevant bits of my config

virtualisation.docker.package = pkgs.docker_25;
virtualisation.containers.cdi.dynamic.nvidia.enable = true;
13:32:52
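For what it's worth, Docker 25 ships its CDI support as an experimental feature, so it may additionally need to be switched on in the daemon settings. A hedged sketch of the full NixOS config; the `features.cdi` daemon flag is my guess at what's missing here, not something confirmed in this thread:

```nix
# Sketch, untested: Docker >= 25 treats CDI as experimental, so the
# daemon may need the feature flag on top of the generated CDI spec.
{
  virtualisation.docker.package = pkgs.docker_25;
  virtualisation.docker.daemon.settings.features.cdi = true; # assumption
  virtualisation.containers.cdi.dynamic.nvidia.enable = true;
}
```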
@ss:someonex.net SomeoneSerge (matrix works sometimes)
I only tested with podman. Could you test if that still works?
13:34:08
@trexd:matrix.org trexd
In reply to @ss:someonex.net
I only tested with podman. Could you test if that still works?
Ok that worked
13:37:26
@ss:someonex.net SomeoneSerge (matrix works sometimes)
Ok at least the problem is somewhat localized. I think it is worth opening a github issue specifically about docker_25 + CDI. Also please ping me and ereslibre
13:38:58
@trexd:matrix.org trexd
In reply to @ss:someonex.net
Ok at least the problem is somewhat localized. I think it is worth opening a github issue specifically about docker_25 + CDI. Also please ping me and ereslibre
Issue is up.
13:52:07
@tanja-6584:matrix.org Tanja (Old; I'm now @tanja:catgirl.cloud) changed their display name from Tanja to Tanja (Old).
14:21:09
@yannham:matrix.org yannham
Hi folks, we have a Jetson Orin AGX which is sitting idle at work with an additional dedicated external SSD storage. I would like to repurpose it as a CI for Nixpkgs CUDA stuff. I'll have to think a bit about the security side of things, as it's currently connected to our office's network, but what do you think would be the easiest and the most useful stuff to test there? What do you think would make a good setup? cc connor (he/him) (UTC-5)
15:02:05
@ss:someonex.net SomeoneSerge (matrix works sometimes)
In reply to @yannham:matrix.org
Hi folks, we have a Jetson Orin AGX which is sitting idle at work with an additional dedicated external SSD storage. I would like to repurpose it as a CI for Nixpkgs CUDA stuff. I'll have to think a bit about the security side of things, as it's currently connected to our office's network, but what do you think would be the easiest and the most useful stuff to test there? What do you think would make a good setup? cc connor (he/him) (UTC-5)
GPU-in-the-sandbox tests!
15:05:48
@ss:someonex.net SomeoneSerge (matrix works sometimes)
screw it, maybe I should just give up on everything this weekend and work on tests and the ci
15:06:49
@connorbaker:matrix.org connor (he/him)
I mean, do we pay for electricity / bandwidth usage at the office? Electricity shouldn't be much given it's so smol, but it would be nice to have a dedicated tester for Orin
15:10:04
@connorbaker:matrix.org connor (he/him)
Easiest thing would be to just add it to our Hercules CI cluster, I guess... though we'd probably want a way to only run things on it which have requiredFeatures = [ "cuda-blah" ]; as part of the derivation
15:10:53
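For reference, the stock Nix mechanism for this is `requiredSystemFeatures` on the derivation, matched against `system-features` in the builder's nix.conf. A minimal sketch; the feature name `cuda` is illustrative, not an agreed-upon name:

```nix
# Sketch: a derivation that only gets scheduled on builders
# advertising the (hypothetical) "cuda" system feature.
stdenv.mkDerivation {
  pname = "gpu-smoke-test";
  version = "0.1";
  src = ./.;
  requiredSystemFeatures = [ "cuda" ];
}
```

The Orin builder would then advertise it via something like `system-features = nixos-test benchmark big-parallel cuda` in nix.conf (or the equivalent `nix.settings.system-features` NixOS option).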
@yannham:matrix.org yannham
I'll probably have to ask that formally, but this is probably negligible. The security part is probably more concerning, which is why I prefer to come with a concrete plan
15:11:37
@yannham:matrix.org yannham
(regarding the electricity/bandwidth cost)
15:11:50
@ss:someonex.net SomeoneSerge (matrix works sometimes)

though we'd probably want a way to only run things on it which have requiredFeatures = [ "cuda-blah" ]; as part of the derivation

one option is to take ofborg's outPaths expression and make it skip everything that is not a passthru test with requiredSystemFeatures

15:11:58
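A rough sketch of what that filter could look like, assuming a package exposing `passthru.tests`; untested, and the names are illustrative:

```nix
# Keep only the passthru tests that declare some
# requiredSystemFeatures, i.e. the ones a GPU-capable
# builder should pick up.
{ lib, pkg }:
lib.filterAttrs
  (_: drv:
    lib.isDerivation drv
    && (drv.requiredSystemFeatures or [ ]) != [ ])
  (pkg.passthru.tests or { })
```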
@connorbaker:matrix.org connor (he/him)
In reply to @ss:someonex.net
GPU-in-the-sandbox tests!
speaking of which, any updates on https://github.com/NixOS/nixpkgs/pull/256230? Or thoughts on declaring the cuda capability in some structured way? I mean, the capability a binary targets effectively changes what we would think of as a platform because it won't run on other devices
15:13:07
@ss:someonex.net SomeoneSerge (matrix works sometimes)

Hercules CI

Thinking about giving up on that entirely because of the "effect queue" thing

15:13:10
@ss:someonex.net SomeoneSerge (matrix works sometimes)

speaking of which, any updates on https://github.com/NixOS/nixpkgs/pull/256230?

Yes, I just need to give up a weekend to implement the stuff from the last comment 🥲

15:13:55
@ss:someonex.net SomeoneSerge (matrix works sometimes)
It's superembarrassing it's still unmerged
15:14:19
@connorbaker:matrix.org connor (he/him)
Unrelated: is there a difference in the performance we get building with PTX vs targeting an architecture specifically -- or alternatively, any recommended readings about PTX if that question doesn't make sense?
15:14:21
@connorbaker:matrix.org connor (he/him)
In reply to @ss:someonex.net
It's superembarrassing it's still unmerged
It's a really big feature! And something other accelerators will undoubtedly look to for inspiration in the future :)
15:14:51
@ss:someonex.net SomeoneSerge (matrix works sometimes)
1. I'd expect no difference if you've built for the target arch as well
2. I'd expect a jit/aot/whatever it's called expense the first time you run the kernels when you run on a new gpu
15:16:18
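For context on the PTX question: an nvcc fatbinary can carry both precompiled SASS for specific architectures and PTX, which the driver JIT-compiles on first launch for devices without matching SASS. A sketch of the relevant flags; sm_87 is, to my understanding, the Jetson Orin architecture, and the file names are made up:

```
# code=sm_87 embeds native SASS for Orin; code=compute_87 also embeds
# PTX so newer GPUs can JIT it at a one-time startup cost (usually
# amortized by the driver's JIT cache on subsequent runs).
nvcc -gencode arch=compute_87,code=sm_87 \
     -gencode arch=compute_87,code=compute_87 \
     saxpy.cu -o saxpy
```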
@justinrestivo:matrix.org justinrestivo joined the room.
15:18:28
@ss:someonex.net SomeoneSerge (matrix works sometimes)
Lol apptainer.tests.image-saxpy needs more diskSize again. We should fix getOutput asap (the main offender is static cublas obviously)
15:24:56
@ss:someonex.net SomeoneSerge (matrix works sometimes)
connor (he/him) (UTC-5) yannham what about hosting a buildbot somewhere instead
15:25:26
@ss:someonex.net SomeoneSerge (matrix works sometimes)
At the end of the day the SaaS nature of Hercules is... constraining
15:26:21
@connorbaker:matrix.org connor (he/him)
In reply to @ss:someonex.net
Lol apptainer.tests.image-saxpy needs more diskSize again. We should fix getOutput asap (the main offender is static cublas obviously)
Since I'm mostly done with https://github.com/ConnorBaker/cuda-redist-find-features/tree/feat/recursive-nar-hash-for-FOD and I'll start trying to integrate it into Nixpkgs (allowing us newer manifest versions, among other things), do you want me to take a look at making different derivations for each output, without relying on the multiple-output setup hook?
Actually, do you have a link to the issue for that handy? I think I misplaced it and I can't find it :l
15:36:41
@ss:someonex.net SomeoneSerge (matrix works sometimes)

me to take a look at making different derivations for each output, without relying on the multiple-output setup hook?

That's an interesting idea: reducing the number of inputs surely means fewer rebuilds, at least in principle

16:09:04


