| 15 Dec 2025 |
hexa (UTC+1) | ❯ curl https://cache.nixos-cuda.org/mhf691zwwjrqi8b6an14pblyqbzwn1v2.narinfo
missed hash⏎
| 02:55:27 |
pdealbera | Thanks! Not the same thing, I can't reach the host:
❯ curl https://cache.nixos-cuda.org/mhf691zwwjrqi8b6an14pblyqbzwn1v2.narinfo
curl: (7) Failed to connect to cache.nixos-cuda.org port 443 after 675 ms: Could not connect to server
| 02:59:52 |
pdealbera | But that means it's probably a thing on my end. | 03:00:06 |
hexa (UTC+1) | the server is hosted in helsinki at hetzner fwiw | 03:01:23 |
connor (burnt/out) (UTC-8) | Slightly off topic but for those of you who use Hydra or nix-eval-jobs with lots of eval time fetchers or substitution, you may be interested in some WIP I’ve been doing to improve that use case https://gist.github.com/ConnorBaker/9e31d3b08ff6d4ac841928412131fe15 | 09:42:32 |
connor (burnt/out) (UTC-8) | Numbers from doing a shallow eval (not forcing recursion) of Haskell.nix’s hydraJobs which has a number of flake inputs (and I think also does IFD?) | 09:46:39 |
connor (burnt/out) (UTC-8) | I’m also trying to look into using Intel VTune to get a better idea of Nix bottlenecks/areas for improvement
VTune is currently packaged in Nixpkgs through the Intel-oneapi stuff but I couldn’t get it working without using the latest version. I’ll probably try upstreaming the changes at some point unless someone beats me to it. | 09:48:44 |
yorik.sar | Did you by any chance run a comparison for a more common use case, e.g. evaluating a sizeable NixOS config? Just to see what those locks do to a less parallel workload. | 10:53:30 |
yorik.sar | I’m surprised to see the parser there - how much code were you evaluating? | 10:54:09 |
yorik.sar | I think I already saw some lock implementation in Nix code; probably better to reuse that one. Also, Nix code seems to prefer RAII (something like { auto _thelock = lock.get(); … }) rather than passing a continuation to a function (withLock(…)). | 10:56:46 |
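[Aside: a minimal C++ sketch of the two locking styles contrasted above. The names withLock and raiiStyle and the lock type are illustrative placeholders, not Nix's actual API; Nix's own helpers differ in detail.]

```cpp
// Minimal sketch, assuming nothing about Nix internals: the same critical
// section written in continuation-passing style vs. RAII style.
#include <mutex>

// Continuation-passing style: the critical section is a callable handed to a helper.
template <typename F>
auto withLock(std::mutex & m, F && f) {
    std::scoped_lock lock(m);
    return f();
}

// RAII style: the guard lives for the enclosing scope and releases the mutex
// on every exit path, including exceptions and early returns.
void raiiStyle(std::mutex & m, int & counter) {
    std::scoped_lock lock(m);
    ++counter;
}

int main() {
    std::mutex m;
    int counter = 0;
    withLock(m, [&] { ++counter; });
    raiiStyle(m, counter);
    return counter == 2 ? 0 : 1;
}
```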
yorik.sar |
> I'd like to do further work to deduplicate queries for .narinfo and the like, since Nix already generates quite the network storm by firing them off in serial.
I wonder if Nix uses HTTP/2 there. I think with stream multiplexing, all requests could essentially fit in one pack of packets.
| 10:59:07 |
yorik.sar | *
I'd like to do further work to deduplicate queries for .narinfo and the like, since Nix already generates quite the network storm by firing them off in serial.
I wonder if Nix uses HTTP/2 there. I think with stream multiplexing, all requests could essentially fit in one pack of packets.
| 10:59:14 |
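[Aside: Nix's downloader is built on libcurl's multi interface, which can multiplex several transfers over a single HTTP/2 connection when the server supports it. Below is a self-contained sketch of that idea for a batch of .narinfo requests; the host, hashes, and the lack of error handling are placeholders, not Nix's actual FileTransfer code.]

```cpp
// Sketch only: parallel .narinfo fetches multiplexed over HTTP/2 with libcurl.
// Build (assumed): g++ -std=c++17 narinfo_fetch.cc -lcurl
#include <curl/curl.h>
#include <string>
#include <vector>

// Discard response bodies; this sketch only cares about issuing the requests.
static size_t discard(char *, size_t size, size_t nmemb, void *) {
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURLM * multi = curl_multi_init();
    // Let concurrent transfers share one HTTP/2 connection instead of opening N sockets.
    curl_multi_setopt(multi, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);

    // Placeholder host and hashes; real ones come from the store paths being substituted.
    std::vector<std::string> urls = {
        "https://cache.example.org/mhf691zwwjrqi8b6an14pblyqbzwn1v2.narinfo",
        "https://cache.example.org/0000000000000000000000000000000z.narinfo",
    };

    std::vector<CURL *> handles;
    for (auto & url : urls) {
        CURL * h = curl_easy_init();
        curl_easy_setopt(h, CURLOPT_URL, url.c_str());
        curl_easy_setopt(h, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2TLS); // prefer HTTP/2 over TLS
        curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, discard);
        curl_multi_add_handle(multi, h);
        handles.push_back(h);
    }

    int running = 0;
    do {
        curl_multi_perform(multi, &running);
        curl_multi_poll(multi, nullptr, 0, 1000, nullptr); // wait for activity on any transfer
    } while (running > 0);

    for (auto * h : handles) {
        curl_multi_remove_handle(multi, h);
        curl_easy_cleanup(h);
    }
    curl_multi_cleanup(multi);
    curl_global_cleanup();
}
```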
Ari Lotter | okay - i think i just figured out why cudnn/torch/nvrtc is broken.. cudnn does seem to require NVRTC at runtime - see https://docs.nvidia.com/deeplearning/cudnn/backend/latest/api/cudnn-graph-library.html for CUDNN_STATUS_NOT_SUPPORTED_RUNTIME_PREREQUISITE_MISSING -
> A runtime library required by cuDNN cannot be found in the predefined search paths. These libraries are libcuda.so (nvcuda.dll) and libnvrtc.so
but it looks like nvrtc is not provided to the cudnn package in nixpkgs!!! https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/cuda-modules/packages/cudnn.nix
so, unless i'm missing something, don't we just need to include nvrtc in buildInputs for cudnn, and this will fix the weird auto-runpath thing..?
| 17:32:01 |
Ari Lotter | (didn't wanna dump this in the gh issue in case im totally mistaken heh) | 17:36:42 |
Ari Lotter | er or maybe propagatedBuildInputs | 17:37:54 |
Ari Lotter | oh no it's way worse lol | 20:06:25 |
Ari Lotter | because dlopen doesn't check the runpath of libtorch_cuda.so does it | 20:06:36 |
Ari Lotter | since dlopen ignores the caller's runpath | 20:06:48 |
Ari Lotter | it only checks the runpath of the main executable.. | 20:06:55 |
Ari Lotter | ok i have no idea what im doing <3 giving up | 20:11:43 |
Ari Lotter | have managed to make it work by setting LD_LIBRARY_PATH but uhh seems pretty.. unusable | 20:13:36 |
Ari Lotter | i think we'd have to patch the dlopen call | 20:13:40 |
Ari Lotter | inside cudnn..? | 20:13:45 |
Ari Lotter | https://github.com/NixOS/nixpkgs/issues/461334#issuecomment-3657416223 | 20:19:51 |
Ari Lotter | i think it's pretty hopeless. lol | 20:19:58 |
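[Aside: a small standalone probe of the behaviour being debugged above. With a bare soname, dlopen resolves the library through the loader's search (LD_LIBRARY_PATH, RUNPATH entries of the object making the call, the ld.so cache); a name containing a slash is loaded directly with no search. That is consistent with the LD_LIBRARY_PATH workaround working, and with patching the dlopen'd name inside cudnn to an absolute store path being the other option on the table. The store path below is a placeholder.]

```cpp
// Sketch only: check whether the dynamic loader can find a library by soname
// vs. by absolute path. Build (assumed): g++ -std=c++17 dlopen_probe.cc -ldl
#include <dlfcn.h>
#include <cstdio>

// Attempt to load a library and report whether the loader could find it.
static bool probe(const char * name) {
    void * handle = dlopen(name, RTLD_NOW);
    if (!handle) {
        std::printf("FAIL %s: %s\n", name, dlerror());
        return false;
    }
    std::printf("OK   %s\n", name);
    dlclose(handle);
    return true;
}

int main() {
    // Bare soname (the one named in the cuDNN docs quoted above): found only if it is
    // on LD_LIBRARY_PATH, on an applicable RUNPATH, or in the ld.so cache.
    probe("libnvrtc.so");
    // Absolute path (placeholder): loaded directly, no search involved.
    probe("/nix/store/...-cuda_nvrtc/lib/libnvrtc.so");
    return 0;
}
```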
connor (burnt/out) (UTC-8) | I’ve not; I suspect zero speedup since evaluating a single config doesn’t have opportunities for parallelism and all the locking is done through kernel structures so it should be super low overhead
(Also Nixpkgs basically forbids eval time fetchers so I don’t think a config using just Nixpkgs would show a speedup) | 20:39:18 |
connor (burnt/out) (UTC-8) | That was evaluating all of the closures attribute in nixos/release.nix with nix eval --json | 20:40:10 |
connor (burnt/out) (UTC-8) | There is some, but it blocks forever, doesn’t do cleanup, and lacks a few other features needed for different types of builtin fetchers and substitutions | 20:41:22 |
connor (burnt/out) (UTC-8) | Robbie Buxton have you run into this (or seen anyone run into it)? You’re one of the like three people I know who use PyTorch from Nixpkgs other than myself | 20:42:49 |
Robbie Buxton | I haven’t run into that but I have run into libcuda failing on trying to dlopen things it claims to not depend on | 21:07:52 |