21 Feb 2023 |
Carl Thomé | Nice! I particularly enjoyed the framing of reproducible builds vs reproducible environments as:
"However, containers only “ensure reproducibility for computers … not for humans.” Code isn’t only or even primarily meant to communicate with computers, but with humans as well." | 21:26:29 |
22 Feb 2023 |
John ✒️ | If you don't know it already, I highly recommend everything Konrad Hinsen writes about this kind of stuff. | 08:04:36 |
josw | John ✒️: Great! Would you be willing to post this also on Discourse.nixos.org? | 08:21:54 |
John ✒️ | good idea, I just (finally) signed up. I don't think I'm allowed to post links yet, but hopefully soon. | 15:45:27 |
John ✒️ | https://discourse.nixos.org/t/using-the-nix-package-manager-on-an-hpc-cluster/25699?u=jboy | 20:13:13 |
23 Feb 2023 |
| a-kenji joined the room. | 16:36:06 |
25 Feb 2023 |
| amardeeps joined the room. | 05:48:56 |
28 Feb 2023 |
| apache8080 joined the room. | 22:21:13 |
apache8080 | Is there a good nix shell example for getting libtorch working with CUDA? I have config.cudaSupport = true set in my nixpkgs config, and the libtorch being pulled in appears to be the CUDA build, but when I run an example .cpp file it doesn't detect the GPU | 23:09:26 |
apache8080 | Not sure if I'm missing some LD_LIBRARY_PATH stuff | 23:09:40 |
apache8080 | https://pastebin.com/iBiJZBQR | 23:10:54 |
apache8080 | here is the flake.nix I am using | 23:11:01 |
apache8080 | https://github.com/metobom/tchrs-opencv-webcam-inference
this is really the thing I'm trying to get running, was using a simple cpp example as a sanity check for now | 23:14:03 |
SomeoneSerge (migrating synapse) | Note: libtorch-bin just wraps a prebuilt libtorch, and patches it to fix dynamic linkage errors. Personally, I just use python3Packages.torch.dev (together with cuda-maintainers.cachix.org), but maybe libtorch-bin will work for you
That aside, the issue is likely in locating libcuda.so. Are you running NixOS?
| 23:17:52 |
apache8080 | yeah I am running NixOS so libcuda.so should be in /run/opengl-driver/lib | 23:18:56 |
apache8080 | let me try using the python package | 23:19:03 |
SomeoneSerge (migrating synapse) | Does adding that to LD_LIBRARY_PATH help? | 23:20:04 |
apache8080 | no difference | 23:21:46 |
apache8080 | Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend.
The Rust program keeps failing with this, which suggests it is not using the CUDA build of libtorch
| 23:22:20 |
apache8080 | going to use the python package next | 23:22:32 |
apache8080 | hmm no luck with using the python package | 23:28:08 |
SomeoneSerge (migrating synapse) | I just checked, .dev only contains include/ there | 23:28:20 |
apache8080 | I just ran ldd on my rust binary and it looks like it is only linking libtorch_cpu.so so this may be related | 23:30:23 |
SomeoneSerge (migrating synapse) | I'd look into tch-rs's discovery logic, maybe | 23:31:37 |
apache8080 | good point, it may not like that the nixpkgs package currently splits include and lib into separate outputs | 23:33:04 |
apache8080 | ok it is working now | 23:38:15 |
apache8080 | that was the issue | 23:38:18 |
apache8080 | Had to combine python3Packages.torch.dev and python3Packages.torch.lib into a single path | 23:38:51 |
SomeoneSerge (migrating synapse) | tch-rs README also mentions one can use LIBTORCH_INCLUDE + LIBTORCH_LIB instead of LIBTORCH | 23:39:50 |
apache8080 | ah I'm blind, thanks for pointing that out lol | 23:42:21 |
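For anyone landing here later, the two fixes discussed above might look roughly like the following shell.nix. This is a sketch under assumptions, not the flake from the pastebin: the file layout, the `libtorchCombined` name, and the use of `<nixpkgs>` are illustrative, and it presumes python3Packages.torch exposes the split `dev` and `lib` outputs mentioned in the thread.

```nix
# shell.nix: a minimal sketch, assuming torch has separate `dev` and `lib` outputs.
{ pkgs ? import <nixpkgs> {
    # CUDA packages are unfree, so both settings are needed.
    config = { allowUnfree = true; cudaSupport = true; };
  }
}:

let
  torch = pkgs.python3Packages.torch;

  # Option 1: merge include/ (from .dev) and lib/ (from .lib) into one tree,
  # so a single LIBTORCH path contains both. The name is hypothetical.
  libtorchCombined = pkgs.symlinkJoin {
    name = "libtorch-combined";
    paths = [ torch.dev torch.lib ];
  };
in
pkgs.mkShell {
  # Option 1: point tch-rs at the combined tree.
  # LIBTORCH = "${libtorchCombined}";

  # Option 2: keep the outputs split and tell tch-rs where each one lives.
  LIBTORCH_INCLUDE = "${torch.dev}";
  LIBTORCH_LIB = "${torch.lib}";

  # On NixOS the driver's libcuda.so is exposed under this directory.
  LD_LIBRARY_PATH = "/run/opengl-driver/lib";
}
```

Use one option or the other, not both. After rebuilding inside the shell, running ldd on the binary should show whether libtorch_cuda.so is linked alongside libtorch_cpu.so.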