| 12 Jun 2024 |
trexd | In reply to @connorbaker:matrix.org Depends on how much space you need. Cachix gives 5 GB for free from what I remember. For a single project, that may be enough.
For anything larger than that, you’re in for a world of unpleasantness depending on storage requirements.
Even just hosting a binary cache via S3, you’re going to be paying for each API call (of which Nix will generate many, like an HTTP HEAD request for each .narinfo file, not to mention the actual retrieval of the data) as well as egress, which adds up very quickly.
I figured time would be the more important factor rather than storage? 🤔 I don't imagine the package taking more than 5 GB. Is there a nix command that I can use to get the total size of a build? | 12:34:38 |
connor (burnt/out) (UTC-8) | nix path-info -rsSh <flake attribute or store path> iirc | 12:36:40 |
trexd | 7.9G RIP 🫠 | 12:44:12 |
connor (burnt/out) (UTC-8) | to be fair IIRC that's the uncompressed size, and if there are paths cached in the main nixos cache, it wouldn't count against you
not sure Domen allows specifying other upstream caches though (avoiding caching some of the CUDA dependencies would probably be best) | 13:41:29 |
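A rough way to compare a closure size against a cache quota, building on the `nix path-info` suggestion above. This is only a sketch: `fits_quota` is a hypothetical helper, the 5 GiB quota is the figure recalled in the chat, and the example size is made up. The size Nix reports is uncompressed, so the space used in a cache would normally be smaller.

```shell
# Hypothetical helper: succeed if a closure of $1 bytes fits in a quota of $2 GiB.
fits_quota() {
    closure_bytes=$1
    quota_gib=$2
    quota_bytes=$((quota_gib * 1024 * 1024 * 1024))
    [ "$closure_bytes" -le "$quota_bytes" ]
}

# Example with a made-up size of roughly 7.9 GiB (not a measured value).
if fits_quota 8482560409 5; then
    echo "fits"
else
    echo "too large"
fi
```

Without `-h`, `nix path-info -S <store path>` prints the closure size in bytes as the second column, so something like `fits_quota "$(nix path-info -S <store path> | awk '{print $2}')" 5` should work for a single path.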
trexd | I guess I'll play around with it and see what's up. Is pinning to a successful build of nixpkgs-cuda-ci still the best way to get CUDA hits? | 13:43:59 |
connor (burnt/out) (UTC-8) | I believe so, yes | 15:00:37 |
SomeoneSerge (matrix works sometimes) | In reply to @aidalgol:matrix.org SomeoneSerge (UTC+3): Any thoughts? https://github.com/NixOS/nixpkgs/issues/319167
I suppose we need some kind of a fixpoint 🤷
Btw, I just had a look at the mangohud derivation, and
- we're still doing inherit (linuxPackages.nvidia_x11) libXNVCtrl which I think is a mauvais ton (referencing a concrete version of linuxPackages from nixpkgs)
- I still think it belongs in the top-level, maybe as an attrset, libXNVCtrlVersions
- probably a very bad idea, but the nvidia nixos module could add an overlay setting the respective default version of libXNVCtrl
- all packages taking libXNVCtrl from the top-level, we'd ensure only one version is in use in any given closure (probably at the cost of rebuilding reverse dependencies)
- passing a python package (mako) rather than python3Packages
Honestly, not sure if this is worth the effort 🙃 | 16:24:04 |
| 13 Jun 2024 |
| shekhinah removed their display name yaldabaoth. | 02:43:30 |
SomeoneSerge (matrix works sometimes) | Ehhh tfw /proc/sys/fs/file-max is 20 characters long but nix build fails with "too many open files" | 16:17:07 |
SomeoneSerge (matrix works sometimes) | $ sudo lsof | wc -l
1191414
That's not much, is it | 16:21:12 |
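For context, "Too many open files" is the message for EMFILE, the per-process limit (RLIMIT_NOFILE); hitting the system-wide limit in /proc/sys/fs/file-max gives ENFILE, "Too many open files in system". The sketch below prints both kinds of limit side by side. `show_fd_limits` is a hypothetical helper name, and when builds go through the Nix daemon the relevant per-process limit is probably the daemon's rather than the one of the shell this runs in.

```shell
# Print the limits relevant to "too many open files" for the current shell.
show_fd_limits() {
    echo "per-process soft limit: $(ulimit -Sn)"
    echo "per-process hard limit: $(ulimit -Hn)"
    if [ -r /proc/sys/fs/file-max ]; then
        echo "system-wide maximum:    $(cat /proc/sys/fs/file-max)"
    fi
    if [ -r /proc/sys/fs/file-nr ]; then
        # Fields: allocated handles, unused handles, maximum.
        echo "system-wide usage:      $(cat /proc/sys/fs/file-nr)"
    fi
}

show_fd_limits
```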
SomeoneSerge (matrix works sometimes) | In reply to @gjvnq:matrix.org
Yeah, I had already figured it out but the big issue is that I don't know what is the "right" way to include the definition of the half type.
To make matters worse, I've tried to compile AliceVision on a docker container and using the official compilation scripts and yet the thing keeps failing. This means I can't even look at how the thing is supposed to compile.
I'm at a bit of a loss as for how to proceed, but I suspect that I'll have to either ask the original authors for help or carefully read the cmake compilation scripts in order to look for potential sources of the error.
Theoretically AliceVision has a nice CI pipeline but I can't see their build history so I don't even know how useful their CI scripts are.
Could it be this https://github.com/AcademySoftwareFoundation/Imath/blob/2fc9d89ec52003350fcfd20f337bb3d0b870ff5a/src/Imath/half.h#L180-L182 | 16:25:14 |
Mir | In reply to @ss:someonex.net Could it be this https://github.com/AcademySoftwareFoundation/Imath/blob/2fc9d89ec52003350fcfd20f337bb3d0b870ff5a/src/Imath/half.h#L180-L182
possibly, but I'm afraid of just patching source code to force the inclusion of CUDA's half without first exhausting config flags. I feel like something in CMake is misconfigured or bugged and I feel like I should patch CMakeLists.txt before patching the source code directly | 16:32:00 |
aidalgol | In reply to @ss:someonex.net
I suppose we need some kind of a fixpoint 🤷
Btw, I just had a look at the mangohud derivation, and
- we're still doing inherit (linuxPackages.nvidia_x11) libXNVCtrl which I think is a mauvais ton (referencing a concrete version of linuxPackages from nixpkgs)
- I still think it belongs in the top-level, maybe as an attrset, libXNVCtrlVersions
- probably a very bad idea, but the nvidia nixos module could add an overlay setting the respective default version of libXNVCtrl
- all packages taking libXNVCtrl from the top-level, we'd ensure only one version is in use in any given closure (probably at the cost of rebuilding reverse dependencies)
- passing a python package (mako) rather than python3Packages
Honestly, not sure if this is worth the effort 🙃
I did have an attempt at moving it to the top-level, but decided to do it in a separate PR. I need to come back to that. I'm not entirely sure how best to approach that so that nvidia_x11 can override it. | 19:59:27 |
aidalgol | I'm not convinced that having a single fixed version of libXNVCtrl is worth the trouble, but I do want to move it to the top level. | 20:00:03 |
| 15 Jun 2024 |
| shekhinah set their display name to shekhinah. | 08:46:32 |
matthewcroughan | Did you guys know that python311Packages.tensorrt is broken because someone updated pkgs/development/cuda-modules/tensorrt/releases.nix without checking whether it broke any derivations? | 15:14:46 |
matthewcroughan | [astraluser@edward:~/Downloads/f/TensorRT-8.6.1.6]$ ls python/tensorrt-8.6.1-cp3
tensorrt-8.6.1-cp310-none-linux_x86_64.whl tensorrt-8.6.1-cp36-none-linux_x86_64.whl tensorrt-8.6.1-cp38-none-linux_x86_64.whl
tensorrt-8.6.1-cp311-none-linux_x86_64.whl tensorrt-8.6.1-cp37-none-linux_x86_64.whl tensorrt-8.6.1-cp39-none-linux_x86_64.whl
| 15:15:11 |
matthewcroughan | python3.11-tensorrt> /nix/store/d3dzfy4amjl826fb8j00qp1d9887h7hm-stdenv-linux/setup: line 131: pop_var_context: head of shell_variables not a function context
error: builder for '/nix/store/8pw2fjq86vbkdd6s1bl6axfkhbnm18lr-python3.11-tensorrt-8.6.1.6.drv' failed with exit code 2;
last 10 log lines:
> Using pythonImportsCheckPhase
> Sourcing python-namespaces-hook
> Sourcing python-catch-conflicts-hook.sh
> Sourcing auto-add-driver-runpath-hook
> Using autoAddDriverRunpath
> Sourcing fix-elf-files.sh
> Running phase: unpackPhase
> tar: TensorRT-8.6.1.6/python/tensorrt-8.6.1.6-cp311-none-linux_x86_64.whl: Not found in archive
> tar: Exiting with failure status due to previous errors
> /nix/store/d3dzfy4amjl826fb8j00qp1d9887h7hm-stdenv-linux/setup: line 131: pop_var_context: head of shell_variables not a function context
For full logs, run 'nix log /nix/store/8pw2fjq86vbkdd6s1bl6axfkhbnm18lr-python3.11-tensorrt-8.6.1.6.drv'.
| 15:15:34 |
matthewcroughan | they removed the .6 from the release | 15:15:47 |
matthewcroughan | TensorRT-8.6.1.6/python/tensorrt-8.6.1.6-cp311-none-linux_x86_64.whl is wrong
TensorRT-8.6.1.6/python/tensorrt-8.6.1-cp311-none-linux_x86_64.whl is correct | 15:16:11 |
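The mismatch above comes down to the wheel name carrying only the first three version components. A minimal sketch of deriving the expected file name from the full release version follows; `wheel_name` is a hypothetical helper, not the logic used in nixpkgs, and it assumes the naming pattern seen in the `ls` output above.

```shell
# Hypothetical helper: build the wheel file name from a full release version
# (e.g. 8.6.1.6) and a Python tag (e.g. cp311), keeping only the first three
# version components as in the listing above.
wheel_name() {
    full_version=$1
    python_tag=$2
    short_version=$(printf '%s' "$full_version" | cut -d. -f1-3)
    printf 'tensorrt-%s-%s-none-linux_x86_64.whl\n' "$short_version" "$python_tag"
}

wheel_name 8.6.1.6 cp311
```

Listing the archive with `tar -tf` and filtering for `.whl` is a quick way to confirm which names a given tarball actually contains before updating the expression.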
SomeoneSerge (matrix works sometimes) | In reply to @matthewcroughan:defenestrate.it Did you guys know that python311Packages.tensorrt is broken because someone updated pkgs/development/cuda-modules/tensorrt/releases.nix without checking that it broke any derivations?
Nvidia prevents unattended downloads, of course it broke | 16:08:17 |
matthewcroughan | God we need archive-org-pkgs | 16:22:55 |
teto | In reply to @connorbaker:matrix.org What revision of nixpkgs are you on? master fails to build (go-stable-diffusion errors during CMake configure)
right sry I had disabled diffusion in an overlay. I've checked that it works on master now (following the local-ai 2.16 bump today). I've opened https://github.com/NixOS/nixpkgs/issues/320145 to help myself collect the info | 22:26:22 |
| 17 Jun 2024 |
| grw00 joined the room. | 12:25:16 |
grw00 | hey all, has anyone had success using cuda libraries inside a docker container built with nix? i don't mean running a cuda container on nixos host but the opposite, running a nix container containing cuda program on another host.
i build a container with nix and pytorch etc and run it on runpod, it doesn't see nvidia drivers/device though, i guess i am missing something. currently i have:
dockerImages.default = pkgs.dockerTools.streamLayeredImage {
name = "ghcr.io/my-image";
tag = "latest";
contents = [
pkgs.bash
pkgs.uutils-coreutils-noprefix
pkgs.cacert
pkgs.libnvidia-container
pythonEnv
];
config = {
Cmd = [ "${pkgs.bash}/bin/bash" ];
Env = [
"CUDA_PATH=${pkgs.cudatoolkit}"
"LD_LIBRARY_PATH=${pkgs.linuxPackages_5_4.nvidia_x11}/lib"
];
};
};
| 12:30:49 |
SomeoneSerge (matrix works sometimes) | grw00: are you using CDI or the runtime wrappers? Either way you need to have the drivers exposed in LD_LIBRARY_PATH or mounted under /run/opengl-driver/lib | 12:31:13 |
grw00 | not sure what CDI is, i understand i need the /run/opengl-driver but i'm not sure how to achieve that in docker container | 12:32:09 |
SomeoneSerge (matrix works sometimes) | In reply to @grw00:matrix.org
hey all, has anyone had success using cuda libraries inside a docker container built with nix? i don't mean running a cuda container on nixos host but the opposite, running a nix container containing cuda program on another host.
i build a container with nix and pytorch etc and run it on runpod, it doesn't see nvidia drivers/device though, i guess i am missing something. currently i have:
dockerImages.default = pkgs.dockerTools.streamLayeredImage {
name = "ghcr.io/my-image";
tag = "latest";
contents = [
pkgs.bash
pkgs.uutils-coreutils-noprefix
pkgs.cacert
pkgs.libnvidia-container
pythonEnv
];
config = {
Cmd = [ "${pkgs.bash}/bin/bash" ];
Env = [
"CUDA_PATH=${pkgs.cudatoolkit}"
"LD_LIBRARY_PATH=${pkgs.linuxPackages_5_4.nvidia_x11}/lib"
];
};
};
Hard coding linuxPackages in the image is a bad idea. With cuda you normally don't want drivers in the image, you want the host's drivers mounted in the container | 12:33:17 |
SomeoneSerge (matrix works sometimes) | No need for libnvidia-container in the image either, I think | 12:34:17 |
grw00 | In reply to @ss:someonex.net Hard coding linuxPackages in the image is a bad idea. With cuda you normally don't want drivers in the image, you want the host's drivers mounted in the container
ah kk, got it. i'm specifically trying to use this on runpod.io, i don't think they offer this as a possibility. it seems like the images they offer all have cuda installed in image | 12:35:10 |
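One way to check whether the host actually mounts its driver libraries into the running container is to look for libcuda.so.1 in a few candidate directories. This is a sketch only: `find_libcuda` is a hypothetical helper, /run/opengl-driver/lib comes from the discussion above, and the other two directories are assumptions about where a container runtime might place the host's libraries.

```shell
# Print the first directory among the arguments that contains libcuda.so.1.
find_libcuda() {
    for dir in "$@"; do
        if [ -e "$dir/libcuda.so.1" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate directories; everything except /run/opengl-driver/lib is a guess.
find_libcuda /run/opengl-driver/lib /usr/lib/x86_64-linux-gnu /usr/lib64 \
    || echo "libcuda.so.1 not found in the candidate directories"
```

If a directory is reported, LD_LIBRARY_PATH in the image could point at it instead of at an nvidia_x11 build pinned from nixpkgs.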