| 4 Jul 2024 |
Jonas Chevalier | In reply to @connorbaker:matrix.org Is there an open collective for the community hydra instance? Yes, we spend it all on hardware: https://opencollective.com/nix-community
We could also explore hardware donations if you want to bring esoteric hardware to the build farm.
| 08:06:20 |
Jonas Chevalier | In reply to @ss:someonex.net Hmm, so in Hydra you "create a jobset" somewhere, like in a web UI, before you merge the Terraform configs? Or is the tf config the whole thing, but you deployed it manually? The jobset is created with Terraform via https://github.com/nix-community/infra/blob/master/terraform/hydra-projects.tf
This works well because Hydra holds a fair amount of mutable state, so having a convergence engine is quite nice there.
| 08:07:37 |
Jonas Chevalier | In reply to @ss:someonex.net Jonas Chevalier: while we're at it, nobody is building import <nixpkgs> { config.rocmSupport = true; } either, and that one is free OK, let's do that once CUDA is stable. Building unfreeRedistributable could also be nice. | 08:08:18
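[Editor's note: a jobset along those lines could evaluate nixpkgs roughly as below. This is a minimal sketch using the standard `allowUnfreePredicate` config hook; the license filter is illustrative, not an existing jobset definition.]

```nix
# Sketch of an evaluation enabling ROCm plus unfreeRedistributable packages.
# The predicate assumes meta.license is a single license attrset; packages
# with list-valued licenses would need extra handling in a real jobset.
import <nixpkgs> {
  config = {
    rocmSupport = true;
    allowUnfreePredicate = pkg:
      builtins.elem (pkg.meta.license.shortName or "")
        [ "unfreeRedistributable" ];
  };
}
```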
Jonas Chevalier | In reply to @ss:someonex.net Why not just https://github.com/NixOS/nixpkgs/pull/324379/files#diff-b3a88f86f137f8870849673fb9b06582cb73937114ee34a61ae5604e259829a5R37 I think this is going to break our instance. The main Hydra needs 128 GB of RAM to evaluate all of nixpkgs. If you want to keep the list up to date, it's probably better to invest in a script (that you run locally and commit the result). | 08:11:45
| 5 Jul 2024 |
Jonas Chevalier | I don't know if this has been discussed before: did you look at aligning the package versions with some upstream?
For example, Nvidia releases the nvcr.io Docker images. If we could provide the same versions as package sets, it would reduce the switching cost for those users. | 06:10:17
SomeoneSerge (back on matrix) | In reply to @zimbatm:numtide.com I don't know if this has been discussed before: did you look at aligning the package versions with some upstream?
For example, Nvidia releases the nvcr.io Docker images. If we could provide the same versions as package sets, it would reduce the switching cost for those users. Well, if we're talking about cudaPackages, they are aligned with the manifests that upstream advertises | 07:20:34
SomeoneSerge (back on matrix) |
it would reduce the switching cost for those users.
Do you have a specific user story in mind? | 07:21:02
Jonas Chevalier | A very specific example is: one customer is using nvcr.io/nvidia/pytorch:23.08-py3 (CUDA 12.2.1, cuDNN 8.9.4, Python 3.10, PyTorch 2.1.0) and looking to try out Nix to fix their reproducibility issues | 07:29:17
SomeoneSerge (back on matrix) | And in this case you'd suggest we provide cudaPackages'.cuda_12_2_1_cudnn_8_9_4? | 07:44:11
SomeoneSerge (back on matrix) | ...instead of referring to the manual and cudaPackages.overrideScope' (...)? | 07:44:51
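[Editor's note: for context, the overrideScope' route looks roughly like the sketch below. The versioned `cudnn_8_9` attribute is an assumption about the nixpkgs revision in use; versioned attribute names vary between releases.]

```nix
# Sketch: pinning a specific cuDNN inside the CUDA package set.
let
  pkgs = import <nixpkgs> {
    config = { allowUnfree = true; cudaSupport = true; };
  };
  cudaPackagesPinned = pkgs.cudaPackages.overrideScope' (final: prev: {
    # every dependent in the scope now sees cuDNN 8.9.x
    cudnn = final.cudnn_8_9; # assumed attribute name
  });
in
cudaPackagesPinned.cudnn
```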
Jonas Chevalier | I haven't thought about this deeply. One possibility is to maintain a package set like cudaPackages.pytorch_23_08 | 07:57:44
SomeoneSerge (back on matrix) | I think an out-of-tree collection of buildLayeredImage expressions reproducing nvcr images would make sense | 08:08:38 |
SomeoneSerge (back on matrix) | In-tree, maybe not so much because these sound like finalized compositions of packages | 08:09:10 |
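[Editor's note: sketched out, such an out-of-tree expression might look like the following. The image name, tag, and contents are illustrative placeholders; the pinned versions would still have to be matched to the nvcr manifest by hand.]

```nix
# Sketch of an nvcr-style image assembled with dockerTools.buildLayeredImage.
# All contents shown stand in for whatever the 23.08 manifest actually lists.
{ pkgs ? import <nixpkgs> {
    config = { allowUnfree = true; cudaSupport = true; };
  }
}:
pkgs.dockerTools.buildLayeredImage {
  name = "pytorch-nix"; # hypothetical image name
  tag = "23.08-py3";
  contents = [
    (pkgs.python310.withPackages (ps: [ ps.torch ]))
    pkgs.bashInteractive
  ];
  config.Cmd = [ "python3" ];
}
```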
hexa | and also Python 3.10 might be hit or miss these days | 10:52:20 |
SomeoneSerge (back on matrix) | In reply to @hexa:lossy.network and also Python 3.10 might be hit or miss these days I thought we could ignore this without a disclaimer xD | 12:14:24
| 6 Jul 2024 |
SomeoneSerge (back on matrix) | In reply to @hexa:lossy.network faissWithCuda pls 😄 Oh, broken on x86_64: https://hydra.nix-community.org/build/68172 | 16:53:02
hexa | Looks like OOM to me | 16:54:55
hexa | SIGKILL | 16:55:19
hexa | maybe limit the number of parallel ptxas instances? | 16:56:48
hexa | Or maybe just an outlier | 16:57:10 |
SomeoneSerge (back on matrix) | Wondering what the generic way to do that is | 17:43:02
SomeoneSerge (back on matrix) | from the nvcc docs | 17:43:06
SomeoneSerge (back on matrix) | (screenshot: the nvcc environment-variables table from the docs) | 17:43:13
hexa | https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#nvcc-environment-variables | 18:09:41 |
hexa | can also be passed via an env var | 18:09:50 |
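[Editor's note: concretely, that could be wired up in an overlay along these lines. A minimal sketch using the documented NVCC_APPEND_FLAGS environment variable; it assumes faiss's build forwards the environment to nvcc, and whether 2 threads is the right cap is untested.]

```nix
# Sketch: capping nvcc's parallelism, and thereby the number of concurrent
# ptxas processes, via NVCC_APPEND_FLAGS.
final: prev: {
  faiss = prev.faiss.overrideAttrs (old: {
    # "--threads 2" limits each nvcc invocation to two parallel compilation
    # threads, lowering peak memory usage at some build-time cost.
    NVCC_APPEND_FLAGS = "--threads 2";
  });
}
```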