!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

290 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
58 Servers

Sender Message Time
7 Feb 2025
@stick:matrix.org stick: i rebased the PR https://github.com/NixOS/nixpkgs/pull/379575 to check whether the CI fails again on the same test 15:09:55 (edited 15:10:13)
@stick:matrix.org stick: update: no it did not - i guess there was an error in master, not in my branch 15:11:22
@ss:someonex.net SomeoneSerge (back on matrix): Yes ofc. I was about to press the button but then this weird action failed even after I restarted it manually 15:23:31
@stick:matrix.org stick: is that the only thing needed to get vllm into nix-community cache? 15:23:58
@ss:someonex.net SomeoneSerge (back on matrix): Looks like it's happy after the rebase? 15:24:01
@stick:matrix.org stick: yes, it is 15:24:09
@stick:matrix.org stick: thanks for the merge! 15:33:11
8 Feb 2025
@terrorjack:matrix.org terrorjack joined the room. 01:25:30
@terrorjack:matrix.org terrorjack set a profile picture. 02:24:20
@terrorjack:matrix.org terrorjack removed their profile picture. 02:24:57
@vanishingideal:matrix.org vanishingideal joined the room. 04:06:53
@mabau:matrix.org @mabau:matrix.org left the room. 07:11:39
@zopieux:matrix.zopi.eu zopieux

alright, we got a recent build, so I tried again. Updated to 550e11f and ran:

$ colmena build -v --show-trace --nix-option builders "" --nix-option cores 0
x | these 11 derivations will be built:
x |   /nix/store/f1s6y83hb8gdl0s49vmj0w54i5a75gd7-ollama-0.5.7.drv
x |   /nix/store/fpfv7cn50ns667qrkwx2frn26di1hnc7-ollama.service.drv
[snip]
x | building '/nix/store/f1s6y83hb8gdl0s49vmj0w54i5a75gd7-ollama-0.5.7.drv'...

even though f1s6… is right there. Is there a way to debug that nix-community is even being requested at all, perhaps?

15:36:58
@zopieux:matrix.zopi.eu zopieux: ok that was dumb, I should have checked nix config show first. Something was overriding the substituters this whole time, despite my flake setting them. Sorry for the noise, it's all fine now! 16:02:13 (edited 16:02:25)
@stick:matrix.org stick

Do we want to enable aarch64-linux for cuda package set on hydra?

I think it makes sense, with GH200 already available and Digits becoming available in May

20:25:34
@stick:matrix.org stick: in the meantime I am compiling the aarch64-linux package set on my macbook, trying to remove potential build failures 21:21:27
@adam:robins.wtf @adam:robins.wtf left the room. 21:22:28
9 Feb 2025
@ss:someonex.net SomeoneSerge (back on matrix): Yes. The only "blocker" is refactoring release-cuda.nix to support specifying jetson platforms instead of (or in addition to? does anyone use) sbsa 04:21:13 (edited 04:21:29)
@stick:matrix.org stick: Yes, GH200 is SBSA iirc 19:44:53
@ruroruro:matrix.org ruro

I am currently working on fixing cuda-samples. I wanted to bikeshed a bit more about its dependency on the insecure freeimage package. I still think that in general having eval errors for insecure package dependencies is useful. Afaik, the official hydra builders don't build insecure packages, so I am kind of against setting allowInsecure/allowInsecurePredicate/permittedInsecurePackages in release-cuda.nix.

On the other hand, it's kind of silly to treat cuda-samples as insecure just because it depends on freeimage. Most of the code in cuda-samples is for demonstration/validation purposes only, a lot of it doesn't even have proper error checking and shouldn't be used in production anyway. So it is extremely weird to refuse building cuda-samples just because there are some potential buffer overflow exploits or whatever in the library it uses for image reading.

I still think that we should just change freeimage to

freeimage.overrideAttrs (prev: {
  meta = prev.meta // {
    knownVulnerabilities = [ ];
  };
})

specifically only in the cuda-samples derivation.
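
A rough sketch of how that per-package override could be wired up. It assumes the cuda-samples expression takes `freeimage` as a callPackage-style argument, so that `.override` can swap it; that should be checked against the actual expression in nixpkgs. The name `freeimageUnmarked` is made up for illustration.

```nix
# Sketch, not the actual nixpkgs change. Assumes cuda-samples receives
# `freeimage` as a function argument, so `.override` can replace it.
{ cudaPackages, freeimage }:

let
  # Hypothetical name: freeimage with the insecurity marker cleared,
  # visible only to cuda-samples.
  freeimageUnmarked = freeimage.overrideAttrs (prev: {
    meta = prev.meta // {
      knownVulnerabilities = [ ];
    };
  });
in
cudaPackages.cuda-samples.override {
  freeimage = freeimageUnmarked;
}
```

With this shape, every other consumer of freeimage would still see the marked package and fail evaluation as before.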

However, if not shipping "vulnerable" binaries is so important, then we could keep cuda-samples broken, but introduce a separate cuda-samples.passthru.buildCheckOnly derivation that ignores the freeimage "vulnerabilities", but then replaces its installPhase with something along the lines of touch $out. That way, we keep cuda-samples.buildCheckOnly as a useful smoke test, but don't distribute any "potentially vulnerable" binaries. Though, TBH, this sounds like over-engineering and we should just override freeimage.knownVulnerabilities in cuda-samples specifically.
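
The build-check-only variant could look roughly like the following. This is a sketch under assumptions: `samplesWithUnmarkedFreeimage` is a hypothetical argument standing for a cuda-samples build whose freeimage has `knownVulnerabilities` cleared, and a real derivation with several outputs would have to create each of them.

```nix
# Sketch only. `samplesWithUnmarkedFreeimage` is a hypothetical input: a
# cuda-samples derivation built against freeimage with the marker cleared.
{ samplesWithUnmarkedFreeimage }:

samplesWithUnmarkedFreeimage.overrideAttrs (_prev: {
  # Compile everything as usual, but install nothing: the output is an
  # empty file, so no freeimage-linked binaries are distributed.
  installPhase = ''
    runHook preInstall
    touch $out
    runHook postInstall
  '';
})
```

The result could then be exposed as `cuda-samples.passthru.buildCheckOnly`, as described above, while cuda-samples itself stays marked.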

Any thoughts?

20:24:30
@stick:matrix.org stick: what exactly is the purpose of having samples packaged in nixpkgs? my first intuition tells me this does not belong in nixpkgs at all 22:13:09
@stick:matrix.org stick: maybe just as a sanity check of whether things compile 22:17:59
@ruroruro:matrix.org ruro: Although they are called "samples", in reality I'd say that some of them are closer to debugging utilities, tests or benchmarks. And by "tests" I mean both for verifying that they successfully compile (although currently not all of them do) and for running them on the target machine to verify that the nvidia card works as expected. 22:18:12
@ruroruro:matrix.org ruro: A lot of these binaries run some computation on the GPU, verify the results, print some information and then report Result = PASS or something. Unfortunately, the output format of these binaries differs too much and running them requires access to the GPU, so it's basically impossible to create a "proper" nixpkgs test from them. But checking that they compile successfully and providing the resulting binaries to the users is still useful (IMHO). 22:21:34 (edited 22:22:10)
@connorbaker:matrix.org connor (burnt/out) (UTC-8): Does a patched version of freeimage exist, or is it abandoned? If there is a newer version, I’d be interested in whether, as a longer term goal, it’d be possible to update the samples to use it. 23:21:11
10 Feb 2025
@stick:matrix.org stick: Redacted or Malformed Event 00:11:29
