!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

284 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



16 Mar 2023
@ss:someonex.net SomeoneSerge (back on matrix)

We should now be able to run nixpkgs-review like so: nixpkgs-review pr XXXXXX --extra-nixpkgs-config '{ cudaCapabilities = [ "8.6" ]; }'. This should save us some compute. There's cache for 8.6 on cuda-maintainers cachix for the master branch

https://github.com/Mic92/nixpkgs-review/pull/315

18:12:15
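For reference, the review command and the cache setup together might look like this (a sketch; XXXXXX stands in for an actual PR number, and `cachix use` assumes the cachix CLI is installed):

```
# hypothetical session: pull pre-built 8.6 store paths while reviewing a PR
cachix use cuda-maintainers   # trust the cuda-maintainers binary cache
nixpkgs-review pr XXXXXX --extra-nixpkgs-config '{ cudaCapabilities = [ "8.6" ]; }'
```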
@skainswo:matrix.org Samuel Ainsworth
Someone S: is there a way to see a history of your CI runs that get pushed to cuda-maintainers? I'm curious to see what things are building and what aren't atm.
23:15:25
@skainswo:matrix.org Samuel Ainsworth
I seem to be getting a bunch of cache misses in my CI that I would not expect.
23:15:30
@ss:someonex.net SomeoneSerge (back on matrix)

Samuel Ainsworth: oh, this is almost certainly happening because I decreased the frequency for the default set of cudaCapabilities

what things are building and what aren't atm

You can see which job sets are enabled and what schedule they run on in any "config" job in hercules, e.g.: https://hercules-ci.com/github/SomeoneSerge/nixpkgs-unfree/jobs/3513
The jobs are declared in flake.nix, e.g. buildMaster86Essential, which, according to hercules, runs "00:19, 02:19, 20:19, 22:19 UTC every day" and builds for cudaCapabilities = [ "8.6" ];.

23:26:17
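For illustration, a schedule like that is declared roughly as follows, assuming the hercules-ci flake schema's onSchedule interface (a sketch, not the actual nixpkgs-unfree code; the job name and times come from the message above, everything else is illustrative):

```nix
# sketch: a scheduled Hercules CI job set in flake.nix
{
  herculesCI.onSchedule.buildMaster86Essential = {
    # 00:19, 02:19, 20:19, 22:19 UTC every day
    when = { hour = [ 0 2 20 22 ]; minute = 19; };
    outputs = { ... }: {
      # instantiate nixpkgs with cudaCapabilities = [ "8.6" ]
      # and expose the packages to build here
    };
  };
}
```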
@ss:someonex.net SomeoneSerge (back on matrix)

because I decreased the frequency

Sorry I did that without proper announcement

23:26:39
@ss:someonex.net SomeoneSerge (back on matrix)
This reminds me, I wanted to open https://github.com/NixOS/nixpkgs/issues/221564
23:28:40
17 Mar 2023
@atrius:matrix.org mjlbach joined the room. 04:31:41
@atrius:matrix.org mjlbach
Are people mostly deploying torch/cuda builds via nix-shells nowadays? I've had bad luck getting poetry2nix/dream2nix working for CUDA-enabled torch builds and was curious what the workflow is with the cuda cachix now.
04:44:09
@ss:someonex.net SomeoneSerge (back on matrix)
I can only speak for myself, I use source builds from nixpkgs with cudaSupport=true. What kind of issues are we talking about?
11:04:56
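Spelled out as a nixpkgs invocation, that setup is (note the attribute name is cudaSupport):

```nix
# build CUDA-enabled packages from source; CUDA packages are unfree
import <nixpkgs> {
  config = {
    allowUnfree = true;
    cudaSupport = true;  # e.g. python3Packages.pytorch now builds against CUDA
  };
}
```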
@domenkozar:matrix.org Domen Kožar
We also have CUDA support for devenv, but it's not finalized yet: https://github.com/cachix/devenv/pull/422
12:24:56
@tpw_rules:matrix.org tpw_rules
How does devenv work to access the host graphics drivers?
13:33:11
@connorbaker:matrix.org connor (he/him)
I ended up using a shellHook which aliases python and other common commands so they're wrapped with nixGL: https://github.com/ConnorBaker/mfsr_utils/blob/main/flake.nix#L82-L86
13:44:55
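The linked shellHook boils down to something like this (a sketch; the nixGLNvidia wrapper name depends on how nixGL is instantiated in your flake):

```nix
# sketch: alias python so it runs under nixGL and can see the host GPU driver
pkgs.mkShell {
  packages = [ (pkgs.python3.withPackages (p: [ p.pytorch ])) ];
  shellHook = ''
    alias python="nixGLNvidia python"
  '';
}
```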
@connorbaker:matrix.org connor (he/him)
I have some time today, so hopefully I can get caught up on all the stuff merged/reported over the last two weeks ✨
13:47:29
@connorbaker:matrix.org connor (he/him)
Per the conversation here https://github.com/NixOS/nixpkgs/pull/220366#discussion_r1135048161, what needs to be done to tell the community that we're dropping support for CUDA 10? Does it make sense to stop there, or should we drop up until 11.4? In terms of the GPUs we support, the only thing we'd remove support for is the early-generation Kepler GPUs: https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/compilers/cudatoolkit/gpus.nix#L18-L31, since everything else is supported through at least 11.8.
14:36:36
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @connorbaker:matrix.org
Per the conversation here https://github.com/NixOS/nixpkgs/pull/220366#discussion_r1135048161, what needs to be done to tell the community that we're dropping support for CUDA 10? Does it make sense to stop there, or should we drop up until 11.4? In terms of the GPUs we support, the only thing we'd remove support for is the early-generation Kepler GPUs: https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/compilers/cudatoolkit/gpus.nix#L18-L31, since everything else is supported through at least 11.8.
I don't think we should remove CUDA 10 as such, until it's officially EOL (is it?). We should just communicate that when we package e.g. pytorch in nixpkgs we'll ensure that it works with cudaPackages_11_4 forward, but if one wants to build against cudaPackages_10, they'll just have to maintain their own copy of the torch expression
14:43:53
@ss:someonex.net SomeoneSerge (back on matrix)
I think this amounts to a message in the release notes: non-legacy packages can be overridden and passed a different cudaPackages argument, as long as it's redist.
14:45:11
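Concretely, the override being described would look something like this (opencv here is just an illustrative package that accepts a cudaPackages argument):

```nix
# build against a specific redist CUDA package set instead of the default
pkgs.opencv.override { cudaPackages = pkgs.cudaPackages_11_4; }
```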
@connorbaker:matrix.org connor (he/him)
Hm, I realize I don't actually know what kind of support NVIDIA provides for different devices. I know on Windows they'll remove support for devices by model, but with their Linux drivers it's usually per-architecture. I did find https://docs.nvidia.com/datacenter/tesla/drivers/#software-matrix, but I don't know if they define EOL/unsupported/deprecated. For example, here's them announcing the end of support for Quadro Kepler devices: https://nvidia.custhelp.com/app/answers/detail/a_id/5210
14:56:13
@connorbaker:matrix.org connor (he/him)
Maybe this is made more difficult by the fact that there's driver support and then there's software library (CUDA) support :l I guess it'd be fair to say that if they're no longer publishing drivers for a GPU, then it has been EOL'd?
14:58:03
@ss:someonex.net SomeoneSerge (back on matrix)
Lol. Maybe when the last package in nixpkgs drops the cuda < 11 requirement
15:08:38
@ss:someonex.net SomeoneSerge (back on matrix)
...which means what, caffe?
15:08:54
@atrius:matrix.org mjlbach
I can send a couple repros in the channel!
15:50:33
@atrius:matrix.org mjlbach
For the nix-shell case, here is a minimal example:
15:50:53
@atrius:matrix.org mjlbach
{
  description = "A very basic flake";

  outputs = { self, nixpkgs }: 
  let
    system = "x86_64-linux";
    pkgs = import nixpkgs {
      inherit system;
      config = { 
        allowUnfree = true;
        cudaEnabled = true;
      };
    };
  in
  {
    devShell.${system} = import ./shell.nix { inherit pkgs; };
  };
}

15:51:56
@atrius:matrix.org mjlbach
{ pkgs }:
let 
my-python-packages = p: with p; [
  (pytorch.override { cudaSupport = true; })
  # other python packages
];
in
pkgs.mkShell {
  buildInputs = with pkgs; [
    (python3.withPackages my-python-packages)
  ];

  shellHook = ''
  '';
}
15:52:14
@atrius:matrix.org mjlbach
This doesn't seem to hit the binary cache?
15:52:18
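One detail worth double-checking when chasing cache misses: the nixpkgs config attribute is spelled cudaSupport (nixpkgs does not recognize a cudaEnabled option), and a cache hit also requires the same cudaCapabilities the CI built with. A config sketch that should line up with the 8.6 jobs mentioned earlier:

```nix
pkgs = import nixpkgs {
  inherit system;
  config = {
    allowUnfree = true;
    cudaSupport = true;            # the option nixpkgs actually reads
    cudaCapabilities = [ "8.6" ];  # must match what the cache was built with
  };
};
```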
@atrius:matrix.org mjlbach
Thanks Domen! I'm going to try out the CUDA support with the poetry example https://github.com/cachix/devenv/tree/main/examples/python-poetry once it's merged.
15:57:37
@atrius:matrix.org mjlbach
Thanks! This is how I used to do things, but one issue is that it's locked to whatever the latest is in nixpkgs. Btw, how do you avoid cache misses? Is there a list of what cuda-maintainers is providing CI for?
16:05:13
@atrius:matrix.org mjlbach
Ahh, I see why you had to do the nvidia driver pinning; that's a bit unfortunate.
16:26:21
@atrius:matrix.org mjlbach
Is there a reason you didn't opt for setting the LD_LIBRARY_PATH directly?
16:34:41
@atrius:matrix.org mjlbach
{
  description = "A very basic flake for pytorch support";

  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
  };

  nixConfig = {
    # Add the CUDA maintainer's cache
    extra-substituters = [
      "https://nix-community.cachix.org"
      "https://cuda-maintainers.cachix.org"
    ];
    extra-trusted-public-keys = [
      "nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs="
      "cuda-maintainers.cachix.org-1:0dq3bujKpuEPMCX6U4WylrUDZ9JyUG0VpVZa7CNfq5E="
    ];
  };

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        config = {
          allowUnfree = true;
          cudaEnabled = true;
          cudaCapabilities = [ "8.6" ];
          cudaForwardCompat = true;
        };
      };
      my-python-packages = p: with p; [
        (pytorch.override { cudaSupport = true; })
      ];
    in
    {
      devShell.${system} = pkgs.mkShell {
        packages = with pkgs; [
          (python310.withPackages my-python-packages)
        ];

       shellHook = ''
          export CUDA_PATH=${pkgs.cudatoolkit}
          export LD_LIBRARY_PATH=${pkgs.linuxPackages.nvidia_x11}/lib
          export EXTRA_LDFLAGS="-L/lib -L${pkgs.linuxPackages.nvidia_x11}/lib"
          export EXTRA_CCFLAGS="-I/usr/include"
       '';
      };
    };
}

16:34:45
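On NixOS specifically, one way to avoid pinning linuxPackages.nvidia_x11 in the shell (the pinning mjlbach found unfortunate, since it can diverge from the host's actual driver) is to point at the driver the system already ships at its conventional location:

```nix
# sketch: rely on the NixOS host driver instead of a pinned nvidia_x11;
# /run/opengl-driver/lib is populated when the system enables GPU drivers
shellHook = ''
  export LD_LIBRARY_PATH=/run/opengl-driver/lib:$LD_LIBRARY_PATH
'';
```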
