!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

211 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
42 Servers

Sender / Message / Time
17 Oct 2024
@hexa:lossy.network hexa (UTC+1) builds fine on my 8th gen intel machine 00:44:03
@hexa:lossy.network hexa (UTC+1) reliably crashes and gets stuck on 6th gen intel 00:44:36
@ss:someonex.net SomeoneSerge (utc+3)
In reply to @hexa:lossy.network
doesn't use xdist
Oh. So "pt_main_thread" must be the only thread; I guess that's the only reason it's mentioned there.
00:45:24
@hexa:lossy.network hexa (UTC+1) both are effectively skylake machines (x86_64 v3) 00:46:18
@hexa:lossy.network hexa (UTC+1) and since they go [defunct] and not crash I also get no coredump 00:46:50
@connorbaker:matrix.org connor (he/him) (UTC-7)

Pretty proud of how performant this is, after much stats scrutinizing and reading primop implementations: https://github.com/ConnorBaker/cuda-packages/blob/23f199365343e3355469332acb5cf501c8c5fc68/upstreamable-lib/attrsets.nix#L38

Credit for the flattenDrvTree function goes to Adam Joseph though, basically took that stuff from https://github.com/NixOS/nixpkgs/blob/3a5940b539fdd56ace90d5e79a926e5e2694ba45/pkgs/top-level/release-attrpaths-superset.nix#L38

03:55:43
@ss:someonex.net SomeoneSerge (utc+3)
In reply to @connorbaker:matrix.org
That for a release file or?
08:56:25
@connorbaker:matrix.org connor (he/him) (UTC-7) Yeah, the idea being that if you’re using CI that isn’t Hydra, or want to build a whole package set, you could use that function 14:22:16
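
The general idea of flattening a nested package set into a single attribute set of derivations can be sketched as follows. This is only a hedged illustration and not the implementation at either linked URL: the name `flattenDrvs` is hypothetical, it follows the nixpkgs convention of descending only into attribute sets marked with `recurseForDerivations`, and it makes no attempt to guard against attributes that throw during evaluation.

```nix
# Hypothetical helper: { lib } -> attrs -> attrs
{ lib }:
let
  # Walk `attrs`, collecting every derivation under a dotted attribute path.
  flattenDrvs =
    prefix: attrs:
    lib.concatMapAttrs (
      name: value:
      let
        path = prefix ++ [ name ];
      in
      if lib.isDerivation value then
        # Leaf: key the derivation by its full path, e.g. "cudaPackages.nccl".
        { ${lib.concatStringsSep "." path} = value; }
      else if lib.isAttrs value && value.recurseForDerivations or false then
        # Nested set that opted in to recursion: descend into it.
        flattenDrvs path value
      else
        # Anything else (functions, strings, unmarked sets) is skipped.
        { }
    ) attrs;
in
flattenDrvs [ ]
```

A CI system other than Hydra could then iterate over the resulting flat set and build each value, which is the use case described above.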
18 Oct 2024
@mcwitt:matrix.org mcwitt

Hi guys, I'm running into an error with jaxlibWithCuda in master: LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.down.i32

I'm not certain whether the issue is with the nixpkgs infrastructure or upstream, but thought I'd raise it here first in case anyone has seen something similar.

Full reproduction with flake.lock here: https://gist.github.com/mcwitt/4cf6c5cae44152dab1df1ef96d49d22e

18:45:05
@mcwitt:matrix.org mcwitt *

EDIT: one reason I suspect it might be a nixpkgs-specific issue is I'm also seeing many instances of the warning '+ptx84' is not a recognized feature for this target (ignoring feature), which was reported to be fixed in 0.4.28: https://github.com/jax-ml/jax/issues/21121#issuecomment-2103606397

18:47:10
@glepage:matrix.org Gaétan Lepage Idk if an update would help. Unfortunately, they have heavily changed their packaging internals and we are quite stalling on this project... 20:53:57
@mcwitt:matrix.org mcwitt

Unfortunately, they have heavily changed their packaging internals and we are quite stalling on this project...

Ah man, sorry to hear. I've been occasionally checking in on the progress of JAX updates (most recently https://github.com/NixOS/nixpkgs/pull/318995), and have to say that it seems like a herculean effort to debug all the build issues that come up with each update, not to mention the frequent breakage of downstream dependencies since things are moving so quickly with JAX. Thanks so much for your work on this! I've benefited a ton from having JAX in nixpkgs.

22:21:30
@glepage:matrix.org Gaétan Lepage Glad that it's useful for you! Sometimes I wonder if there are people using the nix python set ^^.
Indeed, each update involves some work. While most of the time it is pretty straightforward, the recent changes (related to bazel) are much more annoying.
22:31:53
@glepage:matrix.org Gaétan Lepage hexa (UTC+1): I am indeed unable to build tensordict from my torch 2.5.0 update PR.
It systematically gets stuck at 22%, no matter the core count apparently.
I will try to investigate this tomorrow.
22:41:37
@hexa:lossy.network hexa (UTC+1) Thank you. 22:42:09
19 Oct 2024
@glepage:matrix.org Gaétan Lepage Very weird, I was now able to build it.
It does this in the middle, but the tests are apparently all successful.
11:35:50
@glepage:matrix.org Gaétan Lepage [image: clipboard.png]
11:35:54
@glepage:matrix.org Gaétan Lepage Sometimes this message doesn't show up and the package builds just fine... 11:47:44
@hexa:lossy.network hexa (UTC+1) as I said, 6th gen intel breaks, 8th gen intel works 12:53:32
@hexa:lossy.network hexa (UTC+1) both are essentially skylake 12:53:40
@hexa:lossy.network hexa (UTC+1) my build farm is 6th gen fwiw 12:53:45
@hacker1024:matrix.org hacker1024
In reply to @hexa:lossy.network
reliably crashes and gets stuck on 6th gen intel
Could it be out of date microcode?
12:55:56
@hexa:lossy.network hexa (UTC+1) 🤔 12:56:10
@hexa:lossy.network hexa (UTC+1)
hardware.cpu.intel.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;
12:56:44
@hexa:lossy.network hexa (UTC+1) which defaults to config.hardware.enableAllFirmware 12:57:09
@hexa:lossy.network hexa (UTC+1) which defaults to false 12:57:19
@hexa:lossy.network hexa (UTC+1) 🤡 12:57:20
@hacker1024:matrix.org hacker1024 Rip 12:57:24
@hexa:lossy.network hexa (UTC+1) thank you nixos-generate-config 12:57:34
@hacker1024:matrix.org hacker1024 No guarantee that it'll fix it but it can't hurt 12:57:42
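
The option chain described above can be made explicit in a host's configuration. A minimal sketch, assuming an Intel machine whose generated hardware-configuration.nix contains the `lib.mkDefault` line quoted above; either of the two settings should be enough on its own, and whether a microcode update resolves the hang on 6th gen hardware is not established in this conversation:

```nix
{ ... }:
{
  # Option 1: enable redistributable firmware, so the generated
  # `lib.mkDefault config.hardware.enableRedistributableFirmware`
  # evaluates to true instead of following enableAllFirmware (false).
  hardware.enableRedistributableFirmware = true;

  # Option 2: set the microcode option directly. A plain definition
  # takes priority over the mkDefault value from the generated file.
  hardware.cpu.intel.updateMicrocode = true;
}
```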
