!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

290 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
58 Servers


Sender | Message | Time
16 Oct 2024
@hexa:lossy.networkhexa (UTC+1) I'm heading out for dinner, maybe it will complete in the next 2h 17:31:37
@hexa:lossy.networkhexa (UTC+1)

┃ ⏵ python3.12-tensordict-0.5.0 (pytestCheckPhase) ⏱ 2h35m4s

19:52:24
@hexa:lossy.networkhexa (UTC+1) hasn't moved one bit since 19:52:27
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @hexa:lossy.network
:: (nixbld10) → /nix/store/svw8b4655f6w413xz23jjg6yn4b1d9p0-python3.12-tensordict-0.5.0
  UID     PID    PPID STIME     TIME COMMAND
30010  432250  432213 17:17 00:00:00 bash -e /nix/store/v6x3cs394jgqfbi0a42pam708flxaphh-default-builder.sh
30010  433687  432250 17:17 00:02:07 /nix/store/wfbjq35kxs6x83c3ncpfxdyl5gbhdx4h-python3-3.12.6/bin/python3.12 -m pytest -k not test_copy_onto and not test_mp and not test_functional and not test_linear and not test_seq and not test_seq_lmbda
30010  434047  433687 17:18 00:00:03 [pt_main_thread] <defunct>
30010  464302  433687 17:20 00:00:00 /nix/store/wfbjq35kxs6x83c3ncpfxdyl5gbhdx4h-python3-3.12.6/bin/python3.12 -m pytest -k not test_copy_onto and not test_mp and not test_functional and not test_linear and not test_seq and not test_seq_lmbda
30010  464342  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  464382  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  464422  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  464462  433687 17:20 00:00:00 /nix/store/wfbjq35kxs6x83c3ncpfxdyl5gbhdx4h-python3-3.12.6/bin/python3.12 -m pytest -k not test_copy_onto and not test_mp and not test_functional and not test_linear and not test_seq and not test_seq_lmbda
30010  464502  433687 17:20 00:00:00 /nix/store/wfbjq35kxs6x83c3ncpfxdyl5gbhdx4h-python3-3.12.6/bin/python3.12 -m pytest -k not test_copy_onto and not test_mp and not test_functional and not test_linear and not test_seq and not test_seq_lmbda
30010  464542  433687 17:20 00:00:00 /nix/store/wfbjq35kxs6x83c3ncpfxdyl5gbhdx4h-python3-3.12.6/bin/python3.12 -m pytest -k not test_copy_onto and not test_mp and not test_functional and not test_linear and not test_seq and not test_seq_lmbda
30010  464582  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  464622  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  464662  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  464702  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  464742  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  464766  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  464822  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  464862  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  464903  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  464945  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  464986  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465029  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465071  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465076  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465157  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465192  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465244  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465286  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465328  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465370  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465413  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465455  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465499  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465541  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465583  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465614  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465669  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465712  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465755  433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010  465797  433687 17:20 00:00:00 [pt_main_thread] <defunct>
Shall we make it pytest -n1
20:06:34
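A listing like the one above can be summarised mechanically. The following is only a minimal sketch, assuming `ps -f`-style columns (UID, PID, PPID, STIME, TIME, COMMAND); the function name and the abbreviated sample command line are illustrative and not taken from the build log.

```python
from collections import Counter


def count_defunct_by_parent(ps_output: str) -> Counter:
    """Count <defunct> (zombie) processes per parent PID in ps -f style text."""
    counts = Counter()
    for line in ps_output.splitlines():
        # Columns: UID PID PPID STIME TIME COMMAND (COMMAND may contain spaces)
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[1].isdigit() or not fields[2].isdigit():
            continue  # skip the header row and anything that is not a process line
        ppid, command = fields[2], fields[5]
        if command.rstrip().endswith("<defunct>"):
            counts[int(ppid)] += 1
    return counts


# Abbreviated, hypothetical sample in the same shape as the listing above.
sample = """\
  UID     PID    PPID STIME     TIME COMMAND
30010  433687  432250 17:17 00:02:07 python3.12 -m pytest
30010  434047  433687 17:18 00:00:03 [pt_main_thread] <defunct>
30010  464342  433687 17:20 00:00:00 [pt_main_thread] <defunct>
"""

print(count_defunct_by_parent(sample))
```

A parent with many defunct children has not yet reaped them with `wait()`, which points at the parent process as the place to look.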
@hexa:lossy.networkhexa (UTC+1) not sure yet 🙂 20:29:41
@hexa:lossy.networkhexa (UTC+1) feel free to try 20:29:46
17 Oct 2024
@hexa:lossy.networkhexa (UTC+1)
In reply to @ss:someonex.net
Shall we make it pytest -n1
doesn't use xdist
00:43:31
@hexa:lossy.networkhexa (UTC+1) builds fine on my 8th gen intel machine 00:44:03
@hexa:lossy.networkhexa (UTC+1) reliably crashes and gets stuck on 6th gen intel 00:44:36
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @hexa:lossy.network
doesn't use xdist
Oh. So I guess "pt_main_thread" is mentioned there just because it's the only thread
00:45:24
@hexa:lossy.networkhexa (UTC+1) both are effectively skylake machines (x86_64 v3) 00:46:18
@hexa:lossy.networkhexa (UTC+1) and since they go [defunct] and not crash I also get no coredump 00:46:50
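When a hang produces no coredump, Python's standard `faulthandler` module can still show where the interpreter is stuck. The sketch below is not what the tensordict build does; `run_with_watchdog` and `slow_step` are hypothetical names, and the short timeout is only for demonstration. It arms a watchdog that dumps every thread's stack to a log file if the wrapped call has not finished in time.

```python
import faulthandler
import tempfile
import time


def run_with_watchdog(fn, timeout_s, log_file):
    """Run fn(); if it is still running after timeout_s seconds,
    dump the stacks of all threads to log_file without killing the process."""
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=log_file)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn() has returned or raised.
        faulthandler.cancel_dump_traceback_later()


def slow_step():
    time.sleep(0.5)  # hypothetical stand-in for a test that stalls
    return "done"


with tempfile.TemporaryFile(mode="w+") as log:
    result = run_with_watchdog(slow_step, 0.1, log)
    log.seek(0)
    dump = log.read()  # contains the stack at the moment the timeout fired

print(dump)
```

pytest exposes the same mechanism through its `faulthandler_timeout` ini option, which may be the less invasive way to try this inside a `pytestCheckPhase`.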
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)

Pretty proud of how performant this is, after much stats scrutinizing and reading primop implementations: https://github.com/ConnorBaker/cuda-packages/blob/23f199365343e3355469332acb5cf501c8c5fc68/upstreamable-lib/attrsets.nix#L38

Credit for the flattenDrvTree function goes to Adam Joseph though, basically took that stuff from https://github.com/NixOS/nixpkgs/blob/3a5940b539fdd56ace90d5e79a926e5e2694ba45/pkgs/top-level/release-attrpaths-superset.nix#L38

03:55:43
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @connorbaker:matrix.org

Pretty proud of how performant this is, after much stats scrutinizing and reading primop implementations: https://github.com/ConnorBaker/cuda-packages/blob/23f199365343e3355469332acb5cf501c8c5fc68/upstreamable-lib/attrsets.nix#L38

Credit for the flattenDrvTree function goes to Adam Joseph though, basically took that stuff from https://github.com/NixOS/nixpkgs/blob/3a5940b539fdd56ace90d5e79a926e5e2694ba45/pkgs/top-level/release-attrpaths-superset.nix#L38

That for a release file or?
08:56:25
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8) Yeah, the idea being that if you’re using CI that isn’t Hydra, or want to build a whole package set, you could use that function 14:22:16
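The general idea of flattening a nested package tree into a list of buildable attribute paths can be illustrated outside Nix. This is a rough Python analogy under stated assumptions, not the linked `flattenDrvTree` implementation: nested dicts stand in for attribute sets, a `"type": "derivation"` entry marks a derivation (mirroring the Nix convention), and the function name and package data are made up.

```python
def flatten_drv_tree_sketch(tree, prefix=()):
    """Yield (dotted attribute path, derivation) pairs from a nested dict."""
    for name, value in tree.items():
        path = prefix + (name,)
        if not isinstance(value, dict):
            continue  # ignore leaves that are not attribute sets
        if value.get("type") == "derivation":
            yield ".".join(path), value  # a buildable leaf: stop descending
        else:
            yield from flatten_drv_tree_sketch(value, path)


# Hypothetical package tree, for illustration only.
packages = {
    "cudaPackages": {
        "cudnn": {"type": "derivation", "name": "cudnn"},
        "nccl": {"type": "derivation", "name": "nccl"},
    },
    "lib": {"version": "hypothetical"},
}

paths = sorted(path for path, _ in flatten_drv_tree_sketch(packages))
print(paths)
```

A CI system other than Hydra could then iterate over such a flat list and schedule one build per path.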
18 Oct 2024
@mcwitt:matrix.orgmcwitt

Hi guys, I'm running into an error with jaxlibWithCuda in master: LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.down.i32

I'm not certain whether the issue is with the nixpkgs infrastructure or upstream, but thought I'd raise it here first in case anyone has seen something similar.

Full reproduction with flake.lock here: https://gist.github.com/mcwitt/4cf6c5cae44152dab1df1ef96d49d22e

18:45:05
@mcwitt:matrix.orgmcwitt *

Hi guys, I'm running into an error with jaxlibWithCuda in master: LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.down.i32

I'm not certain whether the issue is with the nixpkgs infrastructure or upstream, but thought I'd raise it here first in case anyone has seen something similar.

Full reproduction with flake.lock here: https://gist.github.com/mcwitt/4cf6c5cae44152dab1df1ef96d49d22e

EDIT: one reason I suspect it might be a nixpkgs-specific issue is I'm also seeing many instances of the warning '+ptx84' is not a recognized feature for this target (ignoring feature), which was reported to be fixed in 0.4.28: https://github.com/jax-ml/jax/issues/21121#issuecomment-2103606397

18:47:10
@glepage:matrix.orgGaétan Lepage Idk if an update would help. Unfortunately, they have heavily changed their packaging internals and we are pretty much stalled on this project... 20:53:57
@mcwitt:matrix.orgmcwitt

Unfortunately, they have heavily changed their packaging internals and we are pretty much stalled on this project...

Ah man, sorry to hear. I've been occasionally checking in on the progress of JAX updates (most recently https://github.com/NixOS/nixpkgs/pull/318995), and have to say that it seems like a herculean effort to debug all the build issues that come up with each update, not to mention the frequent breakage of downstream dependencies since things are moving so quickly with JAX. Thanks so much for your work on this! I've benefited a ton from having JAX in nixpkgs.

22:21:30
@glepage:matrix.orgGaétan Lepage Glad that it's useful for you! Sometimes I wonder if there are people using the nix python set ^^.
Indeed, each update involves some work. While most of the time it is pretty straightforward, the recent changes (related to bazel) are much more annoying.
22:31:53
@glepage:matrix.orgGaétan Lepage hexa (UTC+1) I am indeed unable to build tensordict from my torch 2.5.0 update PR.
It systematically gets stuck at 22%, no matter the core count apparently.
I will try to investigate this tomorrow.
22:41:37
@hexa:lossy.networkhexa (UTC+1) Thank you. 22:42:09
19 Oct 2024
@glepage:matrix.orgGaétan Lepage Very weird, I was now able to build it.
It does this in the middle, but the tests are apparently all successful.
11:35:50
@glepage:matrix.orgGaétan Lepage clipboard.png
11:35:54
@glepage:matrix.orgGaétan Lepage Sometimes this message doesn't show up and the package builds just fine... 11:47:44
@hexa:lossy.networkhexa (UTC+1) as I said, 6th gen intel breaks, 8th gen intel works 12:53:32
@hexa:lossy.networkhexa (UTC+1) both are essentially skylake 12:53:40
@hexa:lossy.networkhexa (UTC+1) my build farm is 6th gen fwiw 12:53:45
@hacker1024:matrix.orghacker1024
In reply to @hexa:lossy.network
reliably crashes and gets stuck on 6th gen intel
Could it be out of date microcode?
12:55:56
@hexa:lossy.networkhexa (UTC+1) 🤔 12:56:10

Room Version: 9