| 16 Oct 2024 |
hexa |
┃ ⏵ python3.12-tensordict-0.5.0 (pytestCheckPhase) ⏱ 2h35m4s
| 19:52:24 |
hexa | hasn't moved one bit since | 19:52:27 |
SomeoneSerge (back on matrix) | In reply to @hexa:lossy.network
:: (nixbld10) → /nix/store/svw8b4655f6w413xz23jjg6yn4b1d9p0-python3.12-tensordict-0.5.0
UID PID PPID STIME TIME COMMAND
30010 432250 432213 17:17 00:00:00 bash -e /nix/store/v6x3cs394jgqfbi0a42pam708flxaphh-default-builder.sh
30010 433687 432250 17:17 00:02:07 /nix/store/wfbjq35kxs6x83c3ncpfxdyl5gbhdx4h-python3-3.12.6/bin/python3.12 -m pytest -k not test_copy_onto and not test_mp and not test_functional and not test_linear and not test_seq and not test_seq_lmbda
30010 434047 433687 17:18 00:00:03 [pt_main_thread] <defunct>
30010 464302 433687 17:20 00:00:00 /nix/store/wfbjq35kxs6x83c3ncpfxdyl5gbhdx4h-python3-3.12.6/bin/python3.12 -m pytest -k not test_copy_onto and not test_mp and not test_functional and not test_linear and not test_seq and not test_seq_lmbda
30010 464342 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 464382 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 464422 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 464462 433687 17:20 00:00:00 /nix/store/wfbjq35kxs6x83c3ncpfxdyl5gbhdx4h-python3-3.12.6/bin/python3.12 -m pytest -k not test_copy_onto and not test_mp and not test_functional and not test_linear and not test_seq and not test_seq_lmbda
30010 464502 433687 17:20 00:00:00 /nix/store/wfbjq35kxs6x83c3ncpfxdyl5gbhdx4h-python3-3.12.6/bin/python3.12 -m pytest -k not test_copy_onto and not test_mp and not test_functional and not test_linear and not test_seq and not test_seq_lmbda
30010 464542 433687 17:20 00:00:00 /nix/store/wfbjq35kxs6x83c3ncpfxdyl5gbhdx4h-python3-3.12.6/bin/python3.12 -m pytest -k not test_copy_onto and not test_mp and not test_functional and not test_linear and not test_seq and not test_seq_lmbda
30010 464582 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 464622 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 464662 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 464702 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 464742 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 464766 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 464822 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 464862 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 464903 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 464945 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 464986 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465029 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465071 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465076 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465157 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465192 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465244 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465286 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465328 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465370 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465413 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465455 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465499 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465541 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465583 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465614 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465669 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465712 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465755 433687 17:20 00:00:00 [pt_main_thread] <defunct>
30010 465797 433687 17:20 00:00:00 [pt_main_thread] <defunct>
Shall we make it pytest -n1 | 20:06:34 |
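For reference, the `-k` argument visible in the process listing above is a pytest keyword expression: a chain of `not <name>` terms joined with `and`, so any test matching one of the names is deselected. A minimal Python sketch of assembling such an expression from a list of names follows; the helper name `deselect_expression` is made up for illustration and is not part of pytest or nixpkgs.

```python
def deselect_expression(test_names):
    """Build a pytest -k expression that deselects every listed test name."""
    return " and ".join(f"not {name}" for name in test_names)


# Names taken from the command line shown in the listing.
disabled = [
    "test_copy_onto", "test_mp", "test_functional",
    "test_linear", "test_seq", "test_seq_lmbda",
]
expression = deselect_expression(disabled)
# The result can be passed as: python -m pytest -k "<expression>"
```

Because `-k` matches substrings of test names, `not test_seq` already deselects `test_seq_lmbda` as well.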
hexa | not sure yet 🙂 | 20:29:41 |
hexa | feel free to try | 20:29:46 |
| 17 Oct 2024 |
hexa | In reply to @ss:someonex.net "Shall we make it pytest -n1"
doesn't use xdist | 00:43:31 |
hexa | builds fine on my 8th gen intel machine | 00:44:03 |
hexa | reliably crashes and gets stuck on 6th gen intel | 00:44:36 |
SomeoneSerge (back on matrix) | In reply to @hexa:lossy.network "doesn't use xdist"
Oh. So "pt_main_thread" must be the only thread, and I guess that's the only reason it's mentioned there | 00:45:24 |
hexa | both are effectively skylake machines (x86_64 v3) | 00:46:18 |
hexa | and since they go [defunct] instead of crashing, I also get no coredump | 00:46:50 |
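A `<defunct>` entry is a child process that has already exited but whose parent has not yet collected its exit status with `wait`. A minimal, Linux-only Python sketch of that state is below. It assumes `/proc` is mounted, and `proc_state` and `zombie_demo` are illustrative names, not anything from the tensordict test suite.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the process is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Format is "pid (comm) state ..."; comm may contain spaces, so split on the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_demo(timeout=5.0):
    pid = os.fork()
    if pid == 0:
        os._exit(3)  # child: exit immediately with an arbitrary status

    # Parent: deliberately do not wait yet, so the exited child stays a zombie ("Z").
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)

    _, status = os.waitpid(pid, 0)  # reaping removes the <defunct> entry
    return state, os.waitstatus_to_exitcode(status), proc_state(pid)
```

Once the parent reaps the child, `os.waitstatus_to_exitcode` yields the exit code, or a negative signal number if the child was killed by a signal, which is one way to tell a clean exit from a crash.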
connor (he/him) | Pretty proud of how performant this is, after much stats scrutinizing and reading primop implementations: https://github.com/ConnorBaker/cuda-packages/blob/23f199365343e3355469332acb5cf501c8c5fc68/upstreamable-lib/attrsets.nix#L38
Credit for the flattenDrvTree function goes to Adam Joseph though, basically took that stuff from https://github.com/NixOS/nixpkgs/blob/3a5940b539fdd56ace90d5e79a926e5e2694ba45/pkgs/top-level/release-attrpaths-superset.nix#L38 | 03:55:43 |
SomeoneSerge (back on matrix) | In reply to @connorbaker:matrix.org
Is that for a release file, or? | 08:56:25 |
connor (he/him) | Yeah, the idea being that if you’re using CI that isn’t Hydra, or want to build a whole package set, you could use that function | 14:22:16 |
| 18 Oct 2024 |
mcwitt | * Hi guys, I'm running into an error with jaxlibWithCuda in master: LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.down.i32
I'm not certain whether the issue is with the nixpkgs infrastructure or upstream, but thought I'd raise it here first in case anyone has seen something similar.
Full reproduction with flake.lock here: https://gist.github.com/mcwitt/4cf6c5cae44152dab1df1ef96d49d22e
EDIT: one reason I suspect it might be a nixpkgs-specific issue is I'm also seeing many instances of the warning '+ptx84' is not a recognized feature for this target (ignoring feature), which was reported to be fixed in 0.4.28: https://github.com/jax-ml/jax/issues/21121#issuecomment-2103606397
| 18:47:10 |
Gaétan Lepage | Idk if an update would help. Unfortunately, they have heavily changed their packaging internals and we are quite stalling on this project... | 20:53:57 |
mcwitt |
Unfortunately, they have heavily changed their packaging internals and we are quite stalling on this project...
Ah man, sorry to hear. I've been occasionally checking in on the progress of JAX updates (most recently https://github.com/NixOS/nixpkgs/pull/318995), and have to say that it seems like a herculean effort to debug all the build issues that come up with each update, not to mention the frequent breakage of downstream dependencies since things are moving so quickly with JAX. Thanks so much for your work on this! I've benefited a ton from having JAX in nixpkgs.
| 22:21:30 |
Gaétan Lepage | Glad that it's useful for you! Sometimes I wonder if there are people using the nix python set ^^.
Indeed, each update involves some work. While most of the time it is pretty straightforward, the recent changes (related to bazel) are much more annoying. | 22:31:53 |
Gaétan Lepage | hexa (UTC+1) I am indeed unable to build tensordict from my torch 2.5.0 update PR.
It systematically gets stuck at 22%, apparently no matter the core count.
I will try to investigate this tomorrow. | 22:41:37 |
hexa | Thank you. | 22:42:09 |
| 19 Oct 2024 |
Gaétan Lepage | Very weird, I was now able to build it.
It does this in the middle, but the tests are apparently all successful. | 11:35:50 |
Gaétan Lepage | [image: clipboard.png] | 11:35:54 |
Gaétan Lepage | Sometimes this message doesn't show up and the package builds just fine... | 11:47:44 |
hexa | as I said, 6th gen intel breaks, 8th gen intel works | 12:53:32 |
hexa | both are essentially skylake | 12:53:40 |
hexa | my build farm is 6th gen fwiw | 12:53:45 |
hacker1024 | In reply to @hexa:lossy.network "reliably crashes and gets stuck on 6th gen intel"
Could it be out-of-date microcode? | 12:55:56 |
hexa | 🤔 | 12:56:10 |
hexa | hardware.cpu.intel.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;
| 12:56:44 |
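To compare the microcode actually loaded on the two machines, the revision can be read from `/proc/cpuinfo`, which on x86 Linux lists a `microcode` field per logical CPU. A small Python sketch of that check follows; the function names are illustrative, and nothing here is specific to NixOS.

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


def read_microcode_revisions(path="/proc/cpuinfo"):
    """Read the running system's cpuinfo; the set is empty if the field is absent."""
    with open(path) as f:
        return microcode_revisions(f.read())
```

Running `read_microcode_revisions()` on the 6th gen and 8th gen hosts and comparing the results against Intel's published revisions would show whether either machine is on an old revision.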