17 Oct 2024 |
hexa (UTC+1) | builds fine on my 8th gen intel machine | 00:44:03 |
hexa (UTC+1) | reliably crashes and gets stuck on 6th gen intel | 00:44:36 |
SomeoneSerge (utc+3) | In reply to @hexa:lossy.network ("doesn't use xdist"): Oh. So "pt_main_thread" must be the only thread, it's mentioned there just because, I guess | 00:45:24 |
hexa (UTC+1) | both are effectively skylake machines (x86_64 v3) | 00:46:18 |
hexa (UTC+1) | and since they go [defunct] and not crash I also get no coredump | 00:46:50 |
connor (he/him) (UTC-7) | Pretty proud of how performant this is, after much stats scrutinizing and reading primop implementations: https://github.com/ConnorBaker/cuda-packages/blob/23f199365343e3355469332acb5cf501c8c5fc68/upstreamable-lib/attrsets.nix#L38
Credit for the flattenDrvTree function goes to Adam Joseph though, basically took that stuff from https://github.com/NixOS/nixpkgs/blob/3a5940b539fdd56ace90d5e79a926e5e2694ba45/pkgs/top-level/release-attrpaths-superset.nix#L38 | 03:55:43 |
SomeoneSerge (utc+3) | In reply to @connorbaker:matrix.org: That for a release file or? | 08:56:25 |
connor (he/him) (UTC-7) | Yeah, the idea being that if you’re using CI that isn’t Hydra or want to build a whole package set, you could use that function | 14:22:16 |
18 Oct 2024 |
mcwitt | * Hi guys, I'm running into an error with jaxlibWithCuda in master: LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.down.i32
I'm not certain whether the issue is with the nixpkgs infrastructure or upstream, but thought I'd raise it here first in case anyone has seen something similar.
Full reproduction with flake.lock here: https://gist.github.com/mcwitt/4cf6c5cae44152dab1df1ef96d49d22e
EDIT: one reason I suspect it might be a nixpkgs-specific issue is that I'm also seeing many instances of the warning '+ptx84' is not a recognized feature for this target (ignoring feature), which was reported to be fixed in 0.4.28: https://github.com/jax-ml/jax/issues/21121#issuecomment-2103606397
| 18:47:10 |
Gaétan Lepage | Idk if an update would help. Unfortunately, they have heavily changed their packaging internals and we are quite stalling on this project... | 20:53:57 |
mcwitt |
Unfortunately, they have heavily changed their packaging internals and we are quite stalling on this project...
Ah man, sorry to hear. I've been occasionally checking in on the progress of JAX updates (most recently https://github.com/NixOS/nixpkgs/pull/318995), and have to say that it seems like a herculean effort to debug all the build issues that come up with each update, not to mention the frequent breakage of downstream dependencies since things are moving so quickly with JAX. Thanks so much for your work on this! I've benefited a ton from having JAX in nixpkgs.
| 22:21:30 |
Gaétan Lepage | Glad that it's useful for you! Sometimes I wonder if there are people using the nix python set ^^.
Indeed, each update involves some work. While most of the time it is pretty straightforward, the recent changes (related to bazel) are much more annoying. | 22:31:53 |
Gaétan Lepage | hexa (UTC+1) I am indeed unable to build tensordict from my torch 2.5.0 update PR.
It systematically gets stuck at 22%, apparently no matter the core count.
I will try to investigate this tomorrow. | 22:41:37 |
hexa (UTC+1) | Thank you. | 22:42:09 |
19 Oct 2024 |
Gaétan Lepage | Very weird, I was now able to build it.
It does this in the middle, but the tests are apparently all successful. | 11:35:50 |
Gaétan Lepage | [attachment: clipboard.png] | 11:35:54 |
Gaétan Lepage | Sometimes this message doesn't show up and the package builds just fine... | 11:47:44 |
hexa (UTC+1) | as I said, 6th gen intel breaks, 8th gen intel works | 12:53:32 |
hexa (UTC+1) | both are essentially skylake | 12:53:40 |
hexa (UTC+1) | my build farm is 6th gen fwiw | 12:53:45 |
hacker1024 | In reply to @hexa:lossy.network ("reliably crashes and gets stuck on 6th gen intel"): Could it be out-of-date microcode? | 12:55:56 |
hexa (UTC+1) | 🤔 | 12:56:10 |
hexa (UTC+1) | hardware.cpu.intel.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;
| 12:56:44 |
hexa (UTC+1) | which defaults to config.hardware.enableAllFirmware | 12:57:09 |
hexa (UTC+1) | which defaults to false | 12:57:19 |
hexa (UTC+1) | 🤡 | 12:57:20 |
hacker1024 | Rip | 12:57:24 |
hexa (UTC+1) | thank you nixos-generate-config | 12:57:34 |
hacker1024 | No guarantee that it'll fix it but it can't hurt | 12:57:42 |
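A minimal sketch of how the microcode updates discussed above could be switched on in a NixOS configuration, assuming an Intel host. Either option on its own should be enough; which one fits depends on whether the rest of the redistributable firmware is wanted as well. As noted above, there is no guarantee that this fixes the hang.

```nix
{ ... }:
{
  # Option A (assumption: redistributable firmware is acceptable on this host).
  # With this set, the nixos-generate-config line quoted above
  # (updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware)
  # evaluates to true.
  hardware.enableRedistributableFirmware = true;

  # Option B: enable only the Intel microcode update, leaving the firmware
  # options at their defaults.
  # hardware.cpu.intel.updateMicrocode = true;
}
```

After a rebuild and reboot, the loaded microcode revision is typically visible in the "microcode" field of /proc/cpuinfo.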