
NixOS CUDA

290 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



9 Jun 2024
@connorbaker:matrix.orgconnor (he/him)
In reply to @glepage:matrix.org
connor (he/him) (UTC-5), in case you have a bit of available CPU time, could you please run a nixpkgs-review pr --post-result 317576 ?
(If you don't want to, that's fine ofc)
Rerunning it by the way; got stuck for 20h+ on tensordict’s checkPhase :/
04:05:26
@glepage:matrix.orgGaétan Lepage
In reply to @connorbaker:matrix.org
Rerunning it by the way; got stuck for 20h+ on tensordict’s checkPhase :/
Thanks. The problematic tensordict test has been disabled in https://github.com/NixOS/nixpkgs/pull/318111
07:47:15
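For context, hanging or flaky tests like this are usually skipped through pytestCheckHook's disabledTests list. A minimal sketch of such an override follows; the test name is a placeholder and the actual change is whatever landed in https://github.com/NixOS/nixpkgs/pull/318111:

# Hypothetical overlay sketch; the real fix lives in NixOS/nixpkgs#318111
final: prev: {
  pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
    (pyFinal: pyPrev: {
      tensordict = pyPrev.tensordict.overridePythonAttrs (old: {
        # skip the test that kept checkPhase running for 20h+ (placeholder name)
        disabledTests = (old.disabledTests or [ ]) ++ [ "test_that_hangs" ];
      });
    })
  ];
}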
@glepage:matrix.orgGaétan Lepage I now get stuck on python312Packages.botorch 07:47:27
@connorbaker:matrix.orgconnor (he/him)I have a custom config I use specifically for nixpkgs-review that you may like17:59:18
@connorbaker:matrix.orgconnor (he/him)https://gist.github.com/ConnorBaker/305b1aebd7ee74a258a616bbbd4dcd7b17:59:55
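The gist itself isn't reproduced here; as a rough illustration only, a CUDA-oriented nixpkgs config for reviews tends to look something like the following (all values are assumptions, not the contents of the linked gist):

# Illustrative review config, not the linked gist
{
  allowUnfree = true;            # CUDA packages are unfree
  cudaSupport = true;            # build the review set with CUDA enabled
  cudaCapabilities = [ "8.9" ];  # limit to one GPU architecture to keep build times down
}

Something like this can be passed inline through nixpkgs-review's --extra-nixpkgs-config flag, e.g. nixpkgs-review pr 317576 --extra-nixpkgs-config '{ allowUnfree = true; cudaSupport = true; }'.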
@glepage:matrix.orgGaétan Lepage
In reply to @connorbaker:matrix.org
I have a custom config I use specifically for nixpkgs-review you may like
Wow
18:07:32
@glepage:matrix.orgGaétan LepageSo botorch did build for you?18:07:41
@connorbaker:matrix.orgconnor (he/him)Yeah it did after I disabled checks for it18:15:10
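As a sketch of what "disabled checks" means here (names and paths assumed, not the merged change):

# Hypothetical review expression: build botorch without running its test suite
let
  pkgs = import <nixpkgs> {
    config = { allowUnfree = true; cudaSupport = true; };
  };
in
pkgs.python312Packages.botorch.overridePythonAttrs (_: {
  doCheck = false;  # the tests hang, so skip them for the review build
})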
@connorbaker:matrix.orgconnor (he/him) Just posted three variations of nixpkgs-review on the PR, https://github.com/NixOS/nixpkgs/pull/317576 18:15:45
@connorbaker:matrix.orgconnor (he/him)Looks good to me!18:15:49
@connorbaker:matrix.orgconnor (he/him)I'm going to try a run with nix-cuda-test real quick18:16:02
@glepage:matrix.orgGaétan LepageThank you so much!18:16:45
@glepage:matrix.orgGaétan Lepageyes, for me it hangs in the tests...18:16:54
@glepage:matrix.orgGaétan Lepage If this is not the case on master, we should probably investigate that?
Anyway, considering that the vast majority of the downstream packages still build fine, I would argue for merging this PR.
18:17:49
@glepage:matrix.orgGaétan Lepage As a more general thought, I find it very important to mark broken packages as such, as it saves us from having to dig through the nixpkgs-review failures every time to work out whether a breakage is a regression or not. 18:18:53
@connorbaker:matrix.orgconnor (he/him)Agreed; I can't do it fast enough, which is why I've just got that config I use18:25:26
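For reference, marking a known failure usually amounts to a meta.broken condition in the package expression, so review runs and Hydra report it as known-broken rather than as a fresh regression; the condition below is illustrative:

meta = with lib; {
  # ...
  # record the known failure so reviewers can distinguish it from regressions
  broken = cudaSupport;  # or a narrower condition, e.g. a specific CUDA or Python version
};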
@connorbaker:matrix.orgconnor (he/him) If I succeed in running nix-cuda-test, are you okay with me merging it? 18:30:56
@glepage:matrix.orgGaétan Lepage
In reply to @connorbaker:matrix.org
If I succeed in running nix-cuda-test, are you okay with me merging it?
Yes, I am OK with it.
19:22:14
@glepage:matrix.orgGaétan LepageI will go through the failures once again while this completes19:22:27
@hexa:lossy.networkhexa
In reply to @ss:someonex.net
Hmm there's a PR starting with python3Packages.torch: but Ofborg didn't try building it, just shows 20 green (eval&c) checks
https://github.com/NixOS/ofborg/issues/577
19:30:37
@hexa:lossy.networkhexause python311Packages or python312Packages instead19:30:56
@glepage:matrix.orgGaétan Lepage SomeoneSerge (UTC+3) what is your opinion on merging the torch update as-is? 19:47:12
@glepage:matrix.orgGaétan LepageI am pretty confident in the absence of regression in this PR19:48:03
@connorbaker:matrix.orgconnor (he/him) Gaétan Lepage: have you had a chance to try training a model with torch.compile? 20:27:04
@connorbaker:matrix.orgconnor (he/him) I've been testing with nix run -L --override-input nixpkgs github:nixos/nixpkgs/6f0e1545adfa64c9f3a22f5ce789b9f509080abd .#nix-cuda-test run inside https://github.com/ConnorBaker/nix-cuda-test 20:28:56
@connorbaker:matrix.orgconnor (he/him)
$ nix run -L --override-input nixpkgs github:nixos/nixpkgs/6f0e1545adfa64c9f3a22f5ce789b9f509080abd .#nix-cuda-test -- --compile
warning: not writing modified lock file of flake 'git+file:///home/connorbaker/nix-cuda-test':
• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/593754412bff02f735ba339d7a3afda41ad19bb5?narHash=sha256-a%2BVM3UnER9KOFZBPjIin3ojO1h3m4NzR9y8wwLka6oQ%3D' (2024-06-09)
  → 'github:nixos/nixpkgs/6f0e1545adfa64c9f3a22f5ce789b9f509080abd?narHash=sha256-EQDc%2BmcEQG7Q1PzZKikAnX5YtAHT/KjFR773m48L7m0%3D' (2024-06-09)
Seed set to 42
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Files already downloaded and verified
Files already downloaded and verified
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name      | Type             | Params
-----------------------------------------------
0 | criterion | CrossEntropyLoss | 0     
1 | model     | ViT              | 86.3 M
-----------------------------------------------
86.3 M    Trainable params
0         Non-trainable params
86.3 M    Total params
345.317   Total estimated model params size (MB)
Sanity Checking DataLoader 0:   0%|                                                                                                                                                           | 0/2 [00:00<?, ?it/s]ldconfig: Can't open cache file /nix/store/apab5i73dqa09wx0q27b6fbhd1r18ihl-glibc-2.39-31/etc/ld.so.cache
: No such file or directory

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
.nix-cuda-test-wrapped 9 <module>
sys.exit(main())

__main__.py 126 main
trainer.fit(

trainer.py 544 fit
call._call_and_handle_interrupt(

call.py 44 _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)

trainer.py 580 _fit_impl
self._run(model, ckpt_path=ckpt_path)

trainer.py 987 _run
results = self._run_stage()

trainer.py 1031 _run_stage
self._run_sanity_check()

trainer.py 1060 _run_sanity_check
val_loop.run()

utilities.py 182 _decorator
return loop_run(self, *args, **kwargs)

evaluation_loop.py 135 run
self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)

evaluation_loop.py 396 _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_args)

call.py 309 _call_strategy_hook
output = fn(*args, **kwargs)

strategy.py 412 validation_step
return self.lightning_module.validation_step(*args, **kwargs)

eval_frame.py 451 _fn
return fn(*args, **kwargs)

convert_frame.py 921 catch_errors
return callback(frame, cache_entry, hooks, frame_state, skip=1)

convert_frame.py 786 _convert_frame
result = inner_convert(

convert_frame.py 400 _convert_frame_assert
return _compile(

contextlib.py 81 inner
return func(*args, **kwds)

convert_frame.py 676 _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)

utils.py 262 time_wrapper
r = func(*args, **kwargs)

convert_frame.py 535 compile_inner
out_code = transform_code_object(code, transform)

bytecode_transformation.py 1036 transform_code_object
transformations(instructions, code_options)

convert_frame.py 165 _fn
return fn(*args, **kwargs)

convert_frame.py 500 transform
tracer.run()

symbolic_convert.py 2149 run
super().run()

symbolic_convert.py 810 run
and self.step()

symbolic_convert.py 773 step
getattr(self, inst.opname)(inst)

symbolic_convert.py 484 wrapper
return handle_graph_break(self, inst, speculation.reason)

symbolic_convert.py 548 handle_graph_break
self.output.compile_subgraph(self, reason=reason)

output_graph.py 1001 compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)

contextlib.py 81 inner
return func(*args, **kwds)

output_graph.py 1178 compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)

utils.py 262 time_wrapper
r = func(*args, **kwargs)

output_graph.py 1251 call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(

output_graph.py 1232 call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())

after_dynamo.py 117 debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)

__init__.py 1731 __call__
return compile_fx(model_, inputs_, config_patches=self.config)

contextlib.py 81 inner
return func(*args, **kwds)

compile_fx.py 1330 compile_fx
return aot_autograd(

common.py 58 compiler_fn
cg = aot_module_simplified(gm, example_inputs, **kwargs)

aot_autograd.py 903 aot_module_simplified
compiled_fn = create_aot_dispatcher_function(

utils.py 262 time_wrapper
r = func(*args, **kwargs)

aot_autograd.py 628 create_aot_dispatcher_function
compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata)

runtime_wrappers.py 443 aot_wrapper_dedupe
return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)

runtime_wrappers.py 648 aot_wrapper_synthetic_base
return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)

jit_compile_runtime_wrappers.py 119 aot_dispatch_base
compiled_fw = compiler(fw_module, updated_flat_args)

utils.py 262 time_wrapper
r = func(*args, **kwargs)

compile_fx.py 1257 fw_compiler_base
return inner_compile(

after_aot.py 83 debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)

debug.py 304 inner
return fn(*args, **kwargs)

contextlib.py 81 inner
return func(*args, **kwds)

contextlib.py 81 inner
return func(*args, **kwds)

utils.py 262 time_wrapper
r = func(*args, **kwargs)

compile_fx.py 438 compile_fx_inner
compiled_graph = fx_codegen_and_compile(

compile_fx.py 714 fx_codegen_and_compile
compiled_fn = graph.compile_to_fn()

graph.py 1307 compile_to_fn
return self.compile_to_module().call

utils.py 262 time_wrapper
r = func(*args, **kwargs)

graph.py 1250 compile_to_module
self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()

graph.py 1208 codegen
self.scheduler.codegen()

utils.py 262 time_wrapper
r = func(*args, **kwargs)

scheduler.py 2339 codegen
self.get_backend(device).codegen_nodes(node.get_nodes())  # type: ignore[possibly-undefined]

cuda_combined_scheduling.py 63 codegen_nodes
return self._triton_scheduling.codegen_nodes(nodes)

triton.py 3255 codegen_nodes
return self.codegen_node_schedule(node_schedule, buf_accesses, numel, rnumel)

triton.py 3425 codegen_node_schedule
src_code = kernel.codegen_kernel()

triton.py 2753 codegen_kernel
"backend_hash": torch.utils._triton.triton_hash_with_backend(),

_triton.py 101 triton_hash_with_backend
backend_hash = triton_backend_hash()

_triton.py 37 triton_backend_hash
from triton.common.backend import get_backend, get_cuda_version_key

torch._dynamo.exc.BackendCompilerFailed:
backend='inductor' raised:
ImportError: cannot import name 'get_cuda_version_key' from 'triton.common.backend' (/nix/store/4pd9qb5sd865n8nms3vadx83kzzr6i8v-python3.11-triton-2.1.0/lib/python3.11/site-packages/triton/common/backend.py)

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
20:30:06
@glepage:matrix.orgGaétan LepageNope I haven't tried20:36:00
@glepage:matrix.orgGaétan LepageIs this with my branch or from master?20:36:12


