
NixOS CUDA

312 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



9 Jun 2024
@glepage:matrix.org Gaétan Lepage: SomeoneSerge (UTC+3), what is your opinion on merging the torch update as is? 19:47:12
@glepage:matrix.org Gaétan Lepage: I am pretty confident in the absence of regression in this PR. 19:48:03
@connorbaker:matrix.org connor (he/him): Gaétan Lepage: have you had a chance to try training a model with torch.compile? 20:27:04
@connorbaker:matrix.org connor (he/him): I've been testing with nix run -L --override-input nixpkgs github:nixos/nixpkgs/6f0e1545adfa64c9f3a22f5ce789b9f509080abd .#nix-cuda-test run inside https://github.com/ConnorBaker/nix-cuda-test 20:28:56
@connorbaker:matrix.org connor (he/him):
$ nix run -L --override-input nixpkgs github:nixos/nixpkgs/6f0e1545adfa64c9f3a22f5ce789b9f509080abd .#nix-cuda-test -- --compile
warning: not writing modified lock file of flake 'git+file:///home/connorbaker/nix-cuda-test':
• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/593754412bff02f735ba339d7a3afda41ad19bb5?narHash=sha256-a%2BVM3UnER9KOFZBPjIin3ojO1h3m4NzR9y8wwLka6oQ%3D' (2024-06-09)
  → 'github:nixos/nixpkgs/6f0e1545adfa64c9f3a22f5ce789b9f509080abd?narHash=sha256-EQDc%2BmcEQG7Q1PzZKikAnX5YtAHT/KjFR773m48L7m0%3D' (2024-06-09)
Seed set to 42
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Files already downloaded and verified
Files already downloaded and verified
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name      | Type             | Params
-----------------------------------------------
0 | criterion | CrossEntropyLoss | 0     
1 | model     | ViT              | 86.3 M
-----------------------------------------------
86.3 M    Trainable params
0         Non-trainable params
86.3 M    Total params
345.317   Total estimated model params size (MB)
Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]ldconfig: Can't open cache file /nix/store/apab5i73dqa09wx0q27b6fbhd1r18ihl-glibc-2.39-31/etc/ld.so.cache: No such file or directory

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
.nix-cuda-test-wrapped 9 <module>
sys.exit(main())

__main__.py 126 main
trainer.fit(

trainer.py 544 fit
call._call_and_handle_interrupt(

call.py 44 _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)

trainer.py 580 _fit_impl
self._run(model, ckpt_path=ckpt_path)

trainer.py 987 _run
results = self._run_stage()

trainer.py 1031 _run_stage
self._run_sanity_check()

trainer.py 1060 _run_sanity_check
val_loop.run()

utilities.py 182 _decorator
return loop_run(self, *args, **kwargs)

evaluation_loop.py 135 run
self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)

evaluation_loop.py 396 _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_args)

call.py 309 _call_strategy_hook
output = fn(*args, **kwargs)

strategy.py 412 validation_step
return self.lightning_module.validation_step(*args, **kwargs)

eval_frame.py 451 _fn
return fn(*args, **kwargs)

convert_frame.py 921 catch_errors
return callback(frame, cache_entry, hooks, frame_state, skip=1)

convert_frame.py 786 _convert_frame
result = inner_convert(

convert_frame.py 400 _convert_frame_assert
return _compile(

contextlib.py 81 inner
return func(*args, **kwds)

convert_frame.py 676 _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)

utils.py 262 time_wrapper
r = func(*args, **kwargs)

convert_frame.py 535 compile_inner
out_code = transform_code_object(code, transform)

bytecode_transformation.py 1036 transform_code_object
transformations(instructions, code_options)

convert_frame.py 165 _fn
return fn(*args, **kwargs)

convert_frame.py 500 transform
tracer.run()

symbolic_convert.py 2149 run
super().run()

symbolic_convert.py 810 run
and self.step()

symbolic_convert.py 773 step
getattr(self, inst.opname)(inst)

symbolic_convert.py 484 wrapper
return handle_graph_break(self, inst, speculation.reason)

symbolic_convert.py 548 handle_graph_break
self.output.compile_subgraph(self, reason=reason)

output_graph.py 1001 compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)

contextlib.py 81 inner
return func(*args, **kwds)

output_graph.py 1178 compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)

utils.py 262 time_wrapper
r = func(*args, **kwargs)

output_graph.py 1251 call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(

output_graph.py 1232 call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())

after_dynamo.py 117 debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)

__init__.py 1731 __call__
return compile_fx(model_, inputs_, config_patches=self.config)

contextlib.py 81 inner
return func(*args, **kwds)

compile_fx.py 1330 compile_fx
return aot_autograd(

common.py 58 compiler_fn
cg = aot_module_simplified(gm, example_inputs, **kwargs)

aot_autograd.py 903 aot_module_simplified
compiled_fn = create_aot_dispatcher_function(

utils.py 262 time_wrapper
r = func(*args, **kwargs)

aot_autograd.py 628 create_aot_dispatcher_function
compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata)

runtime_wrappers.py 443 aot_wrapper_dedupe
return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)

runtime_wrappers.py 648 aot_wrapper_synthetic_base
return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)

jit_compile_runtime_wrappers.py 119 aot_dispatch_base
compiled_fw = compiler(fw_module, updated_flat_args)

utils.py 262 time_wrapper
r = func(*args, **kwargs)

compile_fx.py 1257 fw_compiler_base
return inner_compile(

after_aot.py 83 debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)

debug.py 304 inner
return fn(*args, **kwargs)

contextlib.py 81 inner
return func(*args, **kwds)

contextlib.py 81 inner
return func(*args, **kwds)

utils.py 262 time_wrapper
r = func(*args, **kwargs)

compile_fx.py 438 compile_fx_inner
compiled_graph = fx_codegen_and_compile(

compile_fx.py 714 fx_codegen_and_compile
compiled_fn = graph.compile_to_fn()

graph.py 1307 compile_to_fn
return self.compile_to_module().call

utils.py 262 time_wrapper
r = func(*args, **kwargs)

graph.py 1250 compile_to_module
self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()

graph.py 1208 codegen
self.scheduler.codegen()

utils.py 262 time_wrapper
r = func(*args, **kwargs)

scheduler.py 2339 codegen
self.get_backend(device).codegen_nodes(node.get_nodes())  # type: ignore[possibly-undefined]

cuda_combined_scheduling.py 63 codegen_nodes
return self._triton_scheduling.codegen_nodes(nodes)

triton.py 3255 codegen_nodes
return self.codegen_node_schedule(node_schedule, buf_accesses, numel, rnumel)

triton.py 3425 codegen_node_schedule
src_code = kernel.codegen_kernel()

triton.py 2753 codegen_kernel
"backend_hash": torch.utils._triton.triton_hash_with_backend(),

_triton.py 101 triton_hash_with_backend
backend_hash = triton_backend_hash()

_triton.py 37 triton_backend_hash
from triton.common.backend import get_backend, get_cuda_version_key

torch._dynamo.exc.BackendCompilerFailed:
backend='inductor' raised:
ImportError: cannot import name 'get_cuda_version_key' from 'triton.common.backend' (/nix/store/4pd9qb5sd865n8nms3vadx83kzzr6i8v-python3.11-triton-2.1.0/lib/python3.11/site-packages/triton/common/backend.py)

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
20:30:06
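(Aside, not from the chat.) The ImportError above is a torch/triton version mismatch: Inductor expects triton.common.backend.get_cuda_version_key, which the packaged triton 2.1.0 does not export. A hedged sketch of a cheap pre-flight check for this kind of mismatch; has_symbol is a hypothetical helper, demonstrated against stdlib modules since triton may not be installed where this runs:

```python
import importlib


def has_symbol(module_name: str, attr: str) -> bool:
    """Return True if module_name imports cleanly and exposes attr.

    Probing for triton.common.backend's get_cuda_version_key this way
    would surface the torch/triton mismatch before a long training run,
    instead of deep inside Dynamo's backend compilation.
    """
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(mod, attr)


# Demonstrated on stdlib modules (triton is likely absent here):
print(has_symbol("math", "sqrt"))              # True
print(has_symbol("math", "no_such_symbol"))    # False
print(has_symbol("definitely_not_a_module", "x"))  # False
```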
@glepage:matrix.org Gaétan Lepage: Nope, I haven't tried. 20:36:00
@glepage:matrix.org Gaétan Lepage: Is this with my branch or from master? 20:36:12
@connorbaker:matrix.org connor (he/him): This is with the latest commit from your branch. 20:50:29
@connorbaker:matrix.org connor (he/him): It works without compile, just curious if this is a problem with the PR. 20:50:43
@connorbaker:matrix.org connor (he/him): I'll try again with master to make sure it's not a regression. 20:51:08
@connorbaker:matrix.org connor (he/him): Cool, fails on master too. 20:51:57
@glepage:matrix.org Gaétan Lepage: Nice ^^ 20:53:19
@glepage:matrix.org Gaétan Lepage: Is it OK for me to merge now? 20:53:44
@glepage:matrix.org Gaétan Lepage: Oh, I've just seen your message. 20:53:58
10 Jun 2024
@mjolnir:nixos.org NixOS Moderation Bot unbanned @jonringer:matrix.org. 00:17:14
@glepage:matrix.org Gaétan Lepage: [image: clipboard.png] 06:44:40
@glepage:matrix.org Gaétan Lepage: Haha, botorch has probably taken ~11h but it succeeded X) 06:44:56
@shekhinah:she.khinah.xyz shekhinah set their display name to yaldebaoth. 11:02:59
@shekhinah:she.khinah.xyz shekhinah changed their display name from yaldebaoth to yaldabaoth. 11:03:43
@connorbaker:matrix.org connor (he/him): Gaétan Lepage: did you mention there was a PR or something merged to disable the checkPhase or test suite for botorch, or did I misunderstand? 14:01:56
@connorbaker:matrix.org connor (he/him): On another note, has anyone built elpa (https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/libraries/elpa/default.nix) successfully with CUDA support? I let it run for like 20h and it was still building. Seems to compile four object files at a time? 14:04:03
@glepage:matrix.org Gaétan Lepage:
In reply to @connorbaker:matrix.org
Gaétan Lepage: did you mention there was a PR or something merged to disable the checkPhase or test suite for botorch, or did I misunderstand?
No, I have not done anything. I was actually able to build it just fine from master earlier today.
14:29:01
@hexa:lossy.network hexa: Gaétan Lepage: have you considered pulling this patch for tensorflow-bin? https://github.com/tensorflow/tensorflow/issues/58073#issuecomment-2097055553 20:58:34
11 Jun 2024
@keiichi:matrix.org teto: When using localai 2.15 from unstable, and even after a reboot, I get "ggml_cuda_init: failed to initialize CUDA: CUDA driver is a stub library". It's a bit random, but if anyone has a tip, I'll take it. nvidia-smi output looks fine. 00:25:38
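(Aside, not from the chat.) The stub libcuda shipped with cudatoolkit exports the driver API, but its calls fail; the real library comes from the NVIDIA driver package. A hedged diagnostic sketch: load whichever libcuda.so.1 the dynamic linker resolves and call cuInit. The error code 34 mentioned in the comment is what recent CUDA releases document as CUDA_ERROR_STUB_LIBRARY, but treat that value as an assumption:

```python
import ctypes


def probe_cuda_driver() -> str:
    """Report whether the resolved libcuda.so.1 looks like a real driver.

    On a working setup cuInit(0) returns 0 (CUDA_SUCCESS); on the
    cudatoolkit stub it returns a nonzero error (documented as 34,
    CUDA_ERROR_STUB_LIBRARY, in recent CUDA releases).
    """
    try:
        cuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return "libcuda.so.1 not found"
    err = cuda.cuInit(0)
    if err == 0:
        return "real driver (cuInit succeeded)"
    return f"cuInit failed with error {err} (34 would suggest the stub library)"


print(probe_cuda_driver())
```

On a machine hitting teto's symptom, this would show cuInit failing even though nvidia-smi (which talks to the kernel driver differently) looks fine, pointing at a library resolution problem rather than a broken driver.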
@glepage:matrix.org Gaétan Lepage:
In reply to @hexa:lossy.network
Gaétan Lepage: have you considered pulling this patch for tensorflow-bin? https://github.com/tensorflow/tensorflow/issues/58073#issuecomment-2097055553
This looks like it could work!
However, how do you apply a patch to a wheel-type python derivation?
06:38:47
@glepage:matrix.org Gaétan Lepage: What phase of the buildPythonPackage script should I hook it into? 06:39:02
@glepage:matrix.org Gaétan Lepage: I tried patches = [ but it does not work. 06:39:15
@glepage:matrix.org Gaétan Lepage:

I am packaging this: https://github.com/EricLBuehler/mistral.rs?tab=readme-ov-file#installation-and-build
You can see that it supports several build variants (CUDA, Metal, MKL, ...)

-> What should the approach be? Adding cudaSupport? metalSupport? mklSupport?

07:01:41
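(Aside, not from the chat.) A hedged sketch of the usual nixpkgs convention for questions like this: one boolean flag per optional backend, with only cudaSupport deferring to the global config (config.cudaSupport is an established nixpkgs-wide setting); the other flag names and the feature strings are illustrative for mistral.rs:

```nix
# Hypothetical derivation header; cudaSupport follows the nixpkgs-wide
# convention, metalSupport/mklSupport are illustrative per-package flags.
{ lib
, config
, stdenv
, rustPlatform
, cudaSupport ? config.cudaSupport
, metalSupport ? stdenv.hostPlatform.isDarwin
, mklSupport ? false
}:

rustPlatform.buildRustPackage {
  pname = "mistral-rs";
  # ...
  # Map each flag onto the corresponding cargo feature of the project.
  buildFeatures =
    lib.optionals cudaSupport [ "cuda" ]
    ++ lib.optionals metalSupport [ "metal" ]
    ++ lib.optionals mklSupport [ "mkl" ];
}
```

Defaulting metalSupport from the host platform mirrors how Darwin-only backends are usually gated, so the package builds out of the box on both platforms.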
@kaya:catnip.ee kaya 𖤐 changed their profile picture. 08:03:48
@hexa:lossy.network hexa:
In reply to @glepage:matrix.org
This looks like it could work!
However, how do you apply a patch to a wheel-type python derivation?
likely in postInstall 😕
11:58:18
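(Aside, not from the chat.) A hedged sketch of hexa's suggestion: a wheel unpacks to already-built files rather than a source tree, so the patches list (applied during patchPhase) has nothing matching to patch; rewriting the files after they land in $out works instead. The pname and patch file below are illustrative, not the actual tensorflow-bin fix:

```nix
# Hypothetical fragment: patch files a wheel installed into $out,
# since `patches` targets a source tree that wheel builds don't have.
buildPythonPackage {
  pname = "tensorflow";
  format = "wheel";
  # ...
  postInstall = ''
    patch -p1 -d "$out/${python.sitePackages}/tensorflow" \
      < ${./fix-ldconfig-stub.patch}
  '';
}
```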


