NixOS CUDA | 289 Members | |
| CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda | 57 Servers |
| Sender | Message | Time |
|---|---|---|
| 20 Dec 2024 | ||
| https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/conv.py | 13:15:03 | |
| ultimately it complains about this | 13:15:07 | |
| No VM tests, no | 13:19:16 | |
| This is the first one I'm trying to execute entirely on the CPU | 13:20:32 | |
| for comfyui in particular | 13:20:39 | |
[image: image.png] | 13:21:14 | |
| I have this cat that I can reproduce on the host CPU in only 13 seconds | 13:21:17 | |
comfyui is launched with --cpu but maybe that is incomplete | 13:21:31 | |
| Maybe it secretly still accesses the GPU and this vm test proves it | 13:21:41 | |
| Plausible, I suppose pytorch could ignore our flags and build something with vector extensions on (unless cc-wrapper filters those, I'm not sure), but what part of the logs suggested this conclusion? Searching for "qemu avx" I see https://superuser.com/a/454814 suggesting | 13:23:54 | |
| oh i'm acting like an llm | 13:24:16 | |
| Yeah I've done all of that, and lscpu inside the vm shows
| 13:24:36 | |
| so I supposedly have it | 13:24:44 | |
I've tried a lot of -cpu options too | 13:26:40 | |
| maybe there's a PYTORCH_VAR I can set? | 13:26:59 | |
| 13:38:07 | |
| Nothing, just other people's reports online | 13:38:17 | |
| That are now lost to my browser history | 13:40:38 | |
| This same stuff works in the host though | 13:44:01 | |
Host has: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es | 13:51:08 | |
Guest has: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif umip rdpid overflow_recov succor arch_capabilities | 13:51:14 | |
| cqm doesn't seem to exist in qemu, what is it? | 13:55:19 | |
| SomeoneSerge (utc+3): Ah here it was https://discuss.pytorch.org/t/runtimeerror-could-not-create-a-primitive/117519 | 14:03:07 | |
| Oh wow.. I made it work SomeoneSerge (utc+3) | 14:22:02 | |
| it was some systemd hardening feature causing it | 14:22:10 | |
| It is one of these, but we do not know which one it is
| 14:24:04 | |
| now comes the bisecting | 14:28:29 | |
It appears it was MemoryDenyWriteExecute causing it | 14:35:34 | |
| https://github.com/pytorch/pytorch/issues/143651 | 14:54:06 | |
| Made an issue in pytorch anyway | 14:54:10 | |
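
A quick way to check whether a given service environment has this restriction is to try mapping a page that is writable and executable at the same time, which is what systemd's `MemoryDenyWriteExecute=` is documented to prohibit. The sketch below is only a probe, not something from the conversation: the function name `can_map_writable_executable` is made up for illustration, and the idea that PyTorch's CPU backend fails because a JIT cannot get such a mapping is an assumption, not something confirmed in the log.

```python
import mmap


def can_map_writable_executable(size: int = mmap.PAGESIZE) -> bool:
    """Return True if an anonymous page can be mapped writable and executable at once.

    Hypothetical helper for illustration; Unix only (uses mmap prot flags).
    """
    prot = mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC
    try:
        # Anonymous private mapping, similar to what a JIT might request
        # for generated code (assumption about the failing code path).
        region = mmap.mmap(
            -1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, prot=prot
        )
    except OSError:
        # Under MemoryDenyWriteExecute=yes the mapping is expected to be
        # refused; PermissionError is a subclass of OSError.
        return False
    region.close()
    return True


if __name__ == "__main__":
    print("W+X mmap allowed:", can_map_writable_executable())
```

Running this once on the host and once inside the hardened unit (for example via the unit's own Python interpreter) should show whether the two environments differ in this respect.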