| 20 Dec 2024 |
SomeoneSerge (back on matrix) | oh i'm acting like an llm | 13:24:16 |
matthewcroughan | Yeah, I've done all of that, and lscpu inside the VM shows
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: AuthenticAMD
BIOS Vendor ID: QEMU
Model name: AMD Ryzen 9 3900X 12-Core Processor
BIOS Model name: pc-i440fx-9.1 CPU @ 2.0GHz
BIOS CPU family: 1
CPU family: 23
Model: 113
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
Stepping: 0
BogoMIPS: 7599.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif umip rdpid overflow_recov succor arch_capabilities
Virtualization features:
| 13:24:36 |
matthewcroughan | so I supposedly have it | 13:24:44 |
matthewcroughan | I've tried a lot of -cpu options too | 13:26:40 |
matthewcroughan | maybe there's a PYTORCH_VAR I can set? | 13:26:59 |
SomeoneSerge (back on matrix) |
avx ... but what part of the logs suggested this conclusion?
| 13:38:07 |
matthewcroughan | Nothing, just other people's reports online | 13:38:17 |
matthewcroughan | That are now lost to my browser history | 13:40:38 |
matthewcroughan | This same stuff works in the host though | 13:44:01 |
matthewcroughan | Host has
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es | 13:51:08 |
matthewcroughan | guest has:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif umip rdpid overflow_recov succor arch_capabilities | 13:51:14 |
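Comparing the two flag lists by eye is error-prone, so here is a minimal sketch of a set difference over the pasted strings. The helper names `parse_flags` and `diff_flags` are made up for illustration, and the sample strings in the usage note are short excerpts rather than the full lists. Any line-wrap breaks inside flag names have to be repaired before pasting, otherwise fragments such as "m" and "ca" show up as separate flags.

```python
def parse_flags(line: str) -> set[str]:
    """Split a whitespace-separated CPU flag string into a set of names."""
    return set(line.split())


def diff_flags(host: str, guest: str) -> tuple[list[str], list[str]]:
    """Return (flags only on the host, flags only in the guest), both sorted."""
    host_set, guest_set = parse_flags(host), parse_flags(guest)
    return sorted(host_set - guest_set), sorted(guest_set - host_set)


if __name__ == "__main__":
    # Shortened excerpts of the lists above, for illustration only.
    host_only, guest_only = diff_flags(
        "fpu avx avx2 cqm sev",
        "fpu avx avx2 hypervisor",
    )
    print("host only: ", host_only)
    print("guest only:", guest_only)
```

Run against the full lists, this makes it easy to see that avx, avx2 and fma are present on both sides, and that the host-only entries (cqm, sev and so on) are the ones to look up.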
matthewcroughan | cqm doesn't seem to exist in qemu, what is it? | 13:55:19 |
matthewcroughan | SomeoneSerge (utc+3): Ah here it was https://discuss.pytorch.org/t/runtimeerror-could-not-create-a-primitive/117519 | 14:03:07 |
matthewcroughan | Oh wow.. I made it work SomeoneSerge (utc+3) | 14:22:02 |
matthewcroughan | it was some systemd hardening feature causing it | 14:22:10 |
matthewcroughan | It is one of these, but we do not know which one it is
DevicePolicy = "closed";
LockPersonality = true;
MemoryDenyWriteExecute = true;
NoNewPrivileges = true;
PrivateDevices = false; # hides acceleration devices
PrivateTmp = true;
PrivateUsers = true;
ProcSubset = "all"; # /proc/meminfo
ProtectClock = true;
ProtectControlGroups = true;
ProtectHome = true;
ProtectHostname = true;
ProtectKernelLogs = true;
ProtectKernelModules = true;
ProtectKernelTunables = true;
ProtectProc = "invisible";
ProtectSystem = "strict";
RemoveIPC = true;
RestrictNamespaces = true;
RestrictRealtime = true;
RestrictSUIDSGID = true;
RestrictAddressFamilies = [
  "AF_INET"
  "AF_INET6"
  "AF_UNIX"
];
SupplementaryGroups = [ "render" ]; # for rocm to access /dev/dri/renderD* devices
SystemCallArchitectures = "native";
SystemCallFilter = [
  "@system-service @resources"
  "~@privileged"
];
UMask = "0077";
| 14:24:04 |
matthewcroughan | now comes the bisecting | 14:28:29 |
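The bisecting mentioned here can be sketched as a halving search over the option names. This is only an illustration under two assumptions: exactly one option is responsible, and the failure reproduces whenever that option is enabled. `bisect_culprit` and the `fails` callback are hypothetical names; in practice `fails` would mean "rebuild the VM test with only these options enabled and check whether the error appears".

```python
from typing import Callable, Sequence


def bisect_culprit(
    options: Sequence[str],
    fails: Callable[[Sequence[str]], bool],
) -> str:
    """Find the single option that triggers the failure.

    Assumes exactly one culprit among `options`, and that `fails(subset)`
    returns True whenever the culprit is part of `subset`.
    """
    candidates = list(options)
    while len(candidates) > 1:
        first_half = candidates[: len(candidates) // 2]
        # If enabling only the first half reproduces the failure, the culprit
        # is in that half; otherwise it must be in the remainder.
        if fails(first_half):
            candidates = first_half
        else:
            candidates = candidates[len(first_half):]
    return candidates[0]


if __name__ == "__main__":
    hardening = [
        "LockPersonality",
        "MemoryDenyWriteExecute",
        "NoNewPrivileges",
        "PrivateTmp",
        "ProtectSystem",
    ]
    # Stand-in for a real rebuild-and-test cycle.
    print(bisect_culprit(hardening, lambda enabled: "MemoryDenyWriteExecute" in enabled))
```

With around 25 options this needs about five rebuilds instead of one per option, provided the single-culprit assumption holds.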
matthewcroughan | It appears it was MemoryDenyWriteExecute causing it | 14:35:34 |
matthewcroughan | https://github.com/pytorch/pytorch/issues/143651 | 14:54:06 |
matthewcroughan | Made an issue in pytorch anyway | 14:54:10 |
SomeoneSerge (back on matrix) | Nice. Maybe it's some kind of JIT stuff? | 14:54:41 |
matthewcroughan | Yeah, I thought I had disabled JIT with a var | 14:54:52 |
matthewcroughan | who knows | 14:54:53 |
matthewcroughan | PYTORCH_JIT = "0"; | 14:55:08 |
matthewcroughan | I thought this had worked, maybe it had no impact for another reason | 14:55:15 |
matthewcroughan | it seems to completely work with this turned off though | 14:56:43 |
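To check from inside the service whether this kind of restriction is active, a small probe can try to create a memory mapping that is both writable and executable, which is what runtime code generation typically needs and what MemoryDenyWriteExecute is documented to block. This is a sketch for Linux using only the standard `mmap` module; the function name is made up, and whether PyTorch's CPU backend fails for exactly this reason is not confirmed in this log. Note also that PYTORCH_JIT most likely controls only TorchScript, so it would not affect code generated by lower-level libraries.

```python
import mmap


def can_map_writable_executable(size: int = mmap.PAGESIZE) -> bool:
    """Return True if an anonymous read+write+execute mapping can be created.

    Under a write-xor-execute policy the mmap call is expected to be refused,
    which Python surfaces as an OSError (typically PermissionError).
    """
    try:
        region = mmap.mmap(
            -1,  # anonymous mapping, not backed by a file
            size,
            flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
            prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
        )
    except OSError:
        return False
    region.close()
    return True


if __name__ == "__main__":
    print("W+X mapping allowed:", can_map_writable_executable())
```

Running it once in a plain shell and once under the hardened unit should give different answers if the option is the one in effect.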
matthewcroughan | Oh this is so good SomeoneSerge (utc+3) | 15:03:11 |
matthewcroughan | (image: image.png) | 15:03:34 |
matthewcroughan | Stable Diffusion Comfyui Pytorch CPU based fast VMtest achieved | 15:03:49 |