| 17 May 2025 |
Alex | In reply to @lotte:chir.rs: "i have run a small benchmark earlier today and their perf figures are just Wrong"
Wrong as in worse than advertised?
Could be a difference in flags if you're not enabling _Zba_Zbb and PGO. | 17:03:37 |
@lotte:chir.rs | yeah tbf i didn’t enable Zba_Zbb | 17:04:06 |
@lotte:chir.rs | but the code i was testing did not make use of those instructions | 17:04:29 |
dramforever | In reply to @lotte:chir.rs on the kickstarter they compared themselves to the pi 4 ... it's just not possible | 17:04:50 |
dramforever | the 4 cores on the pi4 are simply way better | 17:05:13 |
Alex | Maybe they were just comparing the clock rates?
(Very scientific benchmark that.) | 17:05:50 |
dramforever | both are 1.5GHz | 17:08:00 |
dramforever | i went to check the kickstarter page and it claims to be a chunk slower than the pi4 | 17:08:18 |
Alex | Exactly. Comparable performance. /s | 17:08:24 |
dramforever | which may be true | 17:08:25 |
@lotte:chir.rs | the first test was 128 bit ID generation (nanosecond system clock, 128 bit multiply, and an atomic memory access), the cm4 did it in 140ns average, the vf2 in 1.114μs average (~8x slower)
1MB of base64 encoding? 4.383ms on the cm4, 12.122ms on the vf2 (~3x slower) | 17:08:38 |
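The ID-generation test above combines three operations: a nanosecond clock read, an atomic memory access, and a 128-bit multiply. The benchmarked code itself is not shown in the log, so the following is only a minimal sketch of what such a microbenchmark might look like. The names `next_id`, `average_ns`, `COUNTER`, and the constant `MIX` are hypothetical and chosen for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Instant, SystemTime, UNIX_EPOCH};

// Process-wide counter (hypothetical). The atomic increment keeps IDs
// distinct even when two calls observe the same clock value.
static COUNTER: AtomicU64 = AtomicU64::new(0);

// Arbitrary odd 128-bit constant (an assumption, not taken from the log).
// Multiplying by an odd number modulo 2^128 is a bijection, so distinct
// inputs always give distinct outputs.
const MIX: u128 = 0x9E37_79B9_7F4A_7C15_F39C_C060_5CED_C835;

/// Hypothetical ID generator: clock read + atomic increment + 128-bit multiply.
fn next_id() -> u128 {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_nanos() as u64; // keep the low 64 bits of the nanosecond count
    let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
    // Timestamp in the high half, sequence number in the low half.
    let raw = ((nanos as u128) << 64) | seq as u128;
    raw.wrapping_mul(MIX)
}

/// Average wall-clock cost of one `next_id` call, in nanoseconds.
fn average_ns(iterations: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the call.
        std::hint::black_box(next_id());
    }
    start.elapsed().as_nanos() as f64 / f64::from(iterations)
}

fn main() {
    println!("average: {:.1} ns per ID", average_ns(1_000_000));
}
```

Running the same binary on both boards gives a per-call average that can be compared directly, although the absolute numbers depend on compiler flags and on how the clock read is serviced by the kernel.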
dramforever | it also just does not have vector instructions | 17:09:00 |
@lotte:chir.rs | lemme try openssl speed | 17:09:02 |
dramforever | so base64 checks out depending on implementation | 17:09:15 |
@lotte:chir.rs | the rust base64 crate | 17:09:23 |
dramforever | i have no idea what this compiles down to | 17:14:29 |
dramforever | but anyway don't expect to be able to replicate the speed ratio on random benchmarks | 17:15:29 |
dramforever | not happening | 17:16:16 |
dramforever | as for uuid generation i wonder if it's going through vdso on arm and a syscall on riscv | 17:17:57 |
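One rough way to probe the vDSO-versus-syscall question is to time the clock read in isolation. The sketch below assumes the ID generator reads the clock through Rust's `SystemTime::now()`, which the log does not confirm. The helper name `clock_read_ns` is hypothetical.

```rust
use std::time::{Instant, SystemTime};

/// Average cost in nanoseconds of one `SystemTime::now()` call.
fn clock_read_ns(iterations: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the clock read.
        std::hint::black_box(SystemTime::now());
    }
    start.elapsed().as_nanos() as f64 / f64::from(iterations)
}

fn main() {
    let ns = clock_read_ns(1_000_000);
    println!("SystemTime::now(): {ns:.1} ns per call");
}
```

If the per-call cost on one board is close to that board's whole ID-generation time, the clock read is the likely bottleneck. The timing alone does not prove which path is taken. Running the binary under `strace` is a more direct check, since clock reads served by the vDSO do not show up as `clock_gettime` syscalls.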
@lotte:chir.rs |
Doing 4096 bits private rsa sign ops for 10s: 142 4096 bits private RSA sign ops in 9.93s
| 17:22:52 |
@lotte:chir.rs | so fast | 17:22:53 |
dramforever | yeah not having vector instructions really helps | 17:23:48 |
@lotte:chir.rs | results are in ✨ | 17:30:16 |
@lotte:chir.rs | https://docs.google.com/spreadsheets/d/1xvuzBbQaWIGIrmKiYEHSkABQkhmKc8WNKTZWFyVgvqA/edit?usp=sharing | 17:30:17 |
@lotte:chir.rs | in general it seems that the single core performance of the vf2 is about half that of the cm4 | 17:30:45 |
@lotte:chir.rs | presumably both have working hardware aes, but the sha256 implementation seems to be 3x slower than on the cm4 | 17:31:40 |
Alex | Assuming the system is properly configured to use the hardware crypto (on either system). | 17:32:24 |
@lotte:chir.rs | default settings | 17:33:03 |
@lotte:chir.rs | the cm4 is running kde wayland and yes it is not that fast ™️ | 17:33:33 |