| 19 Feb 2024 |
Steven Keuchel | ahh ok. as far as i understand it, the current erratas in upstream linux do not affect user-mode. | 11:44:43 |
Steven Keuchel | but correct me if i'm wrong, I would really like to know ;) | 11:44:55 |
Pratham Patel (you can mention me) | that's for the core though, pretty sure both kernel-space and user-space are affected | 11:45:17 |
Pratham Patel (you can mention me) | also, these erratas are errors in implementing the RISC-V spec in the actual Linux-capable core, individual or on SoC | 11:50:05 |
Pratham Patel (you can mention me) | That said, I do wish to buy that Lichee 4A cluster board with 7 slots so I can parallel build to check for build failures since we're still bootstrapping. Once a build succeeds, I'd build it on the VF2 that I own. | 11:54:03 |
Steven Keuchel | In reply to @thefossguy:matrix.org: "also, these erratas are errors in implementing the RISC-V spec in the actual Linux-capable core, individual or on SoC" Yes, the behaviour may differ from the official spec, but you still have to consider the impact.
- ERRATA_THEAD_PBMT is about page tables, it's kernel-mode specific
- ERRATA_THEAD_PMU is about performance counter overflows
- ERRATA_THEAD_CMO I don't understand enough, but you can also disable those extensions
- ERRATA_THEAD_QSPINLOCK is a case where the vendor implementation is more restricted than what the spec requires
So the question is if these warrant not trusting the machines as build boxes. For me, these erratas are not major issues.
| 12:15:49 |
Steven Keuchel | That being said, I would trust none of these as production machines for a database or something. | 12:16:06 |
Steven Keuchel | For the pioneer: the sg2042-dev vendor kernel is not stable, and the PCIe support is a bit wonky which I would consider as a bigger issue at this point. But hopefully those can be resolved. | 12:17:20 |
Pratham Patel (you can mention me) | Agreed. While the kernel does have workarounds for most of this, they're exactly that: workarounds. :) | 12:36:35 |
sorear | the pioneer exhibits extremely bad performance for some multithreaded workloads, there was some speculation that it's caused by ERRATA_THEAD_WRITE_ONCE but experiments were inconclusive | 13:22:40 |
sorear | t-head has a userspace erratum where the FP underflow flag is incorrectly not set if a multiply rounds away from zero to produce the smallest normal number, AFAIK the only thing this affects is the glibc regression test suite | 13:23:49 |
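To make the case described above concrete, here is a minimal sketch of one multiply whose exact result lies just below the normal range but rounds up to the smallest normal double. The operands are an illustrative choice, not taken from the chat or the linked issue. Python cannot read the FP exception flags, so this only shows that the result is both tiny and inexact, which is the condition under which an IEEE 754 implementation that detects tininess after rounding (as RISC-V specifies) should raise underflow.

```python
from fractions import Fraction
import sys

# Hypothetical operands chosen for illustration only.
x = 1.0 - 2.0**-53        # largest double below 1.0
y = sys.float_info.min    # smallest positive normal double, 2**-1022

product = x * y                     # hardware result, round-to-nearest-even
exact = Fraction(x) * Fraction(y)   # exact mathematical product

# The exact product fits in a 53-bit significand, so rounding it with an
# unbounded exponent range leaves it unchanged: it is still below 2**-1022.
fits_53_bits = exact.numerator.bit_length() <= 53
is_tiny = 0 < exact < Fraction(y)

# With the real (bounded) exponent range the result is a tie between the
# largest subnormal and the smallest normal, and ties-to-even picks the normal.
rounds_to_min_normal = product == y
is_inexact = Fraction(product) != exact

print(fits_53_bits, is_tiny, rounds_to_min_normal, is_inexact)
```

All four values being true means the multiply is tiny and inexact yet delivers the smallest normal number, i.e. the situation in which the flag is reportedly not set on the affected cores.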
Pratham Patel (you can mention me) | That's because it's not a serious chip (not in a negative sense). It's an experiment on checking what works instead of making it work "correctly". Like a lot of first-gen products, it's a flex of "can we do this?" without the "nicely" part. | 13:24:18 |
sorear | t-head cores do not allow vector loads/stores to strongly ordered (I/O) memory, so no memcpy to frame buffers | 13:24:43 |
Pratham Patel (you can mention me) | Their vector is also sorta custom and/or 0.7.1 | 13:25:23 |
sorear | they also mis-implemented fence decoding and report illegal instruction exceptions for many valid fences | 13:26:05 |
sorear | the last one also applies to c908 which has nominally standard 1.0 vectors | 13:26:19 |
sorear | those four are the only userspace-breaking errata I currently know about and none of them have the potential to cause miscompiles. fedora is building some packages on a pioneer box | 13:27:04 |
Pratham Patel (you can mention me) | huh... AFAIK, David explicitly mentioned not using it as a build box | 13:28:18 |
sorear | that said the th1520 is slower in practice on compilation workloads than vf2 and the pioneer is only slightly faster some of the time | 13:28:28 |
sorear | i think they're only using it situationally, one of them mentioned using it for ghc recently | 13:29:08 |
Pratham Patel (you can mention me) | In reply to @sorear:matrix.org: "that said the th1520 is slower in practice on compilation workloads than vf2 and the pioneer is only slightly faster some of the time" this is what the Lichee 4A uses, right? | 13:30:27 |
sorear | i've been calling the situation "new golden age of processor errata" for years, i just don't see a need to single out t-head and I'm paranoidly wondering whether they get held to a different standard because of geopolitics | 13:30:48 |
sorear | yes (lp4a=th1520) | 13:30:56 |
Pratham Patel (you can mention me) | probably | 13:31:17 |
Steven Keuchel | In reply to @sorear:matrix.org: "t-head has a userspace erratum where the FP underflow flag is incorrectly not set if a multiply rounds away from zero to produce the smallest normal number, AFAIK the only thing this affects is the glibc regression test suite" thanks for that! I am going to read up on it. | 13:44:25 |
sorear | In reply to @skeuchel:matrix.org: "thanks for that! I am going to read up on it." https://github.com/revyos/revyos/issues/17 fwiw | 13:46:22 |
Alex | In reply to @thefossguy:matrix.org: "So far, the only machines I trust for building nixpkgs are the HiFive Unmatched and the VisionFive 2." You should also count the Star64, because it uses the same SoC as the VF2 (unless you have a reason for excluding it too?) | 14:33:01 |
Pratham Patel (you can mention me) | Ah no reason to exclude it, just forgot about it! :D | 14:33:26 |
Alex | On the topic of implementation errata, I wonder if anyone's doing any exhaustive testing on the various RISC-V processor designs... | 14:34:10 |
Pratham Patel (you can mention me) | there are a million (hyperbole) Indian startups doing exactly that! | 14:34:52 |