!xmLtiCaAJxfhURjrXl:matrix.org

NixOS RISC-V

238 Members
NixOS on RISC-V https://wiki.nixos.org/wiki/RISC-V https://pad.lassul.us/NixOS-riscv64-linux
72 Servers


Sender Message Time
19 Feb 2024
* @skeuchel:matrix.org Steven Keuchel is trying to get milkv pioneer into nixos-hardware https://github.com/sophgo/bootloader-riscv/issues/72 10:28:18
@thefossguy:matrix.org Pratham Patel (you can mention me) FYI: Please don't use the Pioneer or the Lichee 4A in any of the build machines. The C<something> cores from T-Head have hardware errata about an incorrect atomics implementation, so they should only be used for "oh cool thing" instead of as a build machine. 10:37:32
@thefossguy:matrix.org Pratham Patel (you can mention me) Also, the above "PSA" was about using these machines for building nixpkgs, not against their inclusion in nixos-hardware :) 10:54:37
@thefossguy:matrix.org Pratham Patel (you can mention me) So far, the only machines I trust for building nixpkgs are the HiFive Unmatched and the VisionFive 2. 10:55:27
@thefossguy:matrix.org Pratham Patel (you can mention me) Might add the Milk-V Oasis to that list once it's launched, since it uses SiFive's cores (which I trust to have no major issues), but only after testing things out :) 10:56:12
@skeuchel:matrix.org Steven Keuchel Do you have any specific silicon errata you are referring to? I'm only interested in those that impact user-mode functionality, rather than being solely kernel-mode concerns. I know about the Linux ERRATA_THEAD_QSPINLOCK, but I would like to know about others, if they exist. The information is a bit hard to find. 11:35:18
@thefossguy:matrix.org Pratham Patel (you can mention me) All the RISC-V related errata that I know of: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/riscv/include/asm/errata_list.h 11:38:14
@skeuchel:matrix.org Steven Keuchel ahh ok. as far as i understand it, the current errata in upstream linux do not affect user-mode. 11:44:43
@skeuchel:matrix.org Steven Keuchel but correct me if i'm wrong, I would really like to know ;) 11:44:55
@thefossguy:matrix.org Pratham Patel (you can mention me) that's for the core though, pretty sure both kernel-space and user-space are affected 11:45:17
@thefossguy:matrix.org Pratham Patel (you can mention me) also, these errata are errors in implementing the RISC-V spec in the actual Linux-capable core, individual or on SoC 11:50:05
@thefossguy:matrix.org Pratham Patel (you can mention me) That said, I do wish to buy that Lichee 4A cluster board with 7 slots so I can parallel build to check for build failures since we're still bootstrapping. Once a build succeeds, I'd build it on the VF2 that I own. 11:54:03
@skeuchel:matrix.org Steven Keuchel
In reply to @thefossguy:matrix.org
also, these errata are errors in implementing the RISC-V spec in the actual Linux-capable core, individual or on SoC

Yes, the behaviour may differ from the official spec, but you still have to consider the impact.

  • ERRATA_THEAD_PBMT is about page tables, it's kernel-mode specific
  • ERRATA_THEAD_PMU is about performance counter overflows
  • ERRATA_THEAD_CMO I don't understand enough, but you can also disable those extensions
  • ERRATA_THEAD_QSPINLOCK is a case where the vendor implementation is more restricted than what the spec requires

So the question is whether these warrant not trusting the machines as build boxes. For me, these errata are not major issues.
12:15:49
@skeuchel:matrix.org Steven Keuchel That being said, I would trust none of these as production machines for a database or something. 12:16:06
@skeuchel:matrix.org Steven Keuchel For the pioneer: the sg2042-dev vendor kernel is not stable, and the PCIe support is a bit wonky, which I would consider a bigger issue at this point. But hopefully those can be resolved. 12:17:20
@thefossguy:matrix.org Pratham Patel (you can mention me) Agreed. While the kernel does handle most of this with workarounds, they're exactly that: workarounds. :) 12:36:35
@sorear:matrix.org sorear the pioneer exhibits extremely bad performance for some multithreaded workloads, there was some speculation that it's caused by ERRATA_THEAD_WRITE_ONCE but experiments were inconclusive 13:22:40
@sorear:matrix.org sorear t-head has a userspace erratum where the FP underflow flag is incorrectly not set if a multiply rounds away from zero to produce the smallest normal number, AFAIK the only thing this affects is the glibc regression test suite 13:23:49
@thefossguy:matrix.org Pratham Patel (you can mention me) That's because it's not a serious chip (not in a negative sense). It's an experiment on checking what works instead of making it work "correctly". Like a lot of first-gen products, it's a flex of "can we do this?" without the "nicely" part. 13:24:18
@sorear:matrix.org sorear t-head cores do not allow vector loads/stores to strongly ordered (I/O) memory, so no memcpy to frame buffers 13:24:43
@thefossguy:matrix.org Pratham Patel (you can mention me) Their vector is also sorta custom and/or 0.7.1 13:25:23
@sorear:matrix.org sorear they also mis-implemented fence decoding and report illegal instruction exceptions for many valid fences 13:26:05
@sorear:matrix.org sorear the last one also applies to c908 which has nominally standard 1.0 vectors 13:26:19
@sorear:matrix.org sorear those four are the only userspace-breaking errata I currently know about and none of them have the potential to cause miscompiles. fedora is building some packages on a pioneer box 13:27:04
@thefossguy:matrix.org Pratham Patel (you can mention me) huh... AFAIK, David explicitly mentioned not using it as a build box 13:28:18
@sorear:matrix.org sorear that said the th1520 is slower in practice on compilation workloads than vf2 and the pioneer is only slightly faster some of the time 13:28:28
@sorear:matrix.org sorear i think they're only using it situationally, one of them mentioned using it for ghc recently 13:29:08
@thefossguy:matrix.org Pratham Patel (you can mention me)
In reply to @sorear:matrix.org
that said the th1520 is slower in practice on compilation workloads than vf2 and the pioneer is only slightly faster some of the time
this is what the Lichee 4A uses, right?
13:30:27
@sorear:matrix.org sorear i've been calling the situation "new golden age of processor errata" for years, i just don't see a need to single out t-head and I'm paranoidly wondering whether they get held to a different standard because of geopolitics 13:30:48
@sorear:matrix.org sorear yes (lp4a=th1520) 13:30:56


Room Version: 10