| 19 Jun 2026 |
magic_rb | What is -U? | 16:15:05 |
ElvishJerricco | --user, make a user namespace | 16:15:13 |
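For reference, here is a minimal sketch of what `-U`/`--user` does on its own compared with adding `-r`/`--map-root-user`. It assumes util-linux `unshare` and a kernel that allows unprivileged user namespaces. The helper name `uid_in_userns` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: what `unshare -U` (--user) does to the uid you see.
# Assumes util-linux unshare and unprivileged user namespaces being enabled.

# uid_in_userns [UNSHARE_OPTS...]: print the uid seen inside a fresh user namespace.
uid_in_userns() {
  unshare --user "$@" id -u
}

if unshare --map-root-user true 2>/dev/null; then
  echo "outside the userns: $(id -u)"
  # No uid_map written: the kernel reports the overflow uid (usually 65534).
  echo "unshare -U:         $(uid_in_userns)"
  # -r writes "0 <your uid> 1" to uid_map, so you appear as root inside.
  echo "unshare -U -r:      $(uid_in_userns --map-root-user)"
else
  echo "user namespaces are not available here" >&2
fi
```

Being "root" here only means uid 0 inside the new namespace. Outside it, the process is still your ordinary uid.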
magic_rb | Ah | 16:15:16 |
K900 | Honestly I'd probably not submit this without a patch | 16:15:19 |
magic_rb | Yeah I'm looking at a patch, reading how to do rtkit | 16:15:33 |
magic_rb | Doesn't look that hard | 16:15:35 |
magic_rb | I'll write something and open a draft PR to show I made an effort | 16:15:46 |
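For context on the rtkit route, rtkit-daemon exposes `org.freedesktop.RealtimeKit1` on the system bus. The sketch below assumes systemd's `busctl`, util-linux `prlimit`, and a running rtkit-daemon. The helper names are hypothetical. The RLIMIT_RTTIME step reflects our understanding of rtkit's policy and does not come from this thread.

```shell
#!/usr/bin/env bash
# Sketch: asking rtkit over D-Bus to make an existing thread real-time.
# Assumes systemd's busctl, util-linux prlimit, and a running rtkit-daemon.
RTKIT_NAME=org.freedesktop.RealtimeKit1
RTKIT_PATH=/org/freedesktop/RealtimeKit1

# rtkit_prop NAME: print one of rtkit's limits, e.g. MaxRealtimePriority,
# MinNiceLevel or RTTimeUSecMax (busctl prints "<type> <value>").
rtkit_prop() {
  busctl --system get-property "$RTKIT_NAME" "$RTKIT_PATH" "$RTKIT_NAME" "$1"
}

# rtkit_make_rt PID TID PRIO: request SCHED_RR priority PRIO for thread TID of
# process PID (for a process's main thread, TID equals PID).
rtkit_make_rt() {
  if [ $# -ne 3 ]; then
    echo "usage: rtkit_make_rt PID TID PRIO" >&2
    return 2
  fi
  local pid=$1 tid=$2 prio=$3 max_us
  # rtkit refuses targets without a bounded RLIMIT_RTTIME, so clamp it to the
  # daemon's advertised maximum first.
  max_us=$(rtkit_prop RTTimeUSecMax | awk '{ print $2 }')
  [ -n "$max_us" ] || return 1
  prlimit --pid "$pid" --rttime="$max_us" || return 1
  busctl --system call "$RTKIT_NAME" "$RTKIT_PATH" "$RTKIT_NAME" \
    MakeThreadRealtimeWithPID ttu "$pid" "$tid" "$prio"
}
```

Usage would look like `rtkit_make_rt "$pid" "$pid" 10`. PRIO has to stay within `MaxRealtimePriority`.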
K900 | But a user doesn't normally have cap_sys_nice | 16:17:11 |
ElvishJerricco | doesn't matter | 16:17:21 |
ElvishJerricco | when you make a user namespace, that namespace has all caps | 16:17:31 |
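The following sketch shows one way to check that claim. It assumes bash, util-linux `unshare`, and unprivileged user namespaces. `cap_bit` and `cap_eff` are hypothetical helpers. The capability numbers are the standard values from `<linux/capability.h>`. Because `awk` reads `/proc/self/status`, it reports its own effective set.

```shell
#!/usr/bin/env bash
# Sketch: compare effective capabilities outside and inside a new user namespace.
# Assumes bash, util-linux unshare, and unprivileged user namespaces.

# Capability numbers from <linux/capability.h>.
CAP_DAC_OVERRIDE=1
CAP_SYS_ADMIN=21
CAP_SYS_NICE=23

# cap_bit MASK BIT: print 1 if bit BIT is set in hex MASK (the CapEff format), else 0.
cap_bit() {
  echo $(( (16#$1 >> $2) & 1 ))
}

# cap_eff [PREFIX...]: print the CapEff mask of an awk process, optionally
# started under a prefix such as `unshare --map-root-user`.
cap_eff() {
  "$@" awk '/^CapEff:/ { print $2 }' /proc/self/status
}

outside=$(cap_eff)
echo "outside: CapEff=$outside CAP_SYS_NICE=$(cap_bit "$outside" "$CAP_SYS_NICE")"
if unshare --map-root-user true 2>/dev/null; then
  # uid 0 inside the namespace keeps its capabilities (up to the bounding set)
  # across execve, so awk sees them in its effective set.
  inside=$(cap_eff unshare --map-root-user)
  for cap in CAP_DAC_OVERRIDE CAP_SYS_ADMIN CAP_SYS_NICE; do
    echo "inside:  $cap=$(cap_bit "$inside" "${!cap}")"
  done
fi
```

As the thread goes on to explain, a set bit here only says the capability is held in the new namespace. What it actually permits is decided separately by the kernel.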
magic_rb | Not cap_sys_admin? Or even that | 16:17:45 |
magic_rb | What | 16:17:46 |
ElvishJerricco | those caps just end up being restricted in kernel logic to not do things to escape the original caps | 16:17:52 |
magic_rb | How can this shit be so fucking complicated and unintuitive | 16:17:55 |
ElvishJerricco | even that | 16:18:00 |
ElvishJerricco | e.g. | 16:18:08 |
ElvishJerricco | the reason you can make mounts in a user namespace without CAP_SYS_ADMIN outside the namespace is because the user namespace allows you to make a mount namespace. So you make the user namespace, that namespace has CAP_SYS_ADMIN. You cannot use this CAP_SYS_ADMIN to make mounts yet, because that CAP_SYS_ADMIN is not allowed to make mounts in mount namespaces from its parent user namespace. So you make a new mount namespace, which user namespaces are allowed to do, and because it was made in your user namespace, and because you have CAP_SYS_ADMIN in that user namespace, you're allowed to make mounts in that mount namespace | 16:20:16 |
ElvishJerricco | i.e. the same CAP_SYS_ADMIN has different capabilities depending on whether your userns owns the thing you're trying to use it on | 16:21:08 |
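The two-step dance described above can be tried from a shell. This sketch assumes util-linux `unshare` and `findmnt` and unprivileged user namespaces. `tmpfs_in_userns` is a hypothetical helper. tmpfs is used because only some filesystem types may be mounted from a non-initial user namespace.

```shell
#!/usr/bin/env bash
# Sketch: mounting from a user namespace needs a mount namespace it owns.
# Assumes util-linux unshare/findmnt and unprivileged user namespaces.

# tmpfs_in_userns DIR: create user + mount namespaces, mount a tmpfs on DIR,
# and print the filesystem type seen there. The mount vanishes with the namespace.
tmpfs_in_userns() {
  unshare --map-root-user --mount sh -c \
    'mount -t tmpfs none "$1" && findmnt -n -o FSTYPE "$1"' sh "$1"
}

dir=$(mktemp -d)
if unshare --map-root-user true 2>/dev/null; then
  # User namespace only: CAP_SYS_ADMIN is held, but the current mount
  # namespace belongs to the parent userns, so this should be refused.
  if unshare --map-root-user mount -t tmpfs none "$dir" 2>/dev/null; then
    echo "userns only:       mount unexpectedly succeeded"
  else
    echo "userns only:       mount refused"
  fi
  # User + mount namespace: the new mount namespace is owned by our userns.
  echo "userns + mount ns: $(tmpfs_in_userns "$dir")"
fi
rmdir "$dir"
```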
ElvishJerricco | so you can definitely just gain CAP_SYS_NICE | 16:21:28 |
ElvishJerricco | but for that to be useful, the kernel has to have some internal logic about things your userns owns that CAP_SYS_NICE is allowed to operate on | 16:21:57 |
ElvishJerricco | IIUC it's pretty normal for Linux caps to have no such logic and just reduce to "after scoping back to the init namespace, what cap remains?" | 16:22:51 |
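Here is a sketch of that point for real-time scheduling. It assumes util-linux `unshare` and `chrt`. As far as we can tell from the kernel's scheduler code, CAP_SYS_NICE is checked with `capable()`, i.e. against the initial user namespace. Holding it inside a user namespace should therefore not be enough to get SCHED_FIFO. The unprivileged route is a nonzero RLIMIT_RTPRIO, which is why the sketch prints it. `rt_in_userns` is a hypothetical helper, and priority 10 is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: CAP_SYS_NICE inside a user namespace vs. real-time scheduling.
# Assumes util-linux unshare/chrt. Priority 10 is an arbitrary SCHED_FIFO value.

# rt_in_userns: print "allowed" if a process that is root in a fresh user
# namespace can switch to SCHED_FIFO, otherwise "refused" (also printed when
# user namespaces or chrt are unavailable).
rt_in_userns() {
  if unshare --map-root-user chrt --fifo 10 true 2>/dev/null; then
    echo allowed
  else
    echo refused
  fi
}

echo "RLIMIT_RTPRIO (ulimit -r): $(ulimit -r)"
echo "SCHED_FIFO from a userns:  $(rt_in_userns)"
```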
ElvishJerricco | (oh also mounting additionally has the constraint that you can only make mounts for allowed file systems in a non-init-userns, which currently only includes things like tmpfs and overlayfs) | 16:24:16 |
magic_rb | Jfc this is complicated, but a patch for cap_sys_nice could then be made, if upstream wanted it and I knew how, right? | 16:27:27 |
ElvishJerricco | you'd have to define (or maybe find documentation on how it's defined) how CAP_SYS_NICE plays together with userns. Like what does the userns own that CAP_SYS_NICE can operate on, because that criterion is how you make it safe | 16:28:43 |
magic_rb | I mean I'd guess it would be "the userns must have created its own pid namespace; any pid originating in that namespace is fair game". But obviously I know jack shit about this. I'll look at the rtkit way. Doesn't seem that hard | 16:30:17 |
magic_rb | It would be nice to have in general and probably required on the frame. Otherwise we'll have frame timing issues | 16:30:39 |
ElvishJerricco | yea I'm only explaining my knowledge of userns and caps in general, I have absolutely no clue about this RT / NICE stuff :P | 16:31:00 |
magic_rb | Yeah same, probably less than you :P | 16:31:31 |
ElvishJerricco | oh, this reminded me of something fun:
touch foo
chmod 0400 foo
echo fails > foo # Permission denied
echo works | unshare -c --keep-caps tee foo
You can just write to read-only files unprivileged because you have CAP_DAC_OVERRIDE :)
| 16:48:35 |
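As far as we understand the kernel's rules, this stays safe for a specific reason. In a user namespace, CAP_DAC_OVERRIDE only applies to inodes whose owner uid and gid are both mapped into that namespace. `-c`/`--map-current-user` maps only your own uid and gid, so your own 0400 file is writable. A root-owned file stays protected. The sketch below wraps the trick in a hypothetical helper, `write_via_userns`. It assumes util-linux `unshare` with `--map-current-user` and `--keep-caps`.

```shell
#!/usr/bin/env bash
# Sketch: DAC override from a user namespace only covers inodes whose owner
# uid/gid are mapped inside it. Assumes util-linux unshare with -c/--keep-caps.

# write_via_userns FILE TEXT: overwrite FILE with TEXT from a new user namespace
# (current uid/gid mapped, namespace capabilities kept across exec).
write_via_userns() {
  printf '%s\n' "$2" | unshare --map-current-user --keep-caps tee "$1" >/dev/null
}

demo=$(mktemp)
chmod 0400 "$demo"
if [ "$(id -u)" -ne 0 ] && unshare --map-current-user true 2>/dev/null; then
  { echo plain > "$demo"; } 2>/dev/null || echo "plain write: refused (0400)"
  write_via_userns "$demo" works && echo "via userns:  $(cat "$demo")"
  # /etc/shadow is owned by root, which is not mapped here: expected to fail.
  unshare --map-current-user --keep-caps cat /etc/shadow >/dev/null 2>&1 \
    || echo "/etc/shadow: still refused"
fi
rm -f "$demo"
```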