| 19 Jun 2023 |
raitobezarius | In reply to @ronnypfannschmidt:matrix.org: "right now there is no safe tooling for it, so it's easy to mess up". The stopgap solution for now is to have a global nixos module knob to disable automated migrations | 21:22:59 |
raitobezarius | Or to enforce an RFC saying that all nixos modules should have an automatic migration flag | 21:23:22 |
raitobezarius | During all my reviews, contributors usually don't want to bother and complain when I say this is a serious problem | 21:23:42 |
| 20 Jun 2023 |
@ronnypfannschmidt:matrix.org | all modules either get safe migrations or need an unsafe flag per module in people's config | 08:11:22 |
@ronnypfannschmidt:matrix.org | then users can understand the quality levels by the unsafe migration flags they have to set | 08:11:59 |
@piegames:matrix.org | What is a "safe" migration? How many upstreams even allow implementing this in the first place? | 08:12:48 |
raitobezarius | safe migrations do not exist | 10:23:45 |
raitobezarius | but unsafe migrations can always be made safe | 10:23:54 |
raitobezarius | it depends on how many resources you are ready to commit to it | 10:24:01 |
@piegames:matrix.org | But how | 10:27:24 |
raitobezarius | trivial safe migration algorithm based on unsafe procedure M: | 10:33:42 |
raitobezarius | take a data snapshot S of the service you are migrating | 10:34:10 |
raitobezarius | do M | 10:34:12 |
raitobezarius | it's safe because you don't lose data, you only lose availability | 10:34:31 |
raitobezarius | you can go back to version N - 1 by restoring S, as long as you don't create new data of course, which should be fine if you have a broken service resulting from an unsafe migration | 10:35:01 |
raitobezarius | you could automate restoring S based on criteria: "the systemd unit really doesn't want to restart", "all requests are 500", etc. | 10:35:30 |
raitobezarius | availability then can be restored via this mechanism | 10:35:51 |
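The algorithm described above (snapshot S, run M, restore S when the service looks broken) could be sketched roughly as follows. This is only an illustration under assumptions: the state is taken to be a single directory, the service is assumed to be stopped while this runs, and `safe_migrate`, `migrate`, and `healthy` are hypothetical names that do not come from any NixOS tooling.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def safe_migrate(
    state_dir: Path,
    migrate: Callable[[Path], None],   # the unsafe procedure M (hypothetical)
    healthy: Callable[[Path], bool],   # restore criterion, e.g. "unit starts" (hypothetical)
) -> bool:
    """Run M on state_dir; restore snapshot S if M raises or the check fails.

    Returns True if the migrated state was kept, False if S was restored.
    Assumes the service is stopped, so no new data is written meanwhile.
    """
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "S"
        shutil.copytree(state_dir, snapshot)  # take the data snapshot S
        try:
            migrate(state_dir)                # do M
            ok = healthy(state_dir)
        except Exception:
            ok = False
        if not ok:
            # Go back to version N - 1: discard the migrated state, restore S.
            shutil.rmtree(state_dir)
            shutil.copytree(snapshot, state_dir)
        return ok
```

In this sketch a failed migration costs availability while S is restored, but the pre-migration data survives. The snapshot is discarded once the function returns, so a real implementation would likely want to keep S around for longer.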
@piegames:matrix.org | Yeah but how do you want to take a snapshot of only a single service? And what about services that interact with others (e.g. a database)? | 10:36:05 |
raitobezarius | ideally, the nixos module should provide the facts about your services | 10:36:27 |
raitobezarius | StateDirectory | 10:36:31 |
raitobezarius | postgresql database | 10:36:36 |
raitobezarius | redis database | 10:36:40 |
raitobezarius | etc. | 10:36:42 |
raitobezarius | then "taking a snapshot" for the service is taking a snapshot of all its constituents | 10:36:52 |
raitobezarius | and how you take such snapshots is an implementation detail | 10:36:59 |
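The idea of a module declaring "facts" about its state, with a service snapshot being the snapshot of each constituent, might look something like the sketch below. All names here (`Constituent`, `ServiceFacts`, `snapshot_plan`, and the example service `myapp`) are hypothetical. The function only produces a plan and deliberately leaves how each kind of snapshot is taken unspecified.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Constituent:
    kind: str    # e.g. "StateDirectory", "postgresql", "redis"
    target: str  # directory or database name declared by the module


@dataclass(frozen=True)
class ServiceFacts:
    name: str
    constituents: Tuple[Constituent, ...]


def snapshot_plan(service: ServiceFacts) -> List[Tuple[str, str]]:
    """A service snapshot is one snapshot step per declared constituent."""
    return [(c.kind, c.target) for c in service.constituents]


# Hypothetical facts a module could expose for a service called "myapp".
facts = ServiceFacts(
    name="myapp",
    constituents=(
        Constituent("StateDirectory", "myapp"),
        Constituent("postgresql", "myapp"),
        Constituent("redis", "myapp"),
    ),
)
plan = snapshot_plan(facts)
```

Each entry in `plan` would then be handed to whatever backend knows how to snapshot that kind of state.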
@piegames:matrix.org | So you have three services connecting to postgres, and one of them fails during upgrade. That's not an implementation detail, that's a fundamental problem IMO | 10:37:15 |