| 20 Jun 2023 |
raitobezarius | do M | 10:34:12 |
raitobezarius | it's safe because you don't lose data, you only lose availability | 10:34:31 |
raitobezarius | you can go back to version N - 1 by restoring S, as long as you haven't created new data in the meantime, which should be the case if the service is broken as a result of an unsafe migration | 10:35:01 |
raitobezarius | * take a data snapshot of the service you are migrating S | 10:35:08 |
raitobezarius | you could automate restoring S based on criteria: "the systemd unit really doesn't want to restart", "all requests are 500", etc. | 10:35:30 |
raitobezarius | availability then can be restored via this mechanism | 10:35:51 |
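A minimal sketch of the "snapshot S, migrate, restore S automatically on bad health" loop described above. Everything here is hypothetical: `HealthSignals`, the two criteria, the restart threshold of 5 and the callback names are illustrative assumptions, not an existing NixOS mechanism. The `restore` callback is assumed to both restore S and switch the service back to version N - 1.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HealthSignals:
    """Hypothetical signals collected after migrating a service to version N."""
    restart_failures: int  # consecutive failed restarts of the systemd unit
    requests: int          # requests seen during the observation window
    server_errors: int     # how many of those returned an HTTP 5xx


Criterion = Callable[[HealthSignals], bool]


def unit_refuses_to_restart(s: HealthSignals) -> bool:
    # "the systemd unit really doesn't want to restart"; the threshold is arbitrary
    return s.restart_failures >= 5


def all_requests_fail(s: HealthSignals) -> bool:
    # "all requests are 500" (any 5xx status is counted here)
    return s.requests > 0 and s.server_errors == s.requests


def migrate_with_auto_restore(
    take_snapshot: Callable[[], str],
    migrate: Callable[[], None],
    observe: Callable[[], HealthSignals],
    restore: Callable[[str], None],
    criteria: List[Criterion],
) -> str:
    snapshot_id = take_snapshot()  # snapshot S of the service's data
    migrate()                      # upgrade from N - 1 to N
    signals = observe()
    if any(criterion(signals) for criterion in criteria):
        # Back to N - 1; nothing is lost as long as no new data was written.
        restore(snapshot_id)
        return "restored"
    return "kept"
```

If any criterion fires, only availability is lost during the check window; the data comes back to the state captured in S.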
@piegames:matrix.org | Yeah but how do you want to take a snapshot of only a single service? And what about services that interact with others (e.g. a database)? | 10:36:05 |
raitobezarius | ideally, the nixos module should provide the facts about your services | 10:36:27 |
raitobezarius | StateDirectory | 10:36:31 |
raitobezarius | postgresql database | 10:36:36 |
raitobezarius | redis database | 10:36:40 |
raitobezarius | etc. | 10:36:42 |
raitobezarius | then "taking a snapshot" of the service means taking a snapshot of all its constituents | 10:36:52 |
raitobezarius | and how you take such snapshots is an implementation detail | 10:36:59 |
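One way to picture "the module provides the facts, and a service snapshot is the snapshot of all its constituents" is as a small data structure. This is a hypothetical sketch: NixOS modules do not expose a `ServiceFacts` record like this today, and `take` stands in for whatever per-constituent snapshot technique is chosen.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Constituent:
    kind: str  # e.g. "StateDirectory", "postgresql", "redis"
    name: str  # directory name, database name, ...


@dataclass
class ServiceFacts:
    """Hypothetical facts a NixOS module could declare about its service."""
    unit: str
    constituents: List[Constituent] = field(default_factory=list)


def snapshot_service(
    facts: ServiceFacts, take: Callable[[Constituent], str]
) -> Dict[Constituent, str]:
    """Snapshot every constituent and return constituent -> snapshot id.

    Constituents are snapshotted one after another, so this sketch gives
    no atomicity across them (e.g. between the database and the files).
    """
    return {c: take(c) for c in facts.constituents}
```

The `nextcloud` names in a usage like `ServiceFacts("nextcloud.service", [Constituent("postgresql", "nextcloud"), ...])` would be purely illustrative.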
@piegames:matrix.org | So you have three services connecting to postgres, and one of them fails during upgrade. That's not an implementation detail, that's a fundamental problem IMO | 10:37:15 |
raitobezarius | you can take differential snapshots using filesystem features where that makes sense, or application-level snapshots where possible | 10:37:18 |
raitobezarius | In reply to @piegames:matrix.org ("So you have three services connecting to postgres, and one of them fails during upgrade. That's not an implementation detail, that's a fundamental problem IMO"): taking a PGSQL database snapshot without affecting the others is a requirement | 10:37:45 |
raitobezarius | ok let's define it further | 10:37:52 |
raitobezarius | we have services in nixpkgs which are shared with different applications living on the same machine | 10:38:04 |
raitobezarius | let's say that the rollback/snapshot granularity is at the database level for PGSQL (and similar things) | 10:38:29 |
raitobezarius | 3 services get upgraded, each using its own database, and one of them fails during the upgrade | 10:38:38 |
raitobezarius | you rollback the failed upgrade's database | 10:38:44 |
raitobezarius | not the other ones | 10:38:47 |
raitobezarius | unless you go further and declare that this group of 3 services is coupled; then, looking at the systemd dependency graph, you would want to roll back every database | 10:39:19 |
raitobezarius | (network dependencies are hard to discover but I don't think they're impossible to build) | 10:39:50 |
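The per-database rollback rule above (roll back only the failed service's database, unless the group is declared coupled, in which case follow the systemd dependency graph) can be sketched as a graph search. The `deps` map is assumed to come from unit dependencies such as `Requires=`; the unit and database names are hypothetical, and network dependencies are ignored.

```python
from collections import deque
from typing import Dict, List, Set


def databases_to_roll_back(
    failed: str,
    deps: Dict[str, List[str]],
    databases: Dict[str, str],
    coupled: bool,
) -> Set[str]:
    """Return the databases to restore after `failed` broke during upgrade.

    deps maps a unit to the units it depends on; databases maps a unit to
    the database it owns (the snapshot granularity).
    """
    if not coupled:
        return {databases[failed]}
    # Build an undirected adjacency map from the unit -> dependency edges.
    adj: Dict[str, Set[str]] = {}
    for unit, targets in deps.items():
        for target in targets:
            adj.setdefault(unit, set()).add(target)
            adj.setdefault(target, set()).add(unit)
    # Breadth-first search from the failed unit.
    seen = {failed}
    queue = deque([failed])
    while queue:
        unit = queue.popleft()
        for neighbour in adj.get(unit, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    # Only units that own a database have something to roll back.
    return {databases[u] for u in seen if u in databases}
```

Note the design trade-off: traversing through a shared server like `postgresql.service` pulls in every client of that server, which is what opting into coupling means here; a real implementation might restrict the search to an explicitly declared group.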
raitobezarius | the way you take snapshots, though, is an implementation detail, because you could use all kinds of techniques, some better suited than others depending on your exact situation | 10:40:14 |
raitobezarius | system snapshot, filesystem-level snapshots, ZFS dataset-level snapshots, application-level snapshots, "backup/restore procedures" | 10:40:42 |
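Treating the snapshot technique as an implementation detail maps naturally onto a pluggable backend interface. The sketch below is an assumption, not an existing tool: it only builds the `zfs` and `pg_dump`/`pg_restore` command lines and hands them to an injected `run` callable, so the dataset, database and dump directory names are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Callable, List

Runner = Callable[[List[str]], None]


class SnapshotBackend(ABC):
    @abstractmethod
    def snapshot(self, tag: str) -> None: ...

    @abstractmethod
    def restore(self, tag: str) -> None: ...


class ZfsDataset(SnapshotBackend):
    """ZFS dataset-level snapshot via the zfs CLI."""

    def __init__(self, dataset: str, run: Runner) -> None:
        self.dataset = dataset
        self.run = run

    def snapshot(self, tag: str) -> None:
        self.run(["zfs", "snapshot", f"{self.dataset}@{tag}"])

    def restore(self, tag: str) -> None:
        # -r destroys snapshots newer than `tag`; zfs requires that
        # before rolling back to anything but the latest snapshot.
        self.run(["zfs", "rollback", "-r", f"{self.dataset}@{tag}"])


class PostgresDatabase(SnapshotBackend):
    """Application-level snapshot of a single database via pg_dump/pg_restore."""

    def __init__(self, dbname: str, dump_dir: str, run: Runner) -> None:
        self.dbname = dbname
        self.dump_dir = dump_dir
        self.run = run

    def _dump_path(self, tag: str) -> str:
        return f"{self.dump_dir}/{self.dbname}-{tag}.dump"

    def snapshot(self, tag: str) -> None:
        self.run(["pg_dump", "--format=custom",
                  f"--file={self._dump_path(tag)}", self.dbname])

    def restore(self, tag: str) -> None:
        self.run(["pg_restore", "--clean", "--if-exists",
                  f"--dbname={self.dbname}", self._dump_path(tag)])
```

In a real deployment `run` could be `lambda argv: subprocess.run(argv, check=True)`; injecting it keeps the choice of technique separate from executing it, and makes the backends easy to test.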
@ronnypfannschmidt:matrix.org | migrations as such need to be managed as groups of involved services that may or may not share required rollbacks
for full safety you need service deployment granularity that enables migrations in steps, where both the older and newer versions currently available can work with the data in question
maintenance modes for the applications that can happen independently of the code deployment are going to be important as well
so starting points like "having restorable backups" are definitely nice to have, but applications themselves also need some extras | 11:55:24 |
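One common way to get "migrations in steps where both the older and newer versions can work with the data" is the expand/contract (parallel change) pattern; the message does not name it, so treat this as a swapped-in illustration. The sketch checks a hypothetical migration plan: every step applied while the old version is still deployed must work with both versions, and old-only-breaking ("contract") steps must come last.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Step:
    name: str
    phase: str                  # "expand": old version still deployed; "contract": old version retired
    works_with: FrozenSet[str]  # application versions that can use the data after this step


def check_plan(steps: List[Step], old: str, new: str) -> List[str]:
    """Return a list of problems; an empty list means the plan looks safe."""
    problems: List[str] = []
    seen_contract = False
    for step in steps:
        if step.phase == "contract":
            seen_contract = True
            required = {new}
        else:
            if seen_contract:
                problems.append(f"{step.name}: expand step after a contract step")
            required = {old, new}
        missing = required - step.works_with
        if missing:
            problems.append(f"{step.name}: breaks {sorted(missing)}")
    return problems
```

For example, renaming a column in one step while version 1.0 is still running would be flagged, whereas "add new column, backfill, then drop old column after 1.0 is retired" passes.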