| 4 Dec 2025 |
raitobezarius | so it's an implicit lock on the WAL? | 19:43:27 |
aloisw | It's the checkpointer blocking everything. | 19:49:19 |
aloisw | And the weird thing is that I only got this in the run where I made reads slow as well. | 19:49:50 |
raitobezarius | why is it weird in your opinion? | 19:50:36 |
aloisw | Well either it's a coincidence or the problem is that the checkpointer is blocked on reads. | 19:51:15 |
aloisw | Currently I'm doing another run to figure it out. | 19:51:46 |
raitobezarius | i wonder if the checkpointer blocked on reads makes sense in the transaction isolation model of sqlite | 19:51:44 |
aloisw | Yeah it seems to be blocked on reads | 19:53:05 |
aloisw | Absolutely, the checkpointer needs to read the entire WAL and integrate it into the database. | 19:53:21 |
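[Editor's note: a minimal sketch, not from the Lix code, of what checkpointing amounts to at the API level: copying committed WAL frames back into the main database file. SQLITE_CHECKPOINT_PASSIVE is the non-blocking mode the automatic checkpointer also uses; the path is the one from the listing below.]

// Sketch only: run one passive checkpoint by hand and report how far it got.
#include <cstdio>
#include <sqlite3.h>

int main() {
    sqlite3 *db = nullptr;
    int rc = sqlite3_open("/mnt/nix/var/nix/db/db.sqlite", &db);
    if (rc != SQLITE_OK) {
        std::fprintf(stderr, "open failed: %s\n", db ? sqlite3_errmsg(db) : "out of memory");
        sqlite3_close(db);
        return 1;
    }
    int walFrames = 0, backfilled = 0;
    // PASSIVE copies as many WAL frames into the database file as it can
    // without blocking readers or writers, then returns.
    rc = sqlite3_wal_checkpoint_v2(db, "main", SQLITE_CHECKPOINT_PASSIVE,
                                   &walFrames, &backfilled);
    std::printf("rc=%d, WAL frames=%d, checkpointed=%d\n", rc, walFrames, backfilled);
    sqlite3_close(db);
    return 0;
}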
aloisw | aloisw@exodus ~> ls -lah /mnt/nix/var/nix/db
total 11G
drwxr-xr-x 2 aloisw users 111 Dec 4 20:49 .
drwxr-xr-x 6 aloisw users 79 Dec 4 20:49 ..
-rw------- 1 aloisw users 0 Dec 4 20:49 big-lock
-rw-r--r-- 1 aloisw users 128M Dec 4 20:53 db.sqlite
-rw-r--r-- 1 aloisw users 21M Dec 4 20:53 db.sqlite-shm
-rw-r--r-- 1 aloisw users 11G Dec 4 20:53 db.sqlite-wal
-rw------- 1 aloisw users 8.0M Dec 4 20:49 reserved
-rw-r--r-- 1 aloisw users 2 Dec 4 20:49 schema
Is the problem "the WAL is growing too fast"? | 19:53:52 |
aloisw | * I wonder if the problem is "the WAL is growing too fast"? | 19:54:05 |
raitobezarius | would that mean checkpointing more frequently would fix that? | 19:55:01 |
raitobezarius | is the checkpoint freq automatically derived? | 19:55:10 |
aloisw | Maybe. It would definitely reduce latency, but it only helps throughput if the WAL then stays in cache, I think. It also adds more fsyncs, which can slow you down again. | 19:56:19 |
aloisw | Yes, when the WAL grows too big, as determined by the wal_autocheckpoint pragma. | 19:57:26 |
aloisw | Which Lix sets to 40000, so it should be 160 MiB. | 19:59:20 |
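[Editor's note: wal_autocheckpoint is counted in WAL pages, not bytes; assuming SQLite's default 4096-byte page size (the Nix database's actual page size isn't shown here), 40000 pages is about 156 MiB of page data, in line with the rough 160 MiB figure above. A minimal sketch of setting the threshold on an already-open connection, not taken from the Lix source:]

// Sketch only: set the auto-checkpoint threshold on a WAL-mode connection.
// 40000 is the value quoted above; with a 4096-byte page size that is
// 40000 * 4096 bytes ≈ 156 MiB of WAL before SQLite attempts a passive
// checkpoint at the end of a commit.
#include <sqlite3.h>

void setAutoCheckpoint(sqlite3 *db) {
    sqlite3_wal_autocheckpoint(db, 40000);  // C API form
    // Equivalent pragma form:
    // sqlite3_exec(db, "PRAGMA wal_autocheckpoint = 40000;", nullptr, nullptr, nullptr);
}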
aloisw | Hm, but the checkpointer shouldn't block others if I read the docs correctly? | 20:01:12 |
aloisw | It seems that the writers just slow down massively, so possibly this is only indirectly related to the checkpointer, which falls behind and lets the WAL grow huge. | 20:09:25 |
Jassuko | Wtf is that WAL size?! :o | 21:06:37 |
raitobezarius | yeah but read perf deteriorates with the size of the WAL | 22:43:19 |
raitobezarius | have you tried a lower value? | 22:43:25 |
raitobezarius | but i guess it's really pesky | 22:44:19 |
raitobezarius | lix by nature in large substitution scenarios is read-write intensive | 22:44:29 |
raitobezarius | * lix by nature in large substitution/builds scenarios is read-write intensive | 22:44:34 |
raitobezarius | but i feel like it's a mistake that lix gets blocked on the possibility that the WAL contains a record relevant to it, given our usage of flock to mark that a store path will exist in the future | 22:45:19 |
raitobezarius | whereas for writes, it seems it'd be good if we could have multiple WALs, so that once one is committed, the other can still be filled? | 22:45:42 |
raitobezarius | maybe we can improve things by initiating checkpoints ourselves at key points… | 22:46:15 |
raitobezarius | it would be interesting to know if we cause checkpoint starvation | 22:47:19 |
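[Editor's note: a sketch of one way to watch for checkpoint starvation, assuming a WAL-mode connection; none of this is existing Lix code. sqlite3_wal_hook runs after every committed write transaction with the current WAL length in frames. Installing a hook replaces the one wal_autocheckpoint registers, so this sketch checkpoints explicitly and records how many frames were actually backfilled.]

// Sketch only: after each commit, try a non-blocking checkpoint once the WAL
// passes a threshold. If `backfilled` keeps lagging far behind `walFrames`
// across calls, readers are starving the checkpointer.
#include <cstdio>
#include <sqlite3.h>

static int onWalCommit(void *, sqlite3 *db, const char *dbName, int framesInWal) {
    if (framesInWal > 40000) {  // hypothetical threshold, same as the quoted autocheckpoint value
        int walFrames = 0, backfilled = 0;
        sqlite3_wal_checkpoint_v2(db, dbName, SQLITE_CHECKPOINT_PASSIVE,
                                  &walFrames, &backfilled);
        std::fprintf(stderr, "WAL has %d frames, %d checkpointed so far\n",
                     walFrames, backfilled);
    }
    return SQLITE_OK;
}

// Registered once after opening the connection:
//   sqlite3_wal_hook(db, onWalCommit, nullptr);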