| 29 Mar 2022 |
toonn | Motivation for Lzip would be the compression ratio, since the bootstrap tarball isn't updated often. (And it seems to work well for Guix.) | 12:31:57 |
atemu12 | toonn: Who decompresses the bootstrap tarball? | 12:33:14 |
Linux Hackerman | Hm, what's the practical difference between lzip and xz? | 12:34:18 |
toonn | Anyone building stdenv. | 12:34:37 |
toonn | Lzip is a much simpler format with a focus on being suitable for archival. Xz has a bit of a complex format that is more vulnerable to corruption. | 12:35:25 |
Linux Hackerman | how can a compression format be more or less vulnerable to corruption? | 12:36:08 |
atemu12 | Linux Hackerman: I think there was a blog post some time ago but I'm not sure how accurate it is. | 12:38:02 |
Linux Hackerman | If anything I'd expect that more complexity (such as checksums) would make it less vulnerable to (undetected) corruption | 12:38:12 |
atemu12 | toonn: I mean what process. If it's the bootstrap tarball, there wouldn't be anything that could decompress it other than Nix itself? | 12:38:35 |
atemu12 | If that's the case, you'd need a compressor Nix can handle for instance | 12:38:56 |
toonn | Linux Hackerman: Bigger headers, unprotected fields in the headers, multiple options for checksumming without any being required by the spec. There's also no utility to help recover an archive. Lzip does checksumming. The author's thoughts on the topic are probably more useful/accurate than my own. | 12:41:37 |
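To illustrate the "multiple options for checksumming" point, here is a minimal sketch using Python's standard `lzma` module, which writes the xz container. The payload is made-up stand-in data and `xz_check_used` is a hypothetical helper name; the sketch only shows that the writer picks the integrity check, including none at all, and that the choice is recorded in the stream.

```python
import lzma

def xz_check_used(blob: bytes) -> int:
    """Return the integrity-check ID recorded in an .xz stream."""
    decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    decompressor.decompress(blob)
    return decompressor.check

# Stand-in data; not a real bootstrap tarball.
payload = b"bootstrap tools placeholder " * 1000

results = {}
for name in ("CHECK_NONE", "CHECK_CRC32", "CHECK_CRC64", "CHECK_SHA256"):
    check = getattr(lzma, name)
    # CRC64 and SHA-256 can be missing from reduced liblzma builds.
    if not lzma.is_check_supported(check):
        continue
    blob = lzma.compress(payload, format=lzma.FORMAT_XZ, check=check)
    results[name] = (len(blob), xz_check_used(blob))
    print(f"{name}: {len(blob)} bytes, check id {results[name][1]}")
```

A stream written with `CHECK_NONE` is still a valid .xz file, so whether corruption gets detected depends on what the writer chose.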
Linux Hackerman | aah ok | 12:41:51 |
toonn | atemu12: IIRC Nix uses libarchive now. That covers pretty much every reasonable compression except Brotli. | 12:42:37 |
@rnhmjoj:maxwell.ydns.eu | this is an article by the author of lzip that goes through the problems with xz: http://web.archive.org/web/20220128214314/https://www.nongnu.org/lzip/xz_inadequate.html | 12:43:01 |
atemu12 | toonn: Good to know, thanks! | 12:43:57 |
Linux Hackerman | I've learnt something new today! Thanks all :) | 12:44:29 |
atemu12 | toonn: In that case, an LZMA compressor like lzip or XZ would be best if you want best compression with slow but not unreasonably slow speeds. Otherwise, zstd. | 12:45:07 |
atemu12 | toonn: Though if it's only needed by people who re-build the stdenv, they'd appreciate the orders of magnitude in speed of zstd more than the few MiB saved by LZMA | 12:47:17 |
toonn | atemu12: The idea is that network bandwidth makes a bigger difference than the decompression speed. | 12:53:19 |
toonn | I think I'll just try both and do some minimal benchmarking. | 12:54:13 |
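A minimal benchmarking sketch along those lines, in Python. It is an assumption-laden stand-in: only bzip2 and xz ship in the standard library, so lzip and zstd are not measured here and would have to be added to the hypothetical `CODECS` table through third-party bindings or by shelling out to the CLI tools. The sample data is synthetic; a real comparison would read the actual tarball.

```python
import bz2
import lzma
import time
from typing import Callable, Dict, Tuple

# name -> (compress, decompress); extend with lzip/zstd wrappers as needed.
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]
CODECS: Dict[str, Codec] = {
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}

def benchmark(data: bytes, codecs: Dict[str, Codec] = CODECS) -> Dict[str, dict]:
    """Time one compress/decompress round trip per codec at default settings."""
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        middle = time.perf_counter()
        unpacked = decompress(packed)
        end = time.perf_counter()
        if unpacked != data:
            raise ValueError(f"{name} round trip failed")
        results[name] = {
            "ratio": len(packed) / len(data),  # smaller is better
            "compress_s": middle - start,
            "decompress_s": end - middle,
        }
    return results

# Synthetic 1 MiB sample; replace with the tarball's bytes for real numbers.
sample = bytes(range(256)) * 4096
for codec_name, stats in benchmark(sample).items():
    print(codec_name, stats)
```

A single round trip is noisy, so repeating each measurement and taking the median would give steadier numbers.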
atemu12 | toonn: At higher LZMA levels, that's actually not necessarily true IIRC. Also, a user hacking on the stdenv likely unpacks the tarball more often than they download it and likely needs to download so many source files that the difference between LZMA and zstd is a drop in the ocean | 12:56:22 |
toonn | It's also the bandwidth from the project's perspective, of course. | 12:57:36 |
toonn | And the tarball only needs to be unpacked once. Unless the result is garbage collected. But the tarball itself would probably also be GCed, I suppose. | 12:58:18 |
toonn | I don't think a crazy level of either compressor would be appropriate. | 13:00:22 |
toonn | According to the Lzip author Zstd isn't very suitable for archival either, https://lists.gnu.org/archive/html/lzip-bug/2016-10/msg00005.html (The interest in archival is my own, of course, I like the idea of the Nixpkgs cache being append-only forever : ) I don't know whether the corruption-resistance is an advantage at the scale of Darwin users, but maybe?) | 13:05:06 |
Foxboron | In reply to @toonn:matrix.org: Arch does long-term archiving of all packages, xz previously and zstd currently. Not aware of any corruption issues, but I don't think anyone has checked either. | 13:41:10 |
atemu12 | Also, those integrity concerns don't apply here since the binary seed will be declared by hash anyways. | 13:43:07 |
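The "declared by hash" argument can be sketched in Python as follows. This is an illustration only, not how Nix implements it: the byte string stands in for the tarball, SHA-256 is an assumed choice of hash, and `sha256_hex` and `matches_pinned_hash` are hypothetical helper names. A pinned hash catches any corruption regardless of the compression format, though it only detects the damage and cannot repair it.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

def matches_pinned_hash(data: bytes, expected_hex: str) -> bool:
    """True if the data hashes to the pinned value."""
    return sha256_hex(data) == expected_hex

# Stand-in for the compressed bootstrap tarball.
tarball = b"pretend this is the compressed bootstrap tarball"
pinned = sha256_hex(tarball)  # in practice this value is written down ahead of time

# Flip a single bit to simulate corruption in storage or transit.
corrupted = bytearray(tarball)
corrupted[5] ^= 0x01

print(matches_pinned_hash(tarball, pinned))           # intact copy
print(matches_pinned_hash(bytes(corrupted), pinned))  # corrupted copy
```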
tpw_rules | toonn: i thought the bzip choice was motivated by what was available on darwin by default | 13:44:08 |
tpw_rules | i also had always wondered about the lzip author's motivation, the claims seem slightly overblown but eh. i'd probably pick xz because i'm pretty sure it's on darwin too | 13:44:39 |
tpw_rules | but otoh i think xz is non-reproducible in multithreaded mode | 13:48:34 |