From: Steffen Nurpmeso
Subject: Re: [NEW] sysutils/s-bsdipa
To: ports@openbsd.org
Date: Thu, 15 May 2025 21:30:51 +0200

Steffen Nurpmeso wrote in
 <20250415213303.moqL3xEJ@steffen%sdaoden.eu>:
 |S-bsdipa, a mutation of BSDiff:
 |create or apply a ZLIB compressed binary difference patch.
 |
 |(Think xdelta3 / vcdiff; but very small and often better.)

Any interest in this?
I.e., it is like a combined FreeBSD bsdiff(1)/bspatch(1), but uses a
different storage format.

 |In short: the OpenBSD port only supports the 32-bit mode, and the
 |binary is a multiplexer that both creates and applies the patches.
 |The full test suite is only part of the BsDiPa perl (CPAN) module
 |(because it would be quite lengthy in C), therefore the port only
 |creates an on-the-fly test that at least shows that the executable
 |as such is usable.
 |
 |In long, that is:
 |
 |  [.]
 |  the BSDiff algorithm of Colin
 |  Percival, [.] taken from the FreeBSD
 |  operating system source code, and slightly rearranged.  There is a
 |  freely usable (BSD 2-clause, ISC and MIT licenses) plug-and-play ISO
 |  C99 and perl implementation available
 |  (https://github.com/sdaoden/s-bsdipa), which includes further
 |  references on the algorithm.
 |  [.this port is]
 |  a 32-bit adaption sufficient for email that almost

(..and to mention that I think some other such approaches only support
32-bit file sizes anyway, and performance is a sore point as well.
Having said this, the real 32-bit limit is roughly 500 MB, which is
not sufficient for a CD; the 64-bit variant is a compile-time option.)

 |  halves memory requirements compared to 64-bit, and also produces
 |  smaller difference control data.  The resulting binary difference is
 |  then ZLIB[RFC1950] compressed[.]
 |
 |with the following adaptions:
 |
 |  * First of all: the string suffix sorting and difference creation
 |    approach of Colin Percival has been left unchanged.
 |
 |  * The original had been fixated on 64-bit file sizes and content
 |    representation.
 |    The adaption supports both 32-bit and 64-bit, switchable at
 |    compile time.  Using 32-bit almost halves memory requirements,
 |    and produces smaller patch control data.  It is deemed
 |    sufficient for email purposes.  (32-bit and 64-bit patches are
 |    not interchangeable.)
 |
 |  * The "magic window of inspection" has been made configurable; the
 |    original used the fixed value 8, which is a perfect fit for
 |    compiler output.  The adaption uses the default value 16, which
 |    is a very good fit for textual data.  The value is, however,
 |    irrelevant on the patch application side.
 |
 |  * In order to reduce memory usage during patch generation, the
 |    adaption uses a shared memory region for differential and extra
 |    data: the former is therefore stored in reversed order, top down.
 |    (This reduces memory usage by the size of the target data set.)
 |
 |  * The adaption stores data in big endian (network; MSF; most
 |    significant byte first) instead of little endian (LSF; least
 |    significant byte first) byte order.
 |
 |  * The original uses three separate bzip2 streams to serialize
 |    control, differential and extra data.  The adaption separates
 |    patch generation from the I/O layer, which therefore sees the
 |    entire, readily prepared patch data.[.]
 |    [The port uses ZLIB[RFC1950] for patch compression.]
 |
 |  * The original header did not contain the size of the extra data,
 |    which was stored last, its size implicitly extending to the end
 |    of the patch.  The adaption includes the extra data size in the
 |    header, so that more consistency checks can be performed as soon
 |    as the header alone has been parsed.  This also enables the I/O
 |    layer to allocate exactly sized memory with only the header data
 |    being available.
 |
 |  * The adaption performs memory allocations through user-provided
 |    callbacks.
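On the "magic window of inspection": in FreeBSD's bsdiff.c the scan loop compares `len`, the length of the best suffix-search match at the current position, with `oldscore`, the number of bytes the previous alignment still matches, and stops once `len == oldscore` (for a non-empty match) or `len > oldscore + 8`. A minimal C sketch of that stop condition with the constant turned into a parameter follows; the function name is hypothetical and this is not the s-bsdipa source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring the stop condition of the scan loop in
 * FreeBSD's bsdiff.c, with the hard-coded 8 replaced by a parameter.
 *   len       length of the best match found at the current position
 *   oldscore  bytes that the previous alignment still matches
 *   window    the "magic window of inspection" (8 in the original,
 *             16 by default in the adaption)
 * Returns true where the original loop would break and emit a control
 * entry. */
static bool
scan_should_stop(size_t len, size_t oldscore, size_t window)
{
	/* A non-empty match that the previous alignment matches equally. */
	if (len == oldscore && len != 0)
		return true;
	/* A new match that beats the previous alignment by more than the
	 * window. */
	return len > oldscore + window;
}
```

Since the window only influences where the differ splits the target into control entries, the applying side never needs to know it, which matches the quoted remark.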
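The shared memory region is possible because, in the BSDiff scheme, every byte of the target is emitted exactly once, either as a differential byte or as an extra byte, so both streams together never exceed the target size; the original instead allocates two separate buffers of that size. A minimal C sketch, with hypothetical names, of one buffer filled from both ends: extra bytes grow upward from the start, differential bytes grow downward from the end and therefore end up reversed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared buffer sized to the target data. */
struct shared_buf {
	uint8_t *base;
	size_t size;      /* == target data size */
	size_t extra_len; /* bytes used at the bottom (extra data) */
	size_t diff_len;  /* bytes used at the top (differential data) */
};

/* Append one extra byte; returns -1 if the regions would collide. */
static int
sb_push_extra(struct shared_buf *sb, uint8_t byte)
{
	if (sb->extra_len + sb->diff_len >= sb->size)
		return -1;
	sb->base[sb->extra_len++] = byte;
	return 0;
}

/* Append one differential byte, top down: the first one lands at the
 * last position of the buffer. */
static int
sb_push_diff(struct shared_buf *sb, uint8_t byte)
{
	if (sb->extra_len + sb->diff_len >= sb->size)
		return -1;
	sb->base[sb->size - 1 - sb->diff_len++] = byte;
	return 0;
}

/* Fetch the i-th differential byte in the order it was emitted. */
static uint8_t
sb_diff_at(const struct shared_buf *sb, size_t i)
{
	return sb->base[sb->size - 1 - i];
}
```

Whoever serializes the differential data then simply walks the top of the buffer downward.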
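The 32-bit and big-endian adaptions together determine what a control entry looks like on disk. BSDiff control entries are triples (differential length, extra length, seek offset in the source); the original stores each field as an 8-byte little-endian sign-magnitude value, 24 bytes per entry. The C sketch below encodes such a triple with 32-bit big-endian fields, 12 bytes per entry. The struct and function names are hypothetical, and storing negative seeks as two's complement is an assumption, not the documented s-bsdipa encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value most significant byte first (network order). */
static void
put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static uint32_t
get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical 32-bit control entry. */
struct ctrl32 {
	uint32_t diff_len;  /* bytes taken from the differential data */
	uint32_t extra_len; /* bytes taken from the extra data */
	int32_t seek;       /* signed adjustment of the source position */
};

enum { CTRL32_SIZE = 3 * 4 };

static void
ctrl32_encode(uint8_t out[CTRL32_SIZE], const struct ctrl32 *c)
{
	put_be32(out, c->diff_len);
	put_be32(out + 4, c->extra_len);
	/* Assumption: negative seeks are stored as two's complement. */
	put_be32(out + 8, (uint32_t)c->seek);
}

static void
ctrl32_decode(struct ctrl32 *c, const uint8_t in[CTRL32_SIZE])
{
	uint32_t s = get_be32(in + 8);

	c->diff_len = get_be32(in);
	c->extra_len = get_be32(in + 4);
	/* Map back to a signed value without relying on the
	 * implementation-defined unsigned-to-signed conversion. */
	c->seek = (s <= INT32_MAX) ? (int32_t)s
	    : (int32_t)(s - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}
```

The halved entry size is also why 32-bit and 64-bit patches cannot be mixed: a reader expecting 24-byte entries would misparse 12-byte ones.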
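Knowing the extra data size up front lets a reader reject a malformed patch before decoding any payload. The C sketch below parses a hypothetical 16-byte header of four big-endian 32-bit lengths from the uncompressed patch data; the field order and the specific checks are assumptions for illustration, not the actual s-bsdipa format. One check relies on the BSDiff property that differential and extra data together cover the target exactly once, so their lengths must add up to the target size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit patch header: four big-endian lengths. */
struct hdr32 {
	uint32_t ctrl_len;   /* bytes of control data (12 per entry) */
	uint32_t diff_len;   /* bytes of differential data */
	uint32_t extra_len;  /* bytes of extra data (absent in the original) */
	uint32_t target_len; /* size of the reconstructed file */
};

enum { HDR32_SIZE = 16 };

static uint32_t
be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parse the header of an uncompressed patch of patch_len bytes and
 * reject inconsistent sizes before touching any payload. */
static bool
hdr32_parse(struct hdr32 *h, const uint8_t *p, size_t patch_len)
{
	uint64_t payload;

	if (patch_len < HDR32_SIZE)
		return false;
	h->ctrl_len = be32(p);
	h->diff_len = be32(p + 4);
	h->extra_len = be32(p + 8);
	h->target_len = be32(p + 12);

	/* Control data must consist of whole entries. */
	if (h->ctrl_len % 12 != 0)
		return false;
	/* Each target byte comes from either differential or extra data. */
	if ((uint64_t)h->diff_len + h->extra_len != h->target_len)
		return false;
	/* The three payload sections must fill the rest exactly. */
	payload = (uint64_t)h->ctrl_len + h->diff_len + h->extra_len;
	return payload == (uint64_t)patch_len - HDR32_SIZE;
}
```

Once these checks pass, the I/O layer can allocate exactly `target_len` bytes for the result.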
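Allocating through user-provided callbacks means the library calls a function pointer instead of malloc(3) directly. A minimal C sketch of what such an interface can look like, with all names hypothetical: a hook pair plus an opaque cookie, an example caller-side implementation that refuses oversized requests and counts live allocations, and a library-side helper that checks the size multiplication for overflow before calling the hook.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical hook table: library code allocates only through these. */
struct mem_hooks {
	void *(*alloc_fn)(void *cookie, size_t size);
	void (*free_fn)(void *cookie, void *ptr);
	void *cookie; /* passed back to the hooks unchanged */
};

/* Example caller-side state: refuse large single requests and count
 * allocations that have not been freed yet. */
struct cap {
	size_t max_size;
	size_t live;
};

static void *
cap_alloc(void *cookie, size_t size)
{
	struct cap *c = cookie;
	void *p;

	if (size > c->max_size)
		return NULL;
	if ((p = malloc(size)) != NULL)
		++c->live;
	return p;
}

static void
cap_free(void *cookie, void *ptr)
{
	struct cap *c = cookie;

	if (ptr != NULL) {
		free(ptr);
		--c->live;
	}
}

/* Library side: allocate an array (e.g. a suffix array) via the hooks,
 * rejecting a size computation that would overflow. */
static void *
lib_alloc_array(const struct mem_hooks *mh, size_t n, size_t elsize)
{
	if (elsize != 0 && n > SIZE_MAX / elsize)
		return NULL;
	return mh->alloc_fn(mh->cookie, n * elsize);
}
```

A caller embedding the code, for example the perl module, could route these hooks to its own allocator or enforce a memory budget without touching the library.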
--End of <20250415213303.moqL3xEJ@steffen%sdaoden.eu>

--steffen