From: Steffen Nurpmeso <steffen@sdaoden.eu>
Subject: [NEW] sysutils/s-bsdipa
To: ports@openbsd.org
Date: Tue, 15 Apr 2025 23:33:03 +0200

Hello.

  S-bsdipa, a mutation of BSDiff:
  create or apply a ZLIB compressed binary difference patch.

(Think xdelta3 / vcdiff; but very small and often better.)

In short: the OpenBSD port only supports the 32-bit mode; the
binary is a multiplexer that both creates and applies patches.
The full test suite is only part of the BsDiPa perl (CPAN) module
(because it would be quite lengthy in C), therefore the port only
creates an on-the-fly test which shows that the executable as such
is at least usable.

In long, that is

  [.]
   the BSDiff algorithm of Colin
   Percival, [.] taken from the FreeBSD
   operating system source code, and slightly rearranged.  There is a
   freely usable (BSD 2-clause, ISC and MIT licenses) plug-and-play ISO
   C99 and perl implementation available (https://github.com/sdaoden/
   s-bsdipa), which includes further references on the algorithm.
   [.this port is]
   a 32-bit adaption sufficient for email that almost
   halves memory requirements compared to 64-bit, and also produces
   smaller difference control data.  The resulting binary difference is
   then ZLIB[RFC1950] compressed[.]

with the following adaptions:

   *  First of all: the string suffix sorting and difference creation
      approach of Colin Percival has been left unchanged.

   *  The original was hard-wired to 64-bit file sizes and content
      representation.  The adaption supports both 32-bit and 64-bit,
      selected at compile time.  Using 32-bit almost halves memory
      requirements, and produces smaller patch control data.  It is
      deemed sufficient for email purposes.  (32-bit and 64-bit patches
      are not interchangeable.)

   *  The "magic window of inspection" has been made configurable, from
      the fixed original value 8, which represents a perfect fit for
      compiler output.  The adaption uses the default value 16, which is
      a very good fit for textual data.  The value is, however,
      irrelevant on the patch application side.

   *  In order to reduce memory usage during patch generation, the
      adaption uses a shared memory region for differential and extra
      data: the former is therefore stored in reversed order, top down.
      (This reduces memory usage by the size of the target data set.)

   *  The adaption stores data in big endian (network; MSF; most
      significant byte first) instead of little endian (LSF; least
      significant byte first) byte order.

   *  The original uses three separate bzip2 streams to serialize
      control, differential and extra data.  The adaption separates
      patch generation from the I/O layer, which will therefore see the
      entire readily prepared patch data.[.]
      [The port uses ZLIB[RFC1950] for patch compression.]

   *  The original header did not contain the size of the extra data,
      which was stored last, with its size implicitly extending to the
      end of the patch.  The adaption includes the extra data size in
      the header, allowing more verification tests to be applied with
      only the header being readily parsed.  This also enables the I/O
      layer to allocate perfectly sized memory with only the header data
      being available.

   *  The adaption performs memory allocations through user provided
      callbacks.
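
To illustrate the header, byte order and compression points above,
here is a minimal Python sketch.  It is not the s-bsdipa wire format:
the four-field header layout, the field order and the names
pack_patch / unpack_patch are assumptions made up for this example.
It only shows why big endian 32-bit fields halve the header size
compared to 64-bit, and how a header that carries all three sizes
(including the extra data size) lets a reader verify and slice the
patch after a single ZLIB decompression.

```python
import struct
import zlib

# Hypothetical layout, NOT the real s-bsdipa format: four signed
# big endian integers (control, diff, extra, and target sizes).
HEADER_32 = struct.Struct(">4i")   # 16 bytes
HEADER_64 = struct.Struct(">4q")   # 32 bytes, shown for comparison only


def pack_patch(ctrl: bytes, diff: bytes, extra: bytes, target_len: int) -> bytes:
    """Serialize header plus the three data blocks as one ZLIB stream."""
    header = HEADER_32.pack(len(ctrl), len(diff), len(extra), target_len)
    return zlib.compress(header + ctrl + diff + extra)


def unpack_patch(blob: bytes):
    """Decompress, validate the header against the payload, and slice."""
    raw = zlib.decompress(blob)
    if len(raw) < HEADER_32.size:
        raise ValueError("truncated header")
    ctrl_len, diff_len, extra_len, target_len = HEADER_32.unpack_from(raw)
    if min(ctrl_len, diff_len, extra_len, target_len) < 0:
        raise ValueError("negative size in header")
    # Because the extra data size is explicit, the total can be checked
    # before any block is touched.
    if HEADER_32.size + ctrl_len + diff_len + extra_len != len(raw):
        raise ValueError("header sizes do not match payload")
    pos = HEADER_32.size
    ctrl = raw[pos:pos + ctrl_len]
    pos += ctrl_len
    diff = raw[pos:pos + diff_len]
    pos += diff_len
    extra = raw[pos:pos + extra_len]
    return ctrl, diff, extra, target_len
```

With an implicit extra data size, as in the original format, the
final consistency check in unpack_patch would not be possible from
the header alone.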

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)