From: Stuart Henderson <stu@spacehopper.org>
Subject: new (wip-ish): sysutils/plocate
To: ports <ports@openbsd.org>
Date: Fri, 2 Aug 2024 18:14:47 +0100

I'm not particularly asking for OKs at the moment (I'm undecided how
useful this is to have in general), but thought I'd send this out since
I've got a port in half-decent shape, for interest as much as anything.

It's another locate implementation. Key differences to our usual one:

- rather than excluding "non-public" files from the database, they are
included - the database is mode 640 and the search tool is setgid so
that it can access the files, but it does an access check before
returning results to the user.

- it's extremely fast (it uses an "inverted index" aka "postings
list" of trigrams to allow fast full text searches). That doesn't really
matter for one-off searches (~150ms for a lookup with the default locate
isn't too bad), but locate is used in ports infrastructure to check for
duplicate files when generating a plist, and there it soon racks up. If
the plist is much more than "fairly small", it can take minutes of 100%
cpu on all cores to do that check. So I'm at least slightly interested
in adding a plocate database to pkglocatedb and adding a way to use that
in infrastructure.
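
For anyone unfamiliar with the trigram idea, here is a rough sketch in
Python. It is only an illustration of the general technique, not
plocate's actual code or on-disk format, and the function names
(trigrams, build_index, search) are made up for this example. Each path
is split into 3-character substrings, each trigram maps to the set of
paths containing it, and a query only has to verify the paths that
appear in every one of its trigrams' lists.

```python
from collections import defaultdict


def trigrams(s):
    """Return the set of all 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_index(paths):
    """Map each trigram to the set of path ids containing it (postings list)."""
    index = defaultdict(set)
    for path_id, path in enumerate(paths):
        for gram in trigrams(path):
            index[gram].add(path_id)
    return index


def search(index, paths, pattern):
    """Return the paths that contain pattern as a substring, sorted."""
    grams = trigrams(pattern)
    if not grams:
        # Patterns shorter than 3 characters have no trigrams: scan everything.
        candidates = range(len(paths))
    else:
        # Only paths present in every trigram's postings list can match.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Trigram hits are necessary but not sufficient, so verify each candidate.
    return sorted(paths[i] for i in candidates if pattern in paths[i])


# Hypothetical sample data, standing in for a real file list.
paths = [
    "/usr/local/bin/plocate",
    "/usr/local/sbin/updatedb",
    "/usr/bin/locate",
]
index = build_index(paths)
print(search(index, paths, "locate"))
```

The saving comes from the intersection step: most paths are ruled out
without ever being compared against the pattern, which is what should
help when the plist check does a lookup per file.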

If you want to play with this, you can convert a pkglocate db into
plocate format like this:

$ pkglocate : > tmpfile; plocate-build -l no -p tmpfile plocate-pkg.db

(the "-l no" is to mark the database as not requiring access checks;
obviously they're not possible/useful with pkglocatedb as the files are
usually _not_ installed - also the pkgname prefix gets in the way).
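
To show why the check has to be off for this use, here is a minimal
Python sketch of an access-checked result filter. It is an assumption
about the general shape of such a check, not plocate's implementation;
filter_visible and require_check are hypothetical names. os.access()
tests with the real uid/gid, so in a setgid tool it answers for the
invoking user rather than for the privileged group.

```python
import os
import tempfile


def filter_visible(candidates, require_check=True):
    """Keep only the paths the calling user is able to see.

    require_check=False mimics a database marked as not needing checks.
    """
    if not require_check:
        return list(candidates)
    # F_OK succeeds only if the path exists and every parent directory
    # is searchable by the real uid/gid.
    return [p for p in candidates if os.access(p, os.F_OK)]


with tempfile.TemporaryDirectory() as d:
    present = os.path.join(d, "present")
    open(present, "w").close()
    missing = os.path.join(d, "missing")  # never created, like an uninstalled file

    # With the check on, the path that is not on disk is dropped.
    print(filter_visible([present, missing]) == [present])
    # With the check off, every database hit is returned.
    print(filter_visible([present, missing], require_check=False)
          == [present, missing])
```

With a pkglocatedb-derived database nearly every entry would look like
the "missing" case, so a checked lookup would return next to nothing.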

Biggest yucky bit with the port if used as a "standard" locate tool is
that the code to check filesystem types is Linux-only and I haven't
added an OpenBSD implementation, so you can't easily exclude (e.g.)
"all NFS partitions", you've got to specify paths to skip.