From: Kirill A. Korinsky
Subject: misc/llama.cpp: update to b6934 with required update of devel/libggml
To: OpenBSD ports
Date: Mon, 03 Nov 2025 15:15:52 +0100

ports@,

I'd like to update our misc/llama.cpp to the latest snapshot (b6934).

It allows running the https://huggingface.co/collections/Qwen/qwen3-vl
models.

We don't have GPU support, but with -t 32 I was able to run the Qwen3
VL 30B model on CPU only, on an AMD Ryzen 9 7950X3D, at about 2
tokens/second, which is more or less usable. It does require a lot of
memory, though: 120G as :datasize is enough.

Because we ship libggml as a dedicated port, it has to be updated to
the latest version as well; that version contains a bug which breaks
large models under a large number of threads:
https://github.com/ggml-org/llama.cpp/issues/16960

I have included a patch for this issue in the libggml update.
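For context, my understanding of the bug: repack.cpp splits the rows of
a matrix into chunks whose starts get rounded to an alignment, and with
enough threads on a big model the chunks become so small that, after
rounding, they can overlap or run past the end of the tensor. The patch
clamps the chunk count and the per-chunk row range. Below is a rough
standalone sketch of that clamping, for illustration only; the names
nr, nth, min_chunk_size and the clamp conditions follow the patch, the
rest is simplified and is not the real ggml code:

// Rough standalone sketch of the chunk clamping from the patch below.
// Simplified: the real code also aligns chunk starts and has a
// disable_chunking escape hatch.  Build with: c++ -o chunks chunks.cpp
#include <cstdint>
#include <cstdio>

static int64_t pick_nchunk(int64_t nr, int64_t min_chunk_size, int64_t nth) {
    int64_t nchunk = nth * 4;  // a few chunks per thread to start with

    // If that makes the chunks smaller than min_chunk_size, use fewer.
    if (nchunk > 0 && (nr / nchunk) < min_chunk_size && nr >= min_chunk_size) {
        nchunk = (nr + min_chunk_size - 1) / min_chunk_size;
    }
    if (nth == 1 || nchunk < nth) {
        nchunk = nth;
    }

    // The fix: never hand out more chunks than ceil(nr / min_chunk_size),
    // otherwise tiny chunks can overlap once their starts are aligned.
    const int64_t max_nchunk = (nr + min_chunk_size - 1) / min_chunk_size;
    if (nchunk > max_nchunk) {
        nchunk = max_nchunk;
    }
    return nchunk;
}

int main(void) {
    const int64_t nr = 1000;            // rows of work
    const int64_t min_chunk_size = 16;  // minimum rows per chunk
    const int64_t nth = 64;             // worker threads

    const int64_t nchunk = pick_nchunk(nr, min_chunk_size, nth);
    const int64_t dr = (nr + nchunk - 1) / nchunk;  // rows per chunk

    for (int64_t c = 0; c < nchunk; c++) {
        int64_t end = (c + 1) * dr;
        if (end > nr) {
            end = nr;  // same idea as clamping src0_end to ne01
        }
        printf("chunk %2lld: rows [%4lld, %4lld)\n",
               (long long) c, (long long) (c * dr), (long long) end);
    }
    return 0;
}

With nr = 1000, min_chunk_size = 16 and nth = 64 this caps nchunk at
63; without the max_nchunk clamp you would get 64 chunks and the last
one would start past the end of the data, which, as far as I can tell,
is the kind of degenerate chunk the linked issue runs into.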
I have tested analyzing a .pdf document, chat, and "explain the
picture" on -current/amd64. Everything works.

Ok?

Index: misc/llama.cpp/Makefile
===================================================================
RCS file: /home/cvs/ports/misc/llama.cpp/Makefile,v
diff -u -p -r1.10 Makefile
--- misc/llama.cpp/Makefile	1 Oct 2025 19:44:07 -0000	1.10
+++ misc/llama.cpp/Makefile	3 Nov 2025 13:49:51 -0000
@@ -2,7 +2,7 @@ COMMENT =	LLM inference system
 
 GH_ACCOUNT =	ggml-org
 GH_PROJECT =	llama.cpp
-GH_TAGNAME =	b6641
+GH_TAGNAME =	b6934
 PKGNAME =	llama.cpp-0.0.${GH_TAGNAME:S/b//}
 
 SHARED_LIBS +=	llama 2.0
Index: misc/llama.cpp/distinfo
===================================================================
RCS file: /home/cvs/ports/misc/llama.cpp/distinfo,v
diff -u -p -r1.4 distinfo
--- misc/llama.cpp/distinfo	1 Oct 2025 19:44:07 -0000	1.4
+++ misc/llama.cpp/distinfo	3 Nov 2025 13:50:07 -0000
@@ -1,2 +1,2 @@
-SHA256 (llama.cpp-b6641.tar.gz) = 0xJrrTblSgapCD4ujzjkobbpyI5bL6fT+4tYchCwwuA=
-SIZE (llama.cpp-b6641.tar.gz) = 25867942
+SHA256 (llama.cpp-b6934.tar.gz) = qsr4P+8j/z/nK8k8/Iv1/QTdSFhqIshFYwCI7L5/a7k=
+SIZE (llama.cpp-b6934.tar.gz) = 26417348
Index: devel/libggml/Makefile
===================================================================
RCS file: /home/cvs/ports/devel/libggml/Makefile,v
diff -u -p -r1.2 Makefile
--- devel/libggml/Makefile	20 Oct 2025 17:25:51 -0000	1.2
+++ devel/libggml/Makefile	3 Nov 2025 01:44:00 -0000
@@ -2,7 +2,8 @@ COMMENT=	tensor library for machine lea
 
 GH_ACCOUNT=	ggml-org
 GH_PROJECT=	ggml
-GH_TAGNAME=	v0.9.4
+GH_COMMIT=	09aa758381718f7731c148238574a7e169001f13
+DISTNAME=	ggml-0.9.4.20251101
 PKGNAME=	lib${DISTNAME}
 
 SHARED_LIBS +=	ggml 2.0
Index: devel/libggml/distinfo
===================================================================
RCS file: /home/cvs/ports/devel/libggml/distinfo,v
diff -u -p -r1.1.1.1 distinfo
--- devel/libggml/distinfo	1 Oct 2025 19:42:10 -0000	1.1.1.1
+++ devel/libggml/distinfo	3 Nov 2025 01:44:20 -0000
@@ -1,2 +1,2 @@
-SHA256 (ggml-0.9.4.tar.gz) = JL0VAK7ycUe5LQI8/23P/fyNqcVnUPlB9dXRtTh1xiM=
-SIZE (ggml-0.9.4.tar.gz) = 2193279
+SHA256 (ggml-0.9.4.20251101-09aa7583.tar.gz) = fx+ZI4GhV5KlZlGM3QDWERyj1xXY8t5vh5ELtkwK15Y=
+SIZE (ggml-0.9.4.20251101-09aa7583.tar.gz) = 2330931
Index: devel/libggml/patches/patch-src_ggml-backend-reg_cpp
===================================================================
RCS file: /home/cvs/ports/devel/libggml/patches/patch-src_ggml-backend-reg_cpp,v
diff -u -p -r1.1.1.1 patch-src_ggml-backend-reg_cpp
--- devel/libggml/patches/patch-src_ggml-backend-reg_cpp	1 Oct 2025 19:42:10 -0000	1.1.1.1
+++ devel/libggml/patches/patch-src_ggml-backend-reg_cpp	3 Nov 2025 12:01:00 -0000
@@ -1,7 +1,7 @@
 Index: src/ggml-backend-reg.cpp
 --- src/ggml-backend-reg.cpp.orig
 +++ src/ggml-backend-reg.cpp
-@@ -517,7 +517,9 @@ static ggml_backend_reg_t ggml_backend_load_best(const
+@@ -524,7 +524,9 @@ static ggml_backend_reg_t ggml_backend_load_best(const
      search_paths.push_back(fs::u8path(GGML_BACKEND_DIR));
  #endif
      // default search paths: executable directory, current directory
Index: devel/libggml/patches/patch-src_ggml-cpu_repack_cpp
===================================================================
RCS file: devel/libggml/patches/patch-src_ggml-cpu_repack_cpp
diff -N devel/libggml/patches/patch-src_ggml-cpu_repack_cpp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ devel/libggml/patches/patch-src_ggml-cpu_repack_cpp	3 Nov 2025 12:01:17 -0000
@@ -0,0 +1,59 @@
+https://github.com/ggml-org/llama.cpp/pull/16956
+
+Index: src/ggml-cpu/repack.cpp
+--- src/ggml-cpu/repack.cpp.orig
++++ src/ggml-cpu/repack.cpp
+@@ -1678,10 +1678,24 @@ template
++    if (nchunk > 0 && (nr / nchunk) < min_chunk_size && nr >= min_chunk_size) {
++        nchunk = (nr + min_chunk_size - 1) / min_chunk_size;
++    }
++
+     if (nth == 1 || nchunk < nth || disable_chunking) {
+         nchunk = nth;
+     }
+ 
++    // Ensure nchunk doesn't exceed the number of rows divided by minimum chunk size
++    // This prevents creating too many tiny chunks that could overlap after alignment
++    const int64_t max_nchunk = (nr + min_chunk_size - 1) / min_chunk_size;
++    if (nchunk > max_nchunk) {
++        nchunk = max_nchunk;
++    }
++
+     if (ith == 0) {
+         // Every thread starts at ith, so the first unprocessed chunk is nth. This save a bit of coordination right at the start.
+         ggml_threadpool_chunk_set(params->threadpool, nth);
+@@ -1695,8 +1709,15 @@ template
++        if (src0_end > ne01) {
++            src0_end = ne01;
++        }
++
+         if (src0_start >= src0_end) {
+             break;
+         }
+@@ -1808,8 +1829,12 @@ template
++        if (src0_cur_end > ne01) {
++            src0_cur_end = ne01;
++        }
+ 
+         if (src0_cur_start >= src0_cur_end) {
+             return;

-- 
wbr, Kirill