From: Jeremie Courreges-Anglas <jca@wxcvbn.org>
Subject: Re: UPDATE: numpy
To: Brad Smith <brad@comstyle.com>
Cc: ports@openbsd.org, daniel@openbsd.org
Date: Tue, 24 Jun 2025 17:46:29 +0200

On Sat, Jun 21, 2025 at 04:49:35AM -0400, Brad Smith wrote:
> On 2025-06-20 1:17 p.m., Jeremie Courreges-Anglas wrote:
> > On Fri, Jun 20, 2025 at 01:25:11AM -0400, Brad Smith wrote:
> > > Here is a diff for numpy that backports patches to enable CPU feature
> > > detection via elf_aux_info() on ARM, PowerPC64, and RISC-V 64.
> > PLEASE, Brad, tell us what you tested and what you didn't test when
> > you propose patches, especially arch-dependent ones.
> > This still builds on riscv64.
> > 
> > It shouldn't matter much, but I'd prefer that you initialize 'hwcap'
> > to zero and/or add error checking around elf_aux_info().
> 
> 
> I have build tested and ran 'make test' on both arm64 and riscv64. Nothing
> changes about riscv64.

That's expected, we don't support RVV anyway.

> arm64 -- M1
> 
> before:
> = 97 failed, 44699 passed, 345 skipped, 2815 deselected, 33 xfailed, 4
> xpassed, 65 warnings in 525.39s (0:08:45) =
> 
> after:
> = 97 failed, 44699 passed, 345 skipped, 2815 deselected, 33 xfailed, 4
> xpassed, 65 warnings in 509.20s (0:08:29) =

The apparent speedup wasn't very stable in my tests, but we should
probably go for it anyway.  It could probably be better, though.  While
fixing riscv64 I also looked at arm64, and I think we're busted there:
some configure-time feature checks fail, so no dispatch targets get
enabled:

[...]
Message: During parsing cpu-dispatch: The following CPU features were ignored due to platform incompatibility or lack of support:
"XOP FMA4"
Test features "NEON NEON_FP16 NEON_VFPV4 ASIMD" : Supported
Test features "ASIMDHP" : Unsupported due to Compiler fails against the test code of "ASIMDHP"
Test features "ASIMDFHM" : Unsupported due to Implied feature "ASIMDHP" is not supported
Test features "SVE" : Unsupported due to Implied feature "ASIMDHP" is not supported
Configuring npy_cpu_dispatch_config.h using configuration
Message:
CPU Optimization Options
  baseline:
    Requested : min
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD
  dispatch:
    Requested : max -xop -fma4
    Enabled   :
[...]

meson-log.txt:

-----------
Command line: `cc /usr/ports/pobj/py-numpy-2.2.6/numpy-2.2.6/.mesonpy-u2hr_00r/meson-private/tmpgaj2d4u6/testfile.c -o /usr/ports/pobj/py-numpy-2.2.6/numpy-2.2.6/.mesonpy-u2hr_00r/meson-private/tmpgaj2d4u6/output.obj -c -O2 -pipe -g -D_FILE_OFFSET_BITS=64 -O0 -march=armv8.2-a+fp16` -> 0
Running compile:
Working directory:  /tmp/tmp4ekm55s8
Source file: /usr/ports/pobj/py-numpy-2.2.6/numpy-2.2.6/numpy/distutils/checks/cpu_asimdhp.c
-----------
Command line: `cc /usr/ports/pobj/py-numpy-2.2.6/numpy-2.2.6/numpy/distutils/checks/cpu_asimdhp.c -o /tmp/tmp4ekm55s8/output.exe -D_FILE_OFFSET_BITS=64 -march=armv8.2-a+fp16` -> 1
stderr:
/tmp//ccZ65tvS.s: Assembler messages:
/tmp//ccZ65tvS.s:50: Error: selected processor does not support `fabd v0.8h,v1.8h,v0.8h'
/tmp//ccZ65tvS.s:53: Error: selected processor does not support `fcvtzs w0,h0'
/tmp//ccZ65tvS.s:61: Error: selected processor does not support `fabd v0.4h,v1.4h,v0.4h'
/tmp//ccZ65tvS.s:64: Error: selected processor does not support `fcvtzs w0,h0'
-----------

This is because the compiler actually used is egcc, which in turn uses
devel/gas as its assembler.

> I'll see about zero initializing hwcap.

thx.  It matters more as good practice to push upstream than as an
additional safety net in our patch.  elf_aux_info(AT_HWCAP)
*shouldn't* fail on OpenBSD/arm64.
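
For reference, a minimal sketch of the pattern I have in mind.  This is
not the actual numpy code: read_hwcap() and hwcap_has() are hypothetical
names, and the #if guard (an assumption) only exists so the snippet also
compiles on systems without elf_aux_info(), where it simply reports no
capabilities.

```c
#if defined(__OpenBSD__) || defined(__FreeBSD__)
#include <sys/auxv.h>	/* elf_aux_info(), AT_HWCAP */
#define HAVE_ELF_AUX_INFO 1
#endif

/*
 * Return the AT_HWCAP bits, or 0 if they cannot be read.
 * Hypothetical helper, not taken from numpy.
 */
static unsigned long
read_hwcap(void)
{
	unsigned long hwcap = 0;	/* defined value even if the call fails */

#ifdef HAVE_ELF_AUX_INFO
	/* Treat any non-zero return as failure and report no features. */
	if (elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)) != 0)
		hwcap = 0;
#endif
	return hwcap;
}

/* Hypothetical helper: non-zero if every bit of 'mask' is set in 'hwcap'. */
static int
hwcap_has(unsigned long hwcap, unsigned long mask)
{
	return mask != 0 && (hwcap & mask) == mask;
}
```

With the zero fallback, a failed lookup just means "no optional
features", so callers such as hwcap_has(read_hwcap(), some_bit) never
test an uninitialized value.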

-- 
jca