Mailing List Archive | ports@openbsd.org

From: Brad Smith <brad@comstyle.com>
Subject: UPDATE: dav1d 1.5.0
To: ports@openbsd.org
Cc: robert@openbsd.org, kettenis@openbsd.org
Date: Sun, 1 Dec 2024 03:51:30 -0500
Here is an update to dav1d 1.5.0.

Upstream has created their own diffs to fix aarch64 for xonly, and
it works fine as is.

https://code.videolan.org/videolan/dav1d/-/commit/41511bf12ef3f7f0facf6e567849b342597bfbd6
https://code.videolan.org/videolan/dav1d/-/commit/2355eeb8f254a1c34dbb0241be5c70cdf6ed46d1

The amd64 patches for IBT need to be reapplied and updated. Could
someone with hardware to test on please look into this?

Upstream seems to be open to accepting IBT patches if they're updated.
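
For anyone checking the new distfile by hand: the distinfo hunk in the
diff records SHA256 as base64 rather than hex. Here is a rough Python
sketch for comparing a local tarball against those values. The function
names are made up for illustration and are not part of the ports
infrastructure.

```python
import base64
import hashlib


def distinfo_to_hex(b64_digest: str) -> str:
    """Convert a distinfo-style base64 SHA256 value to the usual hex form."""
    raw = base64.b64decode(b64_digest, validate=True)
    if len(raw) != hashlib.sha256().digest_size:
        raise ValueError("not a SHA256 digest")
    return raw.hex()


def distinfo_digest(data: bytes) -> str:
    """Return the base64 SHA256 of data, in the form distinfo stores it."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def distfile_matches(path: str, b64_digest: str, size: int) -> bool:
    """Check a local file against the SHA256 and SIZE values (hypothetical helper)."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data) == size and distinfo_digest(data) == b64_digest
```

With the values from this diff, the call would be
distfile_matches("dav1d-1.5.0.tar.xz",
"FL1vUVeAjtmu3K++UN9onTBP1IEKwgvm7sGrA3Q2r9Y=", 1017040), assuming the
tarball sits in the current directory. Inside the ports tree,
"make checksum" does the same check.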


Changes for 1.5.0 'Sonic':
--------------------------

1.5.0 is a major release of dav1d, that:
 - WARNING: we removed some of the SSE2 optimizations, so if you care about
            systems without SSSE3, you should be careful when updating!
 - Add Arm OpenBSD run-time CPU feature
 - Optimize index offset calculations for decode_coefs
 - picture: copy HDR10+ and T35 metadata only to visible frames
 - SSSE3 new optimizations for 6-tap (8bit and hbd)
 - AArch64/SVE: Add HBD subpel filters using 128-bit SVE2
 - AArch64: Add USMMLA implementation for 6-tap H/HV
 - AArch64: Optimize Armv8.0 NEON for HBD horizontal filters and 6-tap filters
 - Power9: Optimized ITX till 16x4.
 - Loongarch: numerous optimizations
 - RISC-V optimizations for pal, cdef_filter, ipred, mc_blend, mc_bdir, itx
 - Allow playing videos in full-screen mode in dav1dplay


Changes for 1.4.3 'Road Runner':
--------------------------------

1.4.3 is a small release focused on security issues
 - AArch64: Fix potential out of bounds access in DotProd H/HV filters
 - cli: Prevent buffer over-read


Changes for 1.4.2 'Road Runner':
--------------------------------

1.4.2 is a small release of dav1d, improving notably ARM, AVX-512 and PowerPC
 - AVX2 optimizations for 8-tap and new variants for 6-tap
 - AVX-512 optimizations for 8-tap and new variants for 6-tap
 - Improve entropy decoding on ARM64
 - New ARM64 optimizations for convolutions based on DotProd extension
 - New ARM64 optimizations for convolutions based on i8mm extension
 - New ARM64 optimizations for subpel and prep filters for i8mm
 - Misc improvements on existing ARM64 optimizations, notably for put/prep
 - New PowerPC9 optimizations for loopfilter
 - Support for macOS kperf API for benchmarking


Changes for 1.4.1 'Road Runner':
--------------------------------

1.4.1 is a small release of dav1d, improving notably ARM and RISC-V speed

- Optimizations for 6tap filters for NEON (ARM)
- More RISC-V optimizations for itx (4x8, 8x4, 4x16, 16x4, 8x16, 16x8)
- Reduction of binary size on ARM64, ARM32 and RISC-V
- Fix out-of-bounds read in 8bpc SSE2/SSSE3 wiener_filter
- Msac optimizations


Changes for 1.4.0 'Road Runner':
--------------------------------

1.4.0 is a medium release of dav1d, focusing on new architecture support and optimizations

- AVX-512 optimizations for z1, z2, z3 in 8bit and high-bitdepth
- New architecture supported: loongarch
- Loongarch optimizations for 8bit
- New architecture supported: RISC-V
- RISC-V optimizations for itx
- Misc improvements in threading and in reducing binary size
- Fix potential integer overflow with extremely large frame sizes (CVE-2024-1580)


Changes for 1.3.0 'Tundra Peregrine Falcon (Calidus)':
------------------------------------------------------

1.3.0 is a medium release of dav1d, focusing on new APIs and memory usage reduction.

- Reduce memory usage in numerous places
- ABI break in Dav1dSequenceHeader, Dav1dFrameHeader, Dav1dContentLightLevel structures
- new API function to check the API version: dav1d_version_api()
- Rewrite of the SGR functions for ARM64 to be faster
- NEON implementation of save_tmvs for ARM32 and ARM64
- x86 palette DSP for pal_idx_finish function


Index: Makefile
===================================================================
RCS file: /cvs/ports/multimedia/dav1d/Makefile,v
retrieving revision 1.39
diff -u -p -u -p -r1.39 Makefile
--- Makefile	29 Feb 2024 14:33:39 -0000	1.39
+++ Makefile	1 Dec 2024 08:45:12 -0000
@@ -4,14 +4,13 @@ COMMENT=	small and fast AV1 decoder
 #  /!\ DO NOT UPDATE WITHOUT RUNNING TESTS ON ARM64 (XONLY) and AMD64 (IBT) /!\ #
 #################################################################################
 
-VER=		1.2.1
+VER=		1.5.0
 DISTNAME=	dav1d-${VER}
-REVISION=	3
 CATEGORIES=	multimedia
 SITES=		https://downloads.videolan.org/pub/videolan/dav1d/${VER}/
 EXTRACT_SUFX=	.tar.xz
 
-SHARED_LIBS=	dav1d	2.3
+SHARED_LIBS=	dav1d	3.0
 
 HOMEPAGE=	https://code.videolan.org/videolan/dav1d/
 
@@ -39,6 +38,7 @@ CONFIGURE_ARGS+=-Ddefault_library=both \
 CONFIGURE_ARGS+=-Denable_asm=false
 # XXX SIGBUS otherwise
 CFLAGS+=	-O1
+#CFLAGS+=	-fno-slp-vectorize
 .endif
 
 .include <bsd.port.mk>
Index: distinfo
===================================================================
RCS file: /cvs/ports/multimedia/dav1d/distinfo,v
retrieving revision 1.18
diff -u -p -u -p -r1.18 distinfo
--- distinfo	11 Jun 2023 07:58:45 -0000	1.18
+++ distinfo	1 Dec 2024 08:45:12 -0000
@@ -1,2 +1,2 @@
-SHA256 (dav1d-1.2.1.tar.xz) = TjPrYexUx2ihbaDPj6CSi0xFk/X4BKPIh9SiHDGDQLI=
-SIZE (dav1d-1.2.1.tar.xz) = 873008
+SHA256 (dav1d-1.5.0.tar.xz) = FL1vUVeAjtmu3K++UN9onTBP1IEKwgvm7sGrA3Q2r9Y=
+SIZE (dav1d-1.5.0.tar.xz) = 1017040
Index: patches/patch-src_arm_64_filmgrain16_S
===================================================================
RCS file: patches/patch-src_arm_64_filmgrain16_S
diff -N patches/patch-src_arm_64_filmgrain16_S
--- patches/patch-src_arm_64_filmgrain16_S	24 Apr 2023 21:06:59 -0000	1.1
+++ /dev/null	1 Jan 1970 00:00:00 -0000
@@ -1,186 +0,0 @@
-Index: src/arm/64/filmgrain16.S
---- src/arm/64/filmgrain16.S.orig
-+++ src/arm/64/filmgrain16.S
-@@ -740,12 +740,12 @@ function generate_grain_\type\()_16bpc_neon, export=1
-         add             x4,  x1,  #FGD_AR_COEFFS_UV
- .endif
-         add             w9,  w9,  w15 // grain_scale_shift - bitdepth_min_8
--        adr             x16, L(gen_grain_\type\()_tbl)
-+        adrp            x16, L(gen_grain_\type\()_tbl)
-+        add             x16, x16, :lo12: L(gen_grain_\type\()_tbl)
-         ldr             w17, [x1, #FGD_AR_COEFF_LAG]
-         add             w9,  w9,  #4
--        ldrh            w17, [x16, w17, uxtw #1]
-+        ldr             x16, [x16, w17, uxtw #3]
-         dup             v31.8h,  w9    // 4 - bitdepth_min_8 + data->grain_scale_shift
--        sub             x16, x16, w17, uxtw
-         neg             v31.8h,  v31.8h
- 
- .ifc \type, uv_444
-@@ -946,11 +946,13 @@ L(generate_grain_\type\()_lag3):
-         AARCH64_VALIDATE_LINK_REGISTER
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(gen_grain_\type\()_tbl):
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag0)
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag1)
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag2)
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag3)
-+        .xword L(generate_grain_\type\()_lag0)
-+        .xword L(generate_grain_\type\()_lag1)
-+        .xword L(generate_grain_\type\()_lag2)
-+        .xword L(generate_grain_\type\()_lag3)
-+	.popsection
- endfunc
- .endm
- 
-@@ -991,12 +993,12 @@ function generate_grain_\type\()_16bpc_neon, export=1
-         ldr             w9,  [x1, #FGD_GRAIN_SCALE_SHIFT]
-         add             x4,  x1,  #FGD_AR_COEFFS_UV
-         add             w9,  w9,  w15 // grain_scale_shift - bitdepth_min_8
--        adr             x16, L(gen_grain_\type\()_tbl)
-+        adrp            x16, L(gen_grain_\type\()_tbl)
-+        add             x16, x16, :lo12: L(gen_grain_\type\()_tbl)
-         ldr             w17, [x1, #FGD_AR_COEFF_LAG]
-         add             w9,  w9,  #4
--        ldrh            w17, [x16, w17, uxtw #1]
-+        ldr             x16, [x16, w17, uxtw #3]
-         dup             v31.8h,  w9    // 4 - bitdepth_min_8 + data->grain_scale_shift
--        sub             x16, x16, w17, uxtw
-         neg             v31.8h,  v31.8h
- 
-         cmp             w13, #0
-@@ -1156,11 +1158,13 @@ L(generate_grain_\type\()_lag3):
-         AARCH64_VALIDATE_LINK_REGISTER
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(gen_grain_\type\()_tbl):
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag0)
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag1)
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag2)
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag3)
-+        .xword L(generate_grain_\type\()_lag0)
-+        .xword L(generate_grain_\type\()_lag1)
-+        .xword L(generate_grain_\type\()_lag2)
-+        .xword L(generate_grain_\type\()_lag3)
-+	.popsection
- endfunc
- .endm
- 
-@@ -1306,19 +1310,18 @@ function fgy_32x32_16bpc_neon, export=1
-         add_offset      x5,  w6,  x10, x5,  x9
- 
-         ldr             w11, [sp, #88]         // type
--        adr             x13, L(fgy_loop_tbl)
-+        adrp            x13, L(fgy_loop_tbl)
-+        add             x13, x13, :lo12: L(fgy_loop_tbl)
- 
-         add             x4,  x12, #32*2        // grain_lut += BLOCK_SIZE * bx
-         add             x6,  x14, x9,  lsl #5  // grain_lut += grain_stride * BLOCK_SIZE * by
- 
-         tst             w11, #1
--        ldrh            w11, [x13, w11, uxtw #1]
-+        ldr             x11, [x13, w11, uxtw #3]
- 
-         add             x8,  x16, x9,  lsl #5  // grain_lut += grain_stride * BLOCK_SIZE * by
-         add             x8,  x8,  #32*2        // grain_lut += BLOCK_SIZE * bx
- 
--        sub             x11, x13, w11, uxtw
--
-         b.eq            1f
-         // y overlap
-         dup             v8.8h,   v27.h[0]
-@@ -1481,11 +1484,13 @@ L(loop_\ox\oy):
-         fgy             1, 0
-         fgy             1, 1
- 
-+	.pushsection .data.rel.ro, "aw"
- L(fgy_loop_tbl):
--        .hword L(fgy_loop_tbl) - L(loop_00)
--        .hword L(fgy_loop_tbl) - L(loop_01)
--        .hword L(fgy_loop_tbl) - L(loop_10)
--        .hword L(fgy_loop_tbl) - L(loop_11)
-+        .xword L(loop_00)
-+        .xword L(loop_01)
-+        .xword L(loop_10)
-+        .xword L(loop_11)
-+	.popsection
- endfunc
- 
- // void dav1d_fguv_32x32_420_16bpc_neon(pixel *const dst,
-@@ -1589,11 +1594,12 @@ function fguv_32x32_\layout\()_16bpc_neon, export=1
-         ldr             w13, [sp, #112]        // type
- 
-         movrel          x16, overlap_coeffs_\sx
--        adr             x14, L(fguv_loop_sx\sx\()_tbl)
-+        adrp            x14, L(fguv_loop_sx\sx\()_tbl)
-+        add             x14, x14, :lo12: L(fguv_loop_sx\sx\()_tbl)
- 
-         ld1             {v27.4h, v28.4h}, [x16] // overlap_coeffs
-         tst             w13, #1
--        ldrh            w13, [x14, w13, uxtw #1]
-+        ldr             x13, [x14, w13, uxtw #3]
- 
-         b.eq            1f
-         // y overlap
-@@ -1601,8 +1607,6 @@ function fguv_32x32_\layout\()_16bpc_neon, export=1
-         mov             w9,  #(2 >> \sy)
- 
- 1:
--        sub             x13, x14, w13, uxtw
--
- .if \sy
-         movi            v25.8h,  #23
-         movi            v26.8h,  #22
-@@ -1819,15 +1823,17 @@ L(fguv_loop_sx0_csfl\csfl\()_\ox\oy):
-         AARCH64_VALIDATE_LINK_REGISTER
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(fguv_loop_sx0_tbl):
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_00)
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_01)
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_10)
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_11)
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_00)
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_01)
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_10)
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_11)
-+        .xword L(fguv_loop_sx0_csfl0_00)
-+        .xword L(fguv_loop_sx0_csfl0_01)
-+        .xword L(fguv_loop_sx0_csfl0_10)
-+        .xword L(fguv_loop_sx0_csfl0_11)
-+        .xword L(fguv_loop_sx0_csfl1_00)
-+        .xword L(fguv_loop_sx0_csfl1_01)
-+        .xword L(fguv_loop_sx0_csfl1_10)
-+        .xword L(fguv_loop_sx0_csfl1_11)
-+	.popsection
- endfunc
- 
- function fguv_loop_sx1_neon
-@@ -1985,13 +1991,15 @@ L(fguv_loop_sx1_csfl\csfl\()_\ox\oy):
-         AARCH64_VALIDATE_LINK_REGISTER
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(fguv_loop_sx1_tbl):
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_00)
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_01)
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_10)
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_11)
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_00)
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_01)
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_10)
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_11)
-+        .xword L(fguv_loop_sx1_csfl0_00)
-+        .xword L(fguv_loop_sx1_csfl0_01)
-+        .xword L(fguv_loop_sx1_csfl0_10)
-+        .xword L(fguv_loop_sx1_csfl0_11)
-+        .xword L(fguv_loop_sx1_csfl1_00)
-+        .xword L(fguv_loop_sx1_csfl1_01)
-+        .xword L(fguv_loop_sx1_csfl1_10)
-+        .xword L(fguv_loop_sx1_csfl1_11)
-+	.popsection
- endfunc
Index: patches/patch-src_arm_64_filmgrain_S
===================================================================
RCS file: patches/patch-src_arm_64_filmgrain_S
diff -N patches/patch-src_arm_64_filmgrain_S
--- patches/patch-src_arm_64_filmgrain_S	24 Apr 2023 21:06:59 -0000	1.1
+++ /dev/null	1 Jan 1970 00:00:00 -0000
@@ -1,186 +0,0 @@
-Index: src/arm/64/filmgrain.S
---- src/arm/64/filmgrain.S.orig
-+++ src/arm/64/filmgrain.S
-@@ -884,12 +884,12 @@ function generate_grain_\type\()_8bpc_neon, export=1
- .else
-         add             x4,  x1,  #FGD_AR_COEFFS_UV
- .endif
--        adr             x16, L(gen_grain_\type\()_tbl)
-+        adrp            x16, L(gen_grain_\type\()_tbl)
-+        add             x16, x16, :lo12: L(gen_grain_\type\()_tbl)
-         ldr             w17, [x1, #FGD_AR_COEFF_LAG]
-         add             w9,  w9,  #4
--        ldrh            w17, [x16, w17, uxtw #1]
-+        ldr             x16, [x16, w17, uxtw #3]
-         dup             v31.8h,  w9    // 4 + data->grain_scale_shift
--        sub             x16, x16, w17, uxtw
-         neg             v31.8h,  v31.8h
- 
- .ifc \type, uv_444
-@@ -1076,11 +1076,13 @@ L(generate_grain_\type\()_lag3):
-         AARCH64_VALIDATE_LINK_REGISTER
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(gen_grain_\type\()_tbl):
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag0)
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag1)
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag2)
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag3)
-+        .xword L(generate_grain_\type\()_lag0)
-+        .xword L(generate_grain_\type\()_lag1)
-+        .xword L(generate_grain_\type\()_lag2)
-+        .xword L(generate_grain_\type\()_lag3)
-+	.popsection
- endfunc
- .endm
- 
-@@ -1118,12 +1120,12 @@ function generate_grain_\type\()_8bpc_neon, export=1
-         ldr             w2,  [x1, #FGD_SEED]
-         ldr             w9,  [x1, #FGD_GRAIN_SCALE_SHIFT]
-         add             x4,  x1,  #FGD_AR_COEFFS_UV
--        adr             x16, L(gen_grain_\type\()_tbl)
-+        adrp            x16, L(gen_grain_\type\()_tbl)
-+        add             x16, x16, :lo12: L(gen_grain_\type\()_tbl)
-         ldr             w17, [x1, #FGD_AR_COEFF_LAG]
-         add             w9,  w9,  #4
--        ldrh            w17, [x16, w17, uxtw #1]
-+        ldr             x16, [x16, w17, uxtw #3]
-         dup             v31.8h,  w9    // 4 + data->grain_scale_shift
--        sub             x16, x16, w17, uxtw
-         neg             v31.8h,  v31.8h
- 
-         cmp             w13, #0
-@@ -1273,11 +1275,13 @@ L(generate_grain_\type\()_lag3):
-         AARCH64_VALIDATE_LINK_REGISTER
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(gen_grain_\type\()_tbl):
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag0)
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag1)
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag2)
--        .hword L(gen_grain_\type\()_tbl) - L(generate_grain_\type\()_lag3)
-+        .xword L(generate_grain_\type\()_lag0)
-+        .xword L(generate_grain_\type\()_lag1)
-+        .xword L(generate_grain_\type\()_lag2)
-+        .xword L(generate_grain_\type\()_lag3)
-+	.popsection
- endfunc
- .endm
- 
-@@ -1407,19 +1411,18 @@ function fgy_32x32_8bpc_neon, export=1
-         add_offset      x5,  w6,  x10, x5,  x9
- 
-         ldr             w11, [sp, #24]         // type
--        adr             x13, L(fgy_loop_tbl)
-+        adrp            x13, L(fgy_loop_tbl)
-+        add             x13, x13, :lo12: L(fgy_loop_tbl)
- 
-         add             x4,  x12, #32          // grain_lut += BLOCK_SIZE * bx
-         add             x6,  x14, x9,  lsl #5  // grain_lut += grain_stride * BLOCK_SIZE * by
- 
-         tst             w11, #1
--        ldrh            w11, [x13, w11, uxtw #1]
-+        ldr             x11, [x13, w11, uxtw #3]
- 
-         add             x8,  x16, x9,  lsl #5  // grain_lut += grain_stride * BLOCK_SIZE * by
-         add             x8,  x8,  #32          // grain_lut += BLOCK_SIZE * bx
- 
--        sub             x11, x13, w11, uxtw
--
-         b.eq            1f
-         // y overlap
-         dup             v6.16b,  v27.b[0]
-@@ -1556,11 +1559,13 @@ L(loop_\ox\oy):
-         fgy             1, 0
-         fgy             1, 1
- 
-+	.pushsection .data.rel.ro, "aw"
- L(fgy_loop_tbl):
--        .hword L(fgy_loop_tbl) - L(loop_00)
--        .hword L(fgy_loop_tbl) - L(loop_01)
--        .hword L(fgy_loop_tbl) - L(loop_10)
--        .hword L(fgy_loop_tbl) - L(loop_11)
-+        .xword L(loop_00)
-+        .xword L(loop_01)
-+        .xword L(loop_10)
-+        .xword L(loop_11)
-+	.popsection
- endfunc
- 
- // void dav1d_fguv_32x32_420_8bpc_neon(pixel *const dst,
-@@ -1646,11 +1651,12 @@ function fguv_32x32_\layout\()_8bpc_neon, export=1
-         ldr             w13, [sp, #64]         // type
- 
-         movrel          x16, overlap_coeffs_\sx
--        adr             x14, L(fguv_loop_sx\sx\()_tbl)
-+        adrp            x14, L(fguv_loop_sx\sx\()_tbl)
-+        add             x14, x14, :lo12: L(fguv_loop_sx\sx\()_tbl)
- 
-         ld1             {v27.8b, v28.8b}, [x16] // overlap_coeffs
-         tst             w13, #1
--        ldrh            w13, [x14, w13, uxtw #1]
-+        ldr             x13, [x14, w13, uxtw #3]
- 
-         b.eq            1f
-         // y overlap
-@@ -1658,8 +1664,6 @@ function fguv_32x32_\layout\()_8bpc_neon, export=1
-         mov             w9,  #(2 >> \sy)
- 
- 1:
--        sub             x13, x14, w13, uxtw
--
- .if \sy
-         movi            v25.16b, #23
-         movi            v26.16b, #22
-@@ -1849,15 +1853,17 @@ L(fguv_loop_sx0_csfl\csfl\()_\ox\oy):
-         AARCH64_VALIDATE_LINK_REGISTER
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(fguv_loop_sx0_tbl):
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_00)
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_01)
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_10)
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl0_11)
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_00)
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_01)
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_10)
--        .hword L(fguv_loop_sx0_tbl) - L(fguv_loop_sx0_csfl1_11)
-+        .xword L(fguv_loop_sx0_csfl0_00)
-+        .xword L(fguv_loop_sx0_csfl0_01)
-+        .xword L(fguv_loop_sx0_csfl0_10)
-+        .xword L(fguv_loop_sx0_csfl0_11)
-+        .xword L(fguv_loop_sx0_csfl1_00)
-+        .xword L(fguv_loop_sx0_csfl1_01)
-+        .xword L(fguv_loop_sx0_csfl1_10)
-+        .xword L(fguv_loop_sx0_csfl1_11)
-+	.popsection
- endfunc
- 
- function fguv_loop_sx1_neon
-@@ -1998,13 +2004,15 @@ L(fguv_loop_sx1_csfl\csfl\()_\ox\oy):
-         AARCH64_VALIDATE_LINK_REGISTER
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(fguv_loop_sx1_tbl):
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_00)
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_01)
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_10)
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl0_11)
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_00)
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_01)
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_10)
--        .hword L(fguv_loop_sx1_tbl) - L(fguv_loop_sx1_csfl1_11)
-+        .xword L(fguv_loop_sx1_csfl0_00)
-+        .xword L(fguv_loop_sx1_csfl0_01)
-+        .xword L(fguv_loop_sx1_csfl0_10)
-+        .xword L(fguv_loop_sx1_csfl0_11)
-+        .xword L(fguv_loop_sx1_csfl1_00)
-+        .xword L(fguv_loop_sx1_csfl1_01)
-+        .xword L(fguv_loop_sx1_csfl1_10)
-+        .xword L(fguv_loop_sx1_csfl1_11)
-+	.popsection
- endfunc
Index: patches/patch-src_arm_64_ipred16_S
===================================================================
RCS file: patches/patch-src_arm_64_ipred16_S
diff -N patches/patch-src_arm_64_ipred16_S
--- patches/patch-src_arm_64_ipred16_S	13 Jul 2023 12:26:14 -0000	1.4
+++ /dev/null	1 Jan 1970 00:00:00 -0000
@@ -1,965 +0,0 @@
-Index: src/arm/64/ipred16.S
---- src/arm/64/ipred16.S.orig
-+++ src/arm/64/ipred16.S
-@@ -36,11 +36,11 @@
- function ipred_dc_128_16bpc_neon, export=1
-         ldr             w8,  [sp]
-         clz             w3,  w3
--        adr             x5,  L(ipred_dc_128_tbl)
-+        adrp            x5,  L(ipred_dc_128_tbl)
-+        add             x5,  x5, :lo12: L(ipred_dc_128_tbl)
-         sub             w3,  w3,  #25
--        ldrh            w3,  [x5, w3, uxtw #1]
-+        ldr             x5,  [x5, w3, uxtw #3]
-         dup             v0.8h,   w8
--        sub             x5,  x5,  w3, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         urshr           v0.8h,   v0.8h,  #1
-@@ -106,12 +106,14 @@ function ipred_dc_128_16bpc_neon, export=1
-         b.gt            64b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_dc_128_tbl):
--        .hword L(ipred_dc_128_tbl) - 640b
--        .hword L(ipred_dc_128_tbl) - 320b
--        .hword L(ipred_dc_128_tbl) - 160b
--        .hword L(ipred_dc_128_tbl) -   8b
--        .hword L(ipred_dc_128_tbl) -   4b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword   8b
-+        .xword   4b
-+	.popsection
- endfunc
- 
- // void ipred_v_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -120,11 +122,11 @@ endfunc
- //                         const int max_width, const int max_height);
- function ipred_v_16bpc_neon, export=1
-         clz             w3,  w3
--        adr             x5,  L(ipred_v_tbl)
-+        adrp            x5,  L(ipred_v_tbl)
-+        add             x5,  x5, :lo12: L(ipred_v_tbl)
-         sub             w3,  w3,  #25
--        ldrh            w3,  [x5, w3, uxtw #1]
-+        ldr             x5,  [x5, w3, uxtw #3]
-         add             x2,  x2,  #2
--        sub             x5,  x5,  w3, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x5
-@@ -190,12 +192,14 @@ function ipred_v_16bpc_neon, export=1
-         b.gt            64b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_v_tbl):
--        .hword L(ipred_v_tbl) - 640b
--        .hword L(ipred_v_tbl) - 320b
--        .hword L(ipred_v_tbl) - 160b
--        .hword L(ipred_v_tbl) -  80b
--        .hword L(ipred_v_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- // void ipred_h_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -204,11 +208,11 @@ endfunc
- //                         const int max_width, const int max_height);
- function ipred_h_16bpc_neon, export=1
-         clz             w3,  w3
--        adr             x5,  L(ipred_h_tbl)
-+        adrp            x5,  L(ipred_h_tbl)
-+        add             x5,  x5, :lo12: L(ipred_h_tbl)
-         sub             w3,  w3,  #25
--        ldrh            w3,  [x5, w3, uxtw #1]
-+        ldr             x5,  [x5, w3, uxtw #3]
-         sub             x2,  x2,  #8
--        sub             x5,  x5,  w3, uxtw
-         mov             x7,  #-8
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-@@ -292,12 +296,14 @@ function ipred_h_16bpc_neon, export=1
-         b.gt            64b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_h_tbl):
--        .hword L(ipred_h_tbl) - 64b
--        .hword L(ipred_h_tbl) - 32b
--        .hword L(ipred_h_tbl) - 16b
--        .hword L(ipred_h_tbl) -  8b
--        .hword L(ipred_h_tbl) -  4b
-+        .xword 64b
-+        .xword 32b
-+        .xword 16b
-+        .xword  8b
-+        .xword  4b
-+	.popsection
- endfunc
- 
- // void ipred_dc_top_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -306,11 +312,11 @@ endfunc
- //                              const int max_width, const int max_height);
- function ipred_dc_top_16bpc_neon, export=1
-         clz             w3,  w3
--        adr             x5,  L(ipred_dc_top_tbl)
-+        adrp            x5,  L(ipred_dc_top_tbl)
-+        add             x5,  x5, :lo12: L(ipred_dc_top_tbl)
-         sub             w3,  w3,  #25
--        ldrh            w3,  [x5, w3, uxtw #1]
-+        ldr             x5,  [x5, w3, uxtw #3]
-         add             x2,  x2,  #2
--        sub             x5,  x5,  w3, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x5
-@@ -409,12 +415,14 @@ function ipred_dc_top_16bpc_neon, export=1
-         b.gt            64b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_dc_top_tbl):
--        .hword L(ipred_dc_top_tbl) - 640b
--        .hword L(ipred_dc_top_tbl) - 320b
--        .hword L(ipred_dc_top_tbl) - 160b
--        .hword L(ipred_dc_top_tbl) -  80b
--        .hword L(ipred_dc_top_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- // void ipred_dc_left_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -425,13 +433,12 @@ function ipred_dc_left_16bpc_neon, export=1
-         sub             x2,  x2,  w4, uxtw #1
-         clz             w3,  w3
-         clz             w7,  w4
--        adr             x5,  L(ipred_dc_left_tbl)
-+        adrp            x5,  L(ipred_dc_left_tbl)
-+        add             x5,  x5, :lo12: L(ipred_dc_left_tbl)
-         sub             w3,  w3,  #20 // 25 leading bits, minus table offset 5
-         sub             w7,  w7,  #25
--        ldrh            w3,  [x5, w3, uxtw #1]
--        ldrh            w7,  [x5, w7, uxtw #1]
--        sub             x3,  x5,  w3, uxtw
--        sub             x5,  x5,  w7, uxtw
-+        ldr             x3,  [x5, w3, uxtw #3]
-+        ldr             x5,  [x5, w7, uxtw #3]
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x5
-@@ -550,17 +557,19 @@ L(ipred_dc_left_w64):
-         b.gt            1b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_dc_left_tbl):
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h64)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h32)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h16)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h8)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h4)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w64)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w32)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w16)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w8)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w4)
-+        .xword L(ipred_dc_left_h64)
-+        .xword L(ipred_dc_left_h32)
-+        .xword L(ipred_dc_left_h16)
-+        .xword L(ipred_dc_left_h8)
-+        .xword L(ipred_dc_left_h4)
-+        .xword L(ipred_dc_left_w64)
-+        .xword L(ipred_dc_left_w32)
-+        .xword L(ipred_dc_left_w16)
-+        .xword L(ipred_dc_left_w8)
-+        .xword L(ipred_dc_left_w4)
-+	.popsection
- endfunc
- 
- // void ipred_dc_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -573,16 +582,15 @@ function ipred_dc_16bpc_neon, export=1
-         clz             w3,  w3
-         clz             w6,  w4
-         dup             v16.4s, w7               // width + height
--        adr             x5,  L(ipred_dc_tbl)
-+        adrp            x5,  L(ipred_dc_tbl)
-+        add             x5,  x5, :lo12: L(ipred_dc_tbl)
-         rbit            w7,  w7                  // rbit(width + height)
-         sub             w3,  w3,  #20            // 25 leading bits, minus table offset 5
-         sub             w6,  w6,  #25
-         clz             w7,  w7                  // ctz(width + height)
--        ldrh            w3,  [x5, w3, uxtw #1]
--        ldrh            w6,  [x5, w6, uxtw #1]
-+        ldr             x3,  [x5, w3, uxtw #3]
-+        ldr             x5,  [x5, w6, uxtw #3]
-         neg             w7,  w7                  // -ctz(width + height)
--        sub             x3,  x5,  w3, uxtw
--        sub             x5,  x5,  w6, uxtw
-         ushr            v16.4s,  v16.4s,  #1     // (width + height) >> 1
-         dup             v17.4s,  w7              // -ctz(width + height)
-         add             x6,  x0,  x1
-@@ -795,17 +803,19 @@ L(ipred_dc_w64):
-         b.gt            2b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_dc_tbl):
--        .hword L(ipred_dc_tbl) - L(ipred_dc_h64)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_h32)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_h16)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_h8)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_h4)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_w64)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_w32)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_w16)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_w8)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_w4)
-+        .xword L(ipred_dc_h64)
-+        .xword L(ipred_dc_h32)
-+        .xword L(ipred_dc_h16)
-+        .xword L(ipred_dc_h8)
-+        .xword L(ipred_dc_h4)
-+        .xword L(ipred_dc_w64)
-+        .xword L(ipred_dc_w32)
-+        .xword L(ipred_dc_w16)
-+        .xword L(ipred_dc_w8)
-+        .xword L(ipred_dc_w4)
-+	.popsection
- endfunc
- 
- // void ipred_paeth_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -814,13 +824,13 @@ endfunc
- //                             const int max_width, const int max_height);
- function ipred_paeth_16bpc_neon, export=1
-         clz             w9,  w3
--        adr             x5,  L(ipred_paeth_tbl)
-+        adrp            x5,  L(ipred_paeth_tbl)
-+        add             x5,  x5, :lo12: L(ipred_paeth_tbl)
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x5, w9, uxtw #1]
-+        ldr             x5,  [x5, w9, uxtw #3]
-         ld1r            {v4.8h},  [x2]
-         add             x8,  x2,  #2
-         sub             x2,  x2,  #8
--        sub             x5,  x5,  w9, uxtw
-         mov             x7,  #-8
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-@@ -934,12 +944,14 @@ function ipred_paeth_16bpc_neon, export=1
- 9:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_paeth_tbl):
--        .hword L(ipred_paeth_tbl) - 640b
--        .hword L(ipred_paeth_tbl) - 320b
--        .hword L(ipred_paeth_tbl) - 160b
--        .hword L(ipred_paeth_tbl) -  80b
--        .hword L(ipred_paeth_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- // void ipred_smooth_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -951,13 +963,13 @@ function ipred_smooth_16bpc_neon, export=1
-         add             x11, x10, w4, uxtw
-         add             x10, x10, w3, uxtw
-         clz             w9,  w3
--        adr             x5,  L(ipred_smooth_tbl)
-+        adrp            x5,  L(ipred_smooth_tbl)
-+        add             x5,  x5, :lo12: L(ipred_smooth_tbl)
-         sub             x12, x2,  w4, uxtw #1
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x5, w9, uxtw #1]
-+        ldr             x5,  [x5, w9, uxtw #3]
-         ld1r            {v4.8h},  [x12] // bottom
-         add             x8,  x2,  #2
--        sub             x5,  x5,  w9, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x5
-@@ -1138,12 +1150,14 @@ function ipred_smooth_16bpc_neon, export=1
- 9:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_smooth_tbl):
--        .hword L(ipred_smooth_tbl) - 640b
--        .hword L(ipred_smooth_tbl) - 320b
--        .hword L(ipred_smooth_tbl) - 160b
--        .hword L(ipred_smooth_tbl) -  80b
--        .hword L(ipred_smooth_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- // void ipred_smooth_v_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -1154,13 +1168,13 @@ function ipred_smooth_v_16bpc_neon, export=1
-         movrel          x7,  X(sm_weights)
-         add             x7,  x7,  w4, uxtw
-         clz             w9,  w3
--        adr             x5,  L(ipred_smooth_v_tbl)
-+        adrp            x5,  L(ipred_smooth_v_tbl)
-+        add             x5,  x5, :lo12: L(ipred_smooth_v_tbl)
-         sub             x8,  x2,  w4, uxtw #1
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x5, w9, uxtw #1]
-+        ldr             x5,  [x5, w9, uxtw #3]
-         ld1r            {v4.8h},  [x8] // bottom
-         add             x2,  x2,  #2
--        sub             x5,  x5,  w9, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x5
-@@ -1265,12 +1279,14 @@ function ipred_smooth_v_16bpc_neon, export=1
- 9:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_smooth_v_tbl):
--        .hword L(ipred_smooth_v_tbl) - 640b
--        .hword L(ipred_smooth_v_tbl) - 320b
--        .hword L(ipred_smooth_v_tbl) - 160b
--        .hword L(ipred_smooth_v_tbl) -  80b
--        .hword L(ipred_smooth_v_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- // void ipred_smooth_h_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -1281,12 +1297,12 @@ function ipred_smooth_h_16bpc_neon, export=1
-         movrel          x8,  X(sm_weights)
-         add             x8,  x8,  w3, uxtw
-         clz             w9,  w3
--        adr             x5,  L(ipred_smooth_h_tbl)
-+        adrp            x5,  L(ipred_smooth_h_tbl)
-+        add             x5,  x5, :lo12: L(ipred_smooth_h_tbl)
-         add             x12, x2,  w3, uxtw #1
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x5, w9, uxtw #1]
-+        ldr             x5,  [x5, w9, uxtw #3]
-         ld1r            {v5.8h},  [x12] // right
--        sub             x5,  x5,  w9, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x5
-@@ -1397,12 +1413,14 @@ function ipred_smooth_h_16bpc_neon, export=1
- 9:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_smooth_h_tbl):
--        .hword L(ipred_smooth_h_tbl) - 640b
--        .hword L(ipred_smooth_h_tbl) - 320b
--        .hword L(ipred_smooth_h_tbl) - 160b
--        .hword L(ipred_smooth_h_tbl) -  80b
--        .hword L(ipred_smooth_h_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- const padding_mask_buf
-@@ -1728,11 +1746,11 @@ endfunc
- //                                const int dx, const int max_base_x);
- function ipred_z1_fill1_16bpc_neon, export=1
-         clz             w9,  w3
--        adr             x8,  L(ipred_z1_fill1_tbl)
-+        adrp            x8,  L(ipred_z1_fill1_tbl)
-+        add             x8,  x8, :lo12: L(ipred_z1_fill1_tbl)
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x8, w9, uxtw #1]
-+        ldr             x8,  [x8, w9, uxtw #3]
-         add             x10, x2,  w6,  uxtw #1    // top[max_base_x]
--        sub             x8,  x8,  w9,  uxtw
-         ld1r            {v31.8h}, [x10]           // padding
-         mov             w7,  w5
-         mov             w15, #64
-@@ -1917,12 +1935,14 @@ function ipred_z1_fill1_16bpc_neon, export=1
-         mov             w3,  w12
-         b               169b
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_z1_fill1_tbl):
--        .hword L(ipred_z1_fill1_tbl) - 640b
--        .hword L(ipred_z1_fill1_tbl) - 320b
--        .hword L(ipred_z1_fill1_tbl) - 160b
--        .hword L(ipred_z1_fill1_tbl) -  80b
--        .hword L(ipred_z1_fill1_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- function ipred_z1_fill2_16bpc_neon, export=1
-@@ -2050,11 +2070,11 @@ endconst
- //                                const int dx, const int dy);
- function ipred_z2_fill1_16bpc_neon, export=1
-         clz             w10, w4
--        adr             x9,  L(ipred_z2_fill1_tbl)
-+        adrp            x9,  L(ipred_z2_fill1_tbl)
-+        add             x9,  x9, :lo12: L(ipred_z2_fill1_tbl)
-         sub             w10, w10, #25
--        ldrh            w10, [x9, w10, uxtw #1]
-+        ldr             x9, [x9, w10, uxtw #3]
-         mov             w8,  #(1 << 6)            // xpos = 1 << 6
--        sub             x9,  x9,  w10, uxtw
-         sub             w8,  w8,  w6              // xpos -= dx
- 
-         movrel          x11, increments
-@@ -2815,12 +2835,14 @@ function ipred_z2_fill1_16bpc_neon, export=1
-         ldp             d8,  d9,  [sp], 0x40
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_z2_fill1_tbl):
--        .hword L(ipred_z2_fill1_tbl) - 640b
--        .hword L(ipred_z2_fill1_tbl) - 320b
--        .hword L(ipred_z2_fill1_tbl) - 160b
--        .hword L(ipred_z2_fill1_tbl) -  80b
--        .hword L(ipred_z2_fill1_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- function ipred_z2_fill2_16bpc_neon, export=1
-@@ -3432,11 +3454,11 @@ endfunc
- //                                const int dy, const int max_base_y);
- function ipred_z3_fill1_16bpc_neon, export=1
-         clz             w9,  w4
--        adr             x8,  L(ipred_z3_fill1_tbl)
-+        adrp            x8,  L(ipred_z3_fill1_tbl)
-+        add             x8,  x8, :lo12: L(ipred_z3_fill1_tbl)
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x8, w9, uxtw #1]
-+        ldr             x8,  [x8, w9, uxtw #3]
-         add             x10, x2,  w6,  uxtw #1    // left[max_base_y]
--        sub             x8,  x8,  w9,  uxtw
-         ld1r            {v31.8h}, [x10]           // padding
-         mov             w7,  w5
-         mov             w15, #64
-@@ -3638,17 +3660,20 @@ function ipred_z3_fill1_16bpc_neon, export=1
- 9:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_z3_fill1_tbl):
--        .hword L(ipred_z3_fill1_tbl) - 640b
--        .hword L(ipred_z3_fill1_tbl) - 320b
--        .hword L(ipred_z3_fill1_tbl) - 160b
--        .hword L(ipred_z3_fill1_tbl) -  80b
--        .hword L(ipred_z3_fill1_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- function ipred_z3_fill_padding_neon, export=0
-         cmp             w3,  #8
--        adr             x8,  L(ipred_z3_fill_padding_tbl)
-+        adrp            x8,  L(ipred_z3_fill_padding_tbl)
-+        add             x8,  x8, :lo12: L(ipred_z3_fill_padding_tbl)
-         b.gt            L(ipred_z3_fill_padding_wide)
-         // w3 = remaining width, w4 = constant height
-         mov             w12, w4
-@@ -3659,10 +3684,11 @@ function ipred_z3_fill_padding_neon, export=0
-         // power of two in the remaining width, and repeating.
-         clz             w9,  w3
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x8, w9, uxtw #1]
--        sub             x9,  x8,  w9,  uxtw
-+        ldr             x9,  [x8, w9, uxtw #3]
-         br              x9
- 
-+20:
-+        AARCH64_VALID_JUMP_TARGET
- 2:
-         st1             {v31.s}[0], [x0],  x1
-         subs            w4,  w4,  #4
-@@ -3681,6 +3707,8 @@ function ipred_z3_fill_padding_neon, export=0
-         mov             w4,  w12
-         b               1b
- 
-+40:
-+        AARCH64_VALID_JUMP_TARGET
- 4:
-         st1             {v31.4h}, [x0],  x1
-         subs            w4,  w4,  #4
-@@ -3699,10 +3727,11 @@ function ipred_z3_fill_padding_neon, export=0
-         mov             w4,  w12
-         b               1b
- 
--8:
--16:
--32:
--64:
-+80:
-+160:
-+320:
-+640:
-+        AARCH64_VALID_JUMP_TARGET
-         st1             {v31.8h}, [x0],  x1
-         subs            w4,  w4,  #4
-         st1             {v31.8h}, [x13], x1
-@@ -3723,13 +3752,15 @@ function ipred_z3_fill_padding_neon, export=0
- 9:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_z3_fill_padding_tbl):
--        .hword L(ipred_z3_fill_padding_tbl) - 64b
--        .hword L(ipred_z3_fill_padding_tbl) - 32b
--        .hword L(ipred_z3_fill_padding_tbl) - 16b
--        .hword L(ipred_z3_fill_padding_tbl) -  8b
--        .hword L(ipred_z3_fill_padding_tbl) -  4b
--        .hword L(ipred_z3_fill_padding_tbl) -  2b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+        .xword  20b
-+	.popsection
- 
- L(ipred_z3_fill_padding_wide):
-         // Fill a WxH rectangle with padding, with W > 8.
-@@ -3880,13 +3911,13 @@ function ipred_filter_\bpc\()bpc_neon
-         add             x6,  x6,  w5, uxtw
-         ld1             {v16.8b, v17.8b, v18.8b, v19.8b}, [x6], #32
-         clz             w9,  w3
--        adr             x5,  L(ipred_filter\bpc\()_tbl)
-+        adrp            x5,  L(ipred_filter\bpc\()_tbl)
-+        add             x5,  x5, :lo12: L(ipred_filter\bpc\()_tbl)
-         ld1             {v20.8b, v21.8b, v22.8b}, [x6]
-         sub             w9,  w9,  #26
--        ldrh            w9,  [x5, w9, uxtw #1]
-+        ldr             x5,  [x5, w9, uxtw #3]
-         sxtl            v16.8h,  v16.8b
-         sxtl            v17.8h,  v17.8b
--        sub             x5,  x5,  w9, uxtw
-         sxtl            v18.8h,  v18.8b
-         sxtl            v19.8h,  v19.8b
-         add             x6,  x0,  x1
-@@ -4160,11 +4191,13 @@ function ipred_filter_\bpc\()bpc_neon
- 9:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_filter\bpc\()_tbl):
--        .hword L(ipred_filter\bpc\()_tbl) - 320b
--        .hword L(ipred_filter\bpc\()_tbl) - 160b
--        .hword L(ipred_filter\bpc\()_tbl) -  80b
--        .hword L(ipred_filter\bpc\()_tbl) -  40b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- .endm
- 
-@@ -4184,11 +4217,11 @@ endfunc
- function pal_pred_16bpc_neon, export=1
-         ld1             {v30.8h}, [x2]
-         clz             w9,  w4
--        adr             x6,  L(pal_pred_tbl)
-+        adrp            x6,  L(pal_pred_tbl)
-+        add             x6,  x6, :lo12: L(pal_pred_tbl)
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x6, w9, uxtw #1]
-+        ldr             x6,  [x6, w9, uxtw #3]
-         movi            v31.8h,  #1, lsl #8
--        sub             x6,  x6,  w9, uxtw
-         br              x6
- 40:
-         AARCH64_VALID_JUMP_TARGET
-@@ -4357,12 +4390,14 @@ function pal_pred_16bpc_neon, export=1
-         b.gt            64b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(pal_pred_tbl):
--        .hword L(pal_pred_tbl) - 640b
--        .hword L(pal_pred_tbl) - 320b
--        .hword L(pal_pred_tbl) - 160b
--        .hword L(pal_pred_tbl) -  80b
--        .hword L(pal_pred_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- // void ipred_cfl_128_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4373,12 +4408,12 @@ endfunc
- function ipred_cfl_128_16bpc_neon, export=1
-         dup             v31.8h,  w7   // bitdepth_max
-         clz             w9,  w3
--        adr             x7,  L(ipred_cfl_128_tbl)
-+        adrp            x7,  L(ipred_cfl_128_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_128_tbl)
-         sub             w9,  w9,  #26
--        ldrh            w9,  [x7, w9, uxtw #1]
-+        ldr             x7,  [x7, w9, uxtw #3]
-         urshr           v0.8h,   v31.8h,  #1
-         dup             v1.8h,   w6   // alpha
--        sub             x7,  x7,  w9, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         movi            v30.8h,  #0
-@@ -4510,12 +4545,14 @@ L(ipred_cfl_splat_w16):
-         b.gt            1b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_cfl_128_tbl):
- L(ipred_cfl_splat_tbl):
--        .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w16)
--        .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w16)
--        .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w8)
--        .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w4)
-+        .xword L(ipred_cfl_splat_w16)
-+        .xword L(ipred_cfl_splat_w16)
-+        .xword L(ipred_cfl_splat_w8)
-+        .xword L(ipred_cfl_splat_w4)
-+	.popsection
- endfunc
- 
- // void ipred_cfl_top_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4526,12 +4563,12 @@ endfunc
- function ipred_cfl_top_16bpc_neon, export=1
-         dup             v31.8h,  w7   // bitdepth_max
-         clz             w9,  w3
--        adr             x7,  L(ipred_cfl_top_tbl)
-+        adrp            x7,  L(ipred_cfl_top_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_top_tbl)
-         sub             w9,  w9,  #26
--        ldrh            w9,  [x7, w9, uxtw #1]
-+        ldr             x7,  [x7, w9, uxtw #3]
-         dup             v1.8h,   w6   // alpha
-         add             x2,  x2,  #2
--        sub             x7,  x7,  w9, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         movi            v30.8h,  #0
-@@ -4569,11 +4606,13 @@ function ipred_cfl_top_16bpc_neon, export=1
-         dup             v0.8h,   v0.h[0]
-         b               L(ipred_cfl_splat_w16)
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_cfl_top_tbl):
--        .hword L(ipred_cfl_top_tbl) - 32b
--        .hword L(ipred_cfl_top_tbl) - 16b
--        .hword L(ipred_cfl_top_tbl) -  8b
--        .hword L(ipred_cfl_top_tbl) -  4b
-+        .xword 32b
-+        .xword 16b
-+        .xword  8b
-+        .xword  4b
-+	.popsection
- endfunc
- 
- // void ipred_cfl_left_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4586,15 +4625,15 @@ function ipred_cfl_left_16bpc_neon, export=1
-         sub             x2,  x2,  w4, uxtw #1
-         clz             w9,  w3
-         clz             w8,  w4
--        adr             x10, L(ipred_cfl_splat_tbl)
--        adr             x7,  L(ipred_cfl_left_tbl)
-+        adrp            x10, L(ipred_cfl_splat_tbl)
-+        add             x10, x10, :lo12: L(ipred_cfl_splat_tbl)
-+        adrp            x7,  L(ipred_cfl_left_tbl)
-+        add             x7, x7, :lo12: L(ipred_cfl_left_tbl)
-         sub             w9,  w9,  #26
-         sub             w8,  w8,  #26
--        ldrh            w9,  [x10, w9, uxtw #1]
--        ldrh            w8,  [x7,  w8, uxtw #1]
-+        ldr             x9,  [x10, w9, uxtw #3]
-+        ldr             x7,  [x7,  w8, uxtw #3]
-         dup             v1.8h,   w6   // alpha
--        sub             x9,  x10, w9, uxtw
--        sub             x7,  x7,  w8, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         movi            v30.8h,  #0
-@@ -4636,11 +4675,13 @@ L(ipred_cfl_left_h32):
-         dup             v0.8h,   v0.h[0]
-         br              x9
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_cfl_left_tbl):
--        .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h32)
--        .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h16)
--        .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h8)
--        .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h4)
-+        .xword L(ipred_cfl_left_h32)
-+        .xword L(ipred_cfl_left_h16)
-+        .xword L(ipred_cfl_left_h8)
-+        .xword L(ipred_cfl_left_h4)
-+	.popsection
- endfunc
- 
- // void ipred_cfl_16bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4656,16 +4697,15 @@ function ipred_cfl_16bpc_neon, export=1
-         clz             w9,  w3
-         clz             w6,  w4
-         dup             v16.4s, w8               // width + height
--        adr             x7,  L(ipred_cfl_tbl)
-+        adrp            x7,  L(ipred_cfl_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_tbl)
-         rbit            w8,  w8                  // rbit(width + height)
-         sub             w9,  w9,  #22            // 26 leading bits, minus table offset 4
-         sub             w6,  w6,  #26
-         clz             w8,  w8                  // ctz(width + height)
--        ldrh            w9,  [x7, w9, uxtw #1]
--        ldrh            w6,  [x7, w6, uxtw #1]
-+        ldr             x9,  [x7, w9, uxtw #3]
-+        ldr             x7,  [x7, w6, uxtw #3]
-         neg             w8,  w8                  // -ctz(width + height)
--        sub             x9,  x7,  w9, uxtw
--        sub             x7,  x7,  w6, uxtw
-         ushr            v16.4s,  v16.4s,  #1     // (width + height) >> 1
-         dup             v17.4s,  w8              // -ctz(width + height)
-         add             x6,  x0,  x1
-@@ -4789,15 +4829,17 @@ L(ipred_cfl_w32):
-         dup             v0.8h,   v0.h[0]
-         b               L(ipred_cfl_splat_w16)
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_cfl_tbl):
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_h32)
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_h16)
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_h8)
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_h4)
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_w32)
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_w16)
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_w8)
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_w4)
-+        .xword L(ipred_cfl_h32)
-+        .xword L(ipred_cfl_h16)
-+        .xword L(ipred_cfl_h8)
-+        .xword L(ipred_cfl_h4)
-+        .xword L(ipred_cfl_w32)
-+        .xword L(ipred_cfl_w16)
-+        .xword L(ipred_cfl_w8)
-+        .xword L(ipred_cfl_w4)
-+	.popsection
- endfunc
- 
- // void cfl_ac_420_16bpc_neon(int16_t *const ac, const pixel *const ypx,
-@@ -4806,14 +4848,14 @@ endfunc
- function ipred_cfl_ac_420_16bpc_neon, export=1
-         clz             w8,  w5
-         lsl             w4,  w4,  #2
--        adr             x7,  L(ipred_cfl_ac_420_tbl)
-+        adrp            x7,  L(ipred_cfl_ac_420_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_ac_420_tbl)
-         sub             w8,  w8,  #27
--        ldrh            w8,  [x7, w8, uxtw #1]
-+        ldr             x7,  [x7, w8, uxtw #3]
-         movi            v24.4s,  #0
-         movi            v25.4s,  #0
-         movi            v26.4s,  #0
-         movi            v27.4s,  #0
--        sub             x7,  x7,  w8, uxtw
-         sub             w8,  w6,  w4         // height - h_pad
-         rbit            w9,  w5              // rbit(width)
-         rbit            w10, w6              // rbit(height)
-@@ -4945,9 +4987,9 @@ L(ipred_cfl_ac_420_w8_hpad):
- 
- L(ipred_cfl_ac_420_w16):
-         AARCH64_VALID_JUMP_TARGET
--        adr             x7,  L(ipred_cfl_ac_420_w16_tbl)
--        ldrh            w3,  [x7, w3, uxtw #1]
--        sub             x7,  x7,  w3, uxtw
-+        adrp            x7,  L(ipred_cfl_ac_420_w16_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_ac_420_w16_tbl)
-+        ldr             x7,  [x7, w3, uxtw #3]
-         br              x7
- 
- L(ipred_cfl_ac_420_w16_wpad0):
-@@ -5124,17 +5166,19 @@ L(ipred_cfl_ac_420_w16_hpad):
-         lsl             w6,  w6,  #2
-         b               L(ipred_cfl_ac_420_w4_calc_subtract_dc)
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_cfl_ac_420_tbl):
--        .hword L(ipred_cfl_ac_420_tbl) - L(ipred_cfl_ac_420_w16)
--        .hword L(ipred_cfl_ac_420_tbl) - L(ipred_cfl_ac_420_w8)
--        .hword L(ipred_cfl_ac_420_tbl) - L(ipred_cfl_ac_420_w4)
--        .hword 0
-+        .xword L(ipred_cfl_ac_420_w16)
-+        .xword L(ipred_cfl_ac_420_w8)
-+        .xword L(ipred_cfl_ac_420_w4)
-+        .xword 0
- 
- L(ipred_cfl_ac_420_w16_tbl):
--        .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad0)
--        .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad1)
--        .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad2)
--        .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad3)
-+        .xword L(ipred_cfl_ac_420_w16_wpad0)
-+        .xword L(ipred_cfl_ac_420_w16_wpad1)
-+        .xword L(ipred_cfl_ac_420_w16_wpad2)
-+        .xword L(ipred_cfl_ac_420_w16_wpad3)
-+	.popsection
- endfunc
- 
- // void cfl_ac_422_16bpc_neon(int16_t *const ac, const pixel *const ypx,
-@@ -5143,14 +5187,14 @@ endfunc
- function ipred_cfl_ac_422_16bpc_neon, export=1
-         clz             w8,  w5
-         lsl             w4,  w4,  #2
--        adr             x7,  L(ipred_cfl_ac_422_tbl)
-+        adrp            x7,  L(ipred_cfl_ac_422_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_ac_422_tbl)
-         sub             w8,  w8,  #27
--        ldrh            w8,  [x7, w8, uxtw #1]
-+        ldr             x7,  [x7, w8, uxtw #3]
-         movi            v24.4s,  #0
-         movi            v25.4s,  #0
-         movi            v26.4s,  #0
-         movi            v27.4s,  #0
--        sub             x7,  x7,  w8, uxtw
-         sub             w8,  w6,  w4         // height - h_pad
-         rbit            w9,  w5              // rbit(width)
-         rbit            w10, w6              // rbit(height)
-@@ -5251,9 +5295,9 @@ L(ipred_cfl_ac_422_w8_wpad):
- 
- L(ipred_cfl_ac_422_w16):
-         AARCH64_VALID_JUMP_TARGET
--        adr             x7,  L(ipred_cfl_ac_422_w16_tbl)
--        ldrh            w3,  [x7, w3, uxtw #1]
--        sub             x7,  x7,  w3, uxtw
-+        adrp            x7,  L(ipred_cfl_ac_422_w16_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_ac_422_w16_tbl)
-+        ldr             x7,  [x7, w3, uxtw #3]
-         br              x7
- 
- L(ipred_cfl_ac_422_w16_wpad0):
-@@ -5372,17 +5416,19 @@ L(ipred_cfl_ac_422_w16_wpad3):
-         mov             v1.16b,  v3.16b
-         b               L(ipred_cfl_ac_420_w16_hpad)
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_cfl_ac_422_tbl):
--        .hword L(ipred_cfl_ac_422_tbl) - L(ipred_cfl_ac_422_w16)
--        .hword L(ipred_cfl_ac_422_tbl) - L(ipred_cfl_ac_422_w8)
--        .hword L(ipred_cfl_ac_422_tbl) - L(ipred_cfl_ac_422_w4)
--        .hword 0
-+        .xword L(ipred_cfl_ac_422_w16)
-+        .xword L(ipred_cfl_ac_422_w8)
-+        .xword L(ipred_cfl_ac_422_w4)
-+        .xword 0
- 
- L(ipred_cfl_ac_422_w16_tbl):
--        .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad0)
--        .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad1)
--        .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad2)
--        .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad3)
-+        .xword L(ipred_cfl_ac_422_w16_wpad0)
-+        .xword L(ipred_cfl_ac_422_w16_wpad1)
-+        .xword L(ipred_cfl_ac_422_w16_wpad2)
-+        .xword L(ipred_cfl_ac_422_w16_wpad3)
-+	.popsection
- endfunc
- 
- // void cfl_ac_444_16bpc_neon(int16_t *const ac, const pixel *const ypx,
-@@ -5391,14 +5437,14 @@ endfunc
- function ipred_cfl_ac_444_16bpc_neon, export=1
-         clz             w8,  w5
-         lsl             w4,  w4,  #2
--        adr             x7,  L(ipred_cfl_ac_444_tbl)
-+        adrp            x7,  L(ipred_cfl_ac_444_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_ac_444_tbl)
-         sub             w8,  w8,  #26
--        ldrh            w8,  [x7, w8, uxtw #1]
-+        ldr             x7,  [x7, w8, uxtw #3]
-         movi            v24.4s,  #0
-         movi            v25.4s,  #0
-         movi            v26.4s,  #0
-         movi            v27.4s,  #0
--        sub             x7,  x7,  w8, uxtw
-         sub             w8,  w6,  w4         // height - h_pad
-         rbit            w9,  w5              // rbit(width)
-         rbit            w10, w6              // rbit(height)
-@@ -5507,10 +5553,11 @@ L(ipred_cfl_ac_444_w16_wpad):
- 
- L(ipred_cfl_ac_444_w32):
-         AARCH64_VALID_JUMP_TARGET
--        adr             x7,  L(ipred_cfl_ac_444_w32_tbl)
--        ldrh            w3,  [x7, w3, uxtw] // (w3>>1) << 1
-+        adrp            x7,  L(ipred_cfl_ac_444_w32_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_ac_444_w32_tbl)
-+        lsr             w3,  w3, #1
-+        ldr             x7,  [x7, w3, uxtw #3] // (w3>>1) << 3
-         lsr             x2,  x2,  #1 // Restore the stride to one line increments
--        sub             x7,  x7,  w3, uxtw
-         br              x7
- 
- L(ipred_cfl_ac_444_w32_wpad0):
-@@ -5625,15 +5672,17 @@ L(ipred_cfl_ac_444_w32_hpad):
-         lsl             w6,  w6,  #3
-         b               L(ipred_cfl_ac_420_w4_calc_subtract_dc)
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_cfl_ac_444_tbl):
--        .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w32)
--        .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w16)
--        .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w8)
--        .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w4)
-+        .xword L(ipred_cfl_ac_444_w32)
-+        .xword L(ipred_cfl_ac_444_w16)
-+        .xword L(ipred_cfl_ac_444_w8)
-+        .xword L(ipred_cfl_ac_444_w4)
- 
- L(ipred_cfl_ac_444_w32_tbl):
--        .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad0)
--        .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad2)
--        .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad4)
--        .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad6)
-+        .xword L(ipred_cfl_ac_444_w32_wpad0)
-+        .xword L(ipred_cfl_ac_444_w32_wpad2)
-+        .xword L(ipred_cfl_ac_444_w32_wpad4)
-+        .xword L(ipred_cfl_ac_444_w32_wpad6)
-+	.popsection
- endfunc
Index: patches/patch-src_arm_64_ipred_S
===================================================================
RCS file: patches/patch-src_arm_64_ipred_S
diff -N patches/patch-src_arm_64_ipred_S
--- patches/patch-src_arm_64_ipred_S	13 Jul 2023 12:26:14 -0000	1.4
+++ /dev/null	1 Jan 1970 00:00:00 -0000
@@ -1,972 +0,0 @@
-Index: src/arm/64/ipred.S
---- src/arm/64/ipred.S.orig
-+++ src/arm/64/ipred.S
-@@ -34,11 +34,11 @@
- //                             const int max_width, const int max_height);
- function ipred_dc_128_8bpc_neon, export=1
-         clz             w3,  w3
--        adr             x5,  L(ipred_dc_128_tbl)
-+        adrp            x5,  L(ipred_dc_128_tbl)
-+        add             x5,  x5, :lo12: L(ipred_dc_128_tbl)
-         sub             w3,  w3,  #25
--        ldrh            w3,  [x5, w3, uxtw #1]
-+        ldr             x5,  [x5, w3, uxtw #3]
-         movi            v0.16b,  #128
--        sub             x5,  x5,  w3, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x5
-@@ -94,12 +94,14 @@ function ipred_dc_128_8bpc_neon, export=1
-         b.gt            64b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_dc_128_tbl):
--        .hword L(ipred_dc_128_tbl) - 640b
--        .hword L(ipred_dc_128_tbl) - 320b
--        .hword L(ipred_dc_128_tbl) -  16b
--        .hword L(ipred_dc_128_tbl) -   8b
--        .hword L(ipred_dc_128_tbl) -   4b
-+        .xword 640b
-+        .xword 320b
-+        .xword  16b
-+        .xword   8b
-+        .xword   4b
-+	.popsection
- endfunc
- 
- // void ipred_v_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -108,11 +110,11 @@ endfunc
- //                        const int max_width, const int max_height);
- function ipred_v_8bpc_neon, export=1
-         clz             w3,  w3
--        adr             x5,  L(ipred_v_tbl)
-+        adrp            x5,  L(ipred_v_tbl)
-+        add             x5,  x5, :lo12: L(ipred_v_tbl)
-         sub             w3,  w3,  #25
--        ldrh            w3,  [x5, w3, uxtw #1]
-+        ldr             x5,  [x5, w3, uxtw #3]
-         add             x2,  x2,  #1
--        sub             x5,  x5,  w3, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x5
-@@ -172,12 +174,14 @@ function ipred_v_8bpc_neon, export=1
-         b.gt            64b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_v_tbl):
--        .hword L(ipred_v_tbl) - 640b
--        .hword L(ipred_v_tbl) - 320b
--        .hword L(ipred_v_tbl) - 160b
--        .hword L(ipred_v_tbl) -  80b
--        .hword L(ipred_v_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- // void ipred_h_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -186,11 +190,11 @@ endfunc
- //                        const int max_width, const int max_height);
- function ipred_h_8bpc_neon, export=1
-         clz             w3,  w3
--        adr             x5,  L(ipred_h_tbl)
-+        adrp            x5,  L(ipred_h_tbl)
-+        add             x5,  x5, :lo12: L(ipred_h_tbl)
-         sub             w3,  w3,  #25
--        ldrh            w3,  [x5, w3, uxtw #1]
-+        ldr             x5,  [x5, w3, uxtw #3]
-         sub             x2,  x2,  #4
--        sub             x5,  x5,  w3, uxtw
-         mov             x7,  #-4
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-@@ -258,12 +262,14 @@ function ipred_h_8bpc_neon, export=1
-         b.gt            64b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_h_tbl):
--        .hword L(ipred_h_tbl) - 64b
--        .hword L(ipred_h_tbl) - 32b
--        .hword L(ipred_h_tbl) - 16b
--        .hword L(ipred_h_tbl) -  8b
--        .hword L(ipred_h_tbl) -  4b
-+        .xword 64b
-+        .xword 32b
-+        .xword 16b
-+        .xword  8b
-+        .xword  4b
-+	.popsection
- endfunc
- 
- // void ipred_dc_top_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -272,11 +278,11 @@ endfunc
- //                             const int max_width, const int max_height);
- function ipred_dc_top_8bpc_neon, export=1
-         clz             w3,  w3
--        adr             x5,  L(ipred_dc_top_tbl)
-+        adrp            x5,  L(ipred_dc_top_tbl)
-+        add             x5,  x5, :lo12: L(ipred_dc_top_tbl)
-         sub             w3,  w3,  #25
--        ldrh            w3,  [x5, w3, uxtw #1]
-+        ldr             x5,  [x5, w3, uxtw #3]
-         add             x2,  x2,  #1
--        sub             x5,  x5,  w3, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x5
-@@ -363,12 +369,14 @@ function ipred_dc_top_8bpc_neon, export=1
-         b.gt            64b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_dc_top_tbl):
--        .hword L(ipred_dc_top_tbl) - 640b
--        .hword L(ipred_dc_top_tbl) - 320b
--        .hword L(ipred_dc_top_tbl) - 160b
--        .hword L(ipred_dc_top_tbl) -  80b
--        .hword L(ipred_dc_top_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword 80b
-+        .xword 40b
-+	.popsection
- endfunc
- 
- // void ipred_dc_left_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -379,13 +387,12 @@ function ipred_dc_left_8bpc_neon, export=1
-         sub             x2,  x2,  w4, uxtw
-         clz             w3,  w3
-         clz             w7,  w4
--        adr             x5,  L(ipred_dc_left_tbl)
-+        adrp            x5,  L(ipred_dc_left_tbl)
-+        add             x5,  x5, :lo12: L(ipred_dc_left_tbl)
-         sub             w3,  w3,  #20 // 25 leading bits, minus table offset 5
-         sub             w7,  w7,  #25
--        ldrh            w3,  [x5, w3, uxtw #1]
--        ldrh            w7,  [x5, w7, uxtw #1]
--        sub             x3,  x5,  w3, uxtw
--        sub             x5,  x5,  w7, uxtw
-+        ldr             x3,  [x5, w3, uxtw #3]
-+        ldr             x5,  [x5, w7, uxtw #3]
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x5
-@@ -489,17 +496,19 @@ L(ipred_dc_left_w64):
-         b.gt            1b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_dc_left_tbl):
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h64)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h32)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h16)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h8)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_h4)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w64)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w32)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w16)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w8)
--        .hword L(ipred_dc_left_tbl) - L(ipred_dc_left_w4)
-+        .xword L(ipred_dc_left_h64)
-+        .xword L(ipred_dc_left_h32)
-+        .xword L(ipred_dc_left_h16)
-+        .xword L(ipred_dc_left_h8)
-+        .xword L(ipred_dc_left_h4)
-+        .xword L(ipred_dc_left_w64)
-+        .xword L(ipred_dc_left_w32)
-+        .xword L(ipred_dc_left_w16)
-+        .xword L(ipred_dc_left_w8)
-+        .xword L(ipred_dc_left_w4)
-+	.popsection
- endfunc
- 
- // void ipred_dc_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -512,16 +521,15 @@ function ipred_dc_8bpc_neon, export=1
-         clz             w3,  w3
-         clz             w6,  w4
-         dup             v16.8h, w7               // width + height
--        adr             x5,  L(ipred_dc_tbl)
-+        adrp            x5,  L(ipred_dc_tbl)
-+        add             x5,  x5, :lo12: L(ipred_dc_tbl)
-         rbit            w7,  w7                  // rbit(width + height)
-         sub             w3,  w3,  #20            // 25 leading bits, minus table offset 5
-         sub             w6,  w6,  #25
-         clz             w7,  w7                  // ctz(width + height)
--        ldrh            w3,  [x5, w3, uxtw #1]
--        ldrh            w6,  [x5, w6, uxtw #1]
-+        ldr             x3,  [x5, w3, uxtw #3]
-+        ldr             x5,  [x5, w6, uxtw #3]
-         neg             w7,  w7                  // -ctz(width + height)
--        sub             x3,  x5,  w3, uxtw
--        sub             x5,  x5,  w6, uxtw
-         ushr            v16.8h,  v16.8h,  #1     // (width + height) >> 1
-         dup             v17.8h,  w7              // -ctz(width + height)
-         add             x6,  x0,  x1
-@@ -714,17 +722,19 @@ L(ipred_dc_w64):
-         b.gt            2b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_dc_tbl):
--        .hword L(ipred_dc_tbl) - L(ipred_dc_h64)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_h32)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_h16)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_h8)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_h4)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_w64)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_w32)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_w16)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_w8)
--        .hword L(ipred_dc_tbl) - L(ipred_dc_w4)
-+        .xword L(ipred_dc_h64)
-+        .xword L(ipred_dc_h32)
-+        .xword L(ipred_dc_h16)
-+        .xword L(ipred_dc_h8)
-+        .xword L(ipred_dc_h4)
-+        .xword L(ipred_dc_w64)
-+        .xword L(ipred_dc_w32)
-+        .xword L(ipred_dc_w16)
-+        .xword L(ipred_dc_w8)
-+        .xword L(ipred_dc_w4)
-+	.popsection
- endfunc
- 
- // void ipred_paeth_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -733,13 +743,13 @@ endfunc
- //                            const int max_width, const int max_height);
- function ipred_paeth_8bpc_neon, export=1
-         clz             w9,  w3
--        adr             x5,  L(ipred_paeth_tbl)
-+        adrp            x5,  L(ipred_paeth_tbl)
-+        add             x5,  x5, :lo12: L(ipred_paeth_tbl)
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x5, w9, uxtw #1]
-+        ldr             x5,  [x5, w9, uxtw #3]
-         ld1r            {v4.16b},  [x2]
-         add             x8,  x2,  #1
-         sub             x2,  x2,  #4
--        sub             x5,  x5,  w9, uxtw
-         mov             x7,  #-4
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-@@ -899,12 +909,14 @@ function ipred_paeth_8bpc_neon, export=1
- 9:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_paeth_tbl):
--        .hword L(ipred_paeth_tbl) - 640b
--        .hword L(ipred_paeth_tbl) - 320b
--        .hword L(ipred_paeth_tbl) - 160b
--        .hword L(ipred_paeth_tbl) -  80b
--        .hword L(ipred_paeth_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- // void ipred_smooth_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -916,13 +928,13 @@ function ipred_smooth_8bpc_neon, export=1
-         add             x11, x10, w4, uxtw
-         add             x10, x10, w3, uxtw
-         clz             w9,  w3
--        adr             x5,  L(ipred_smooth_tbl)
-+        adrp            x5,  L(ipred_smooth_tbl)
-+        add             x5,  x5, :lo12: L(ipred_smooth_tbl)
-         sub             x12, x2,  w4, uxtw
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x5, w9, uxtw #1]
-+        ldr             x5,  [x5, w9, uxtw #3]
-         ld1r            {v4.16b},  [x12] // bottom
-         add             x8,  x2,  #1
--        sub             x5,  x5,  w9, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x5
-@@ -1080,12 +1092,14 @@ function ipred_smooth_8bpc_neon, export=1
- 9:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_smooth_tbl):
--        .hword L(ipred_smooth_tbl) - 640b
--        .hword L(ipred_smooth_tbl) - 320b
--        .hword L(ipred_smooth_tbl) - 160b
--        .hword L(ipred_smooth_tbl) -  80b
--        .hword L(ipred_smooth_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- // void ipred_smooth_v_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -1096,13 +1110,13 @@ function ipred_smooth_v_8bpc_neon, export=1
-         movrel          x7,  X(sm_weights)
-         add             x7,  x7,  w4, uxtw
-         clz             w9,  w3
--        adr             x5,  L(ipred_smooth_v_tbl)
-+        adrp            x5,  L(ipred_smooth_v_tbl)
-+        add             x5,  x5, :lo12: L(ipred_smooth_v_tbl)
-         sub             x8,  x2,  w4, uxtw
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x5, w9, uxtw #1]
-+        ldr             x5,  [x5, w9, uxtw #3]
-         ld1r            {v4.16b},  [x8] // bottom
-         add             x2,  x2,  #1
--        sub             x5,  x5,  w9, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x5
-@@ -1221,12 +1235,14 @@ function ipred_smooth_v_8bpc_neon, export=1
- 9:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_smooth_v_tbl):
--        .hword L(ipred_smooth_v_tbl) - 640b
--        .hword L(ipred_smooth_v_tbl) - 320b
--        .hword L(ipred_smooth_v_tbl) - 160b
--        .hword L(ipred_smooth_v_tbl) -  80b
--        .hword L(ipred_smooth_v_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword 80b
-+        .xword 40b
-+	.popsection
- endfunc
- 
- // void ipred_smooth_h_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -1237,12 +1253,12 @@ function ipred_smooth_h_8bpc_neon, export=1
-         movrel          x8,  X(sm_weights)
-         add             x8,  x8,  w3, uxtw
-         clz             w9,  w3
--        adr             x5,  L(ipred_smooth_h_tbl)
-+        adrp            x5,  L(ipred_smooth_h_tbl)
-+        add             x5,  x5, :lo12: L(ipred_smooth_h_tbl)
-         add             x12, x2,  w3, uxtw
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x5, w9, uxtw #1]
-+        ldr             x5,  [x5, w9, uxtw #3]
-         ld1r            {v5.16b},  [x12] // right
--        sub             x5,  x5,  w9, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x5
-@@ -1367,12 +1383,14 @@ function ipred_smooth_h_8bpc_neon, export=1
- 9:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_smooth_h_tbl):
--        .hword L(ipred_smooth_h_tbl) - 640b
--        .hword L(ipred_smooth_h_tbl) - 320b
--        .hword L(ipred_smooth_h_tbl) - 160b
--        .hword L(ipred_smooth_h_tbl) -  80b
--        .hword L(ipred_smooth_h_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword 80b
-+        .xword 40b
-+	.popsection
- endfunc
- 
- const padding_mask_buf
-@@ -1653,11 +1671,11 @@ endfunc
- //                               const int dx, const int max_base_x);
- function ipred_z1_fill1_8bpc_neon, export=1
-         clz             w9,  w3
--        adr             x8,  L(ipred_z1_fill1_tbl)
-+        adrp            x8,  L(ipred_z1_fill1_tbl)
-+        add             x8,  x8, :lo12: L(ipred_z1_fill1_tbl)
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x8, w9, uxtw #1]
-+        ldr             x8,  [x8, w9, uxtw #3]
-         add             x10, x2,  w6,  uxtw       // top[max_base_x]
--        sub             x8,  x8,  w9,  uxtw
-         ld1r            {v31.16b}, [x10]          // padding
-         mov             w7,  w5
-         mov             w15, #64
-@@ -1816,12 +1834,14 @@ function ipred_z1_fill1_8bpc_neon, export=1
-         mov             w3,  w12
-         b               169b
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_z1_fill1_tbl):
--        .hword L(ipred_z1_fill1_tbl) - 640b
--        .hword L(ipred_z1_fill1_tbl) - 320b
--        .hword L(ipred_z1_fill1_tbl) - 160b
--        .hword L(ipred_z1_fill1_tbl) -  80b
--        .hword L(ipred_z1_fill1_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- function ipred_z1_fill2_8bpc_neon, export=1
-@@ -1940,11 +1960,11 @@ endconst
- //                               const int dx, const int dy);
- function ipred_z2_fill1_8bpc_neon, export=1
-         clz             w10, w4
--        adr             x9,  L(ipred_z2_fill1_tbl)
-+        adrp            x9,  L(ipred_z2_fill1_tbl)
-+        add             x9,  x9, :lo12: L(ipred_z2_fill1_tbl)
-         sub             w10, w10, #25
--        ldrh            w10, [x9, w10, uxtw #1]
-+        ldr             x9, [x9, w10, uxtw #3]
-         mov             w8,  #(1 << 6)            // xpos = 1 << 6
--        sub             x9,  x9,  w10, uxtw
-         sub             w8,  w8,  w6              // xpos -= dx
- 
-         movrel          x11, increments
-@@ -2651,12 +2671,14 @@ function ipred_z2_fill1_8bpc_neon, export=1
-         ldp             d8,  d9,  [sp], 0x40
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_z2_fill1_tbl):
--        .hword L(ipred_z2_fill1_tbl) - 640b
--        .hword L(ipred_z2_fill1_tbl) - 320b
--        .hword L(ipred_z2_fill1_tbl) - 160b
--        .hword L(ipred_z2_fill1_tbl) -  80b
--        .hword L(ipred_z2_fill1_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- function ipred_z2_fill2_8bpc_neon, export=1
-@@ -3160,11 +3182,11 @@ endfunc
- function ipred_z3_fill1_8bpc_neon, export=1
-         cmp             w6,  #64
-         clz             w9,  w3
--        adr             x8,  L(ipred_z3_fill1_tbl)
-+        adrp            x8,  L(ipred_z3_fill1_tbl)
-+        add             x8,  x8, :lo12: L(ipred_z3_fill1_tbl)
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x8, w9, uxtw #1]
-+        ldr             x8,  [x8, w9, uxtw #3]
-         add             x10, x2,  w6,  uxtw       // left[max_base_y]
--        sub             x8,  x8,  w9,  uxtw
-         movrel          x11, increments
-         ld1r            {v31.16b}, [x10]          // padding
-         ld1             {v30.8h},  [x11]          // increments
-@@ -3503,17 +3525,20 @@ L(ipred_z3_fill1_large_h16):
- 9:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_z3_fill1_tbl):
--        .hword L(ipred_z3_fill1_tbl) - 640b
--        .hword L(ipred_z3_fill1_tbl) - 320b
--        .hword L(ipred_z3_fill1_tbl) - 160b
--        .hword L(ipred_z3_fill1_tbl) -  80b
--        .hword L(ipred_z3_fill1_tbl) -  40b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword 80b
-+        .xword 40b
-+	.popsection
- endfunc
- 
- function ipred_z3_fill_padding_neon, export=0
-         cmp             w3,  #16
--        adr             x8,  L(ipred_z3_fill_padding_tbl)
-+        adrp            x8,  L(ipred_z3_fill_padding_tbl)
-+        add             x8,  x8, :lo12: L(ipred_z3_fill_padding_tbl)
-         b.gt            L(ipred_z3_fill_padding_wide)
-         // w3 = remaining width, w4 = constant height
-         mov             w12, w4
-@@ -3524,10 +3549,11 @@ function ipred_z3_fill_padding_neon, export=0
-         // power of two in the remaining width, and repeating.
-         clz             w9,  w3
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x8, w9, uxtw #1]
--        sub             x9,  x8,  w9,  uxtw
-+        ldr             x9,  [x8, w9, uxtw #3]
-         br              x9
- 
-+20:
-+        AARCH64_VALID_JUMP_TARGET
- 2:
-         st1             {v31.h}[0], [x0],  x1
-         subs            w4,  w4,  #4
-@@ -3546,6 +3572,8 @@ function ipred_z3_fill_padding_neon, export=0
-         mov             w4,  w12
-         b               1b
- 
-+40:
-+        AARCH64_VALID_JUMP_TARGET
- 4:
-         st1             {v31.s}[0], [x0],  x1
-         subs            w4,  w4,  #4
-@@ -3564,7 +3592,8 @@ function ipred_z3_fill_padding_neon, export=0
-         mov             w4,  w12
-         b               1b
- 
--8:
-+80:
-+        AARCH64_VALID_JUMP_TARGET
-         st1             {v31.8b}, [x0],  x1
-         subs            w4,  w4,  #4
-         st1             {v31.8b}, [x13], x1
-@@ -3582,9 +3611,10 @@ function ipred_z3_fill_padding_neon, export=0
-         mov             w4,  w12
-         b               1b
- 
--16:
--32:
--64:
-+160:
-+320:
-+640:
-+        AARCH64_VALID_JUMP_TARGET
-         st1             {v31.16b}, [x0],  x1
-         subs            w4,  w4,  #4
-         st1             {v31.16b}, [x13], x1
-@@ -3605,13 +3635,15 @@ function ipred_z3_fill_padding_neon, export=0
- 9:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_z3_fill_padding_tbl):
--        .hword L(ipred_z3_fill_padding_tbl) - 64b
--        .hword L(ipred_z3_fill_padding_tbl) - 32b
--        .hword L(ipred_z3_fill_padding_tbl) - 16b
--        .hword L(ipred_z3_fill_padding_tbl) -  8b
--        .hword L(ipred_z3_fill_padding_tbl) -  4b
--        .hword L(ipred_z3_fill_padding_tbl) -  2b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+        .xword  20b
-+	.popsection
- 
- L(ipred_z3_fill_padding_wide):
-         // Fill a WxH rectangle with padding, with W > 16.
-@@ -3766,13 +3798,13 @@ function ipred_filter_8bpc_neon, export=1
-         add             x6,  x6,  w5, uxtw
-         ld1             {v16.8b, v17.8b, v18.8b, v19.8b}, [x6], #32
-         clz             w9,  w3
--        adr             x5,  L(ipred_filter_tbl)
-+        adrp            x5,  L(ipred_filter_tbl)
-+        add             x5,  x5, :lo12: L(ipred_filter_tbl)
-         ld1             {v20.8b, v21.8b, v22.8b}, [x6]
-         sub             w9,  w9,  #26
--        ldrh            w9,  [x5, w9, uxtw #1]
-+        ldr             x5,  [x5, w9, uxtw #3]
-         sxtl            v16.8h,  v16.8b
-         sxtl            v17.8h,  v17.8b
--        sub             x5,  x5,  w9, uxtw
-         sxtl            v18.8h,  v18.8b
-         sxtl            v19.8h,  v19.8b
-         add             x6,  x0,  x1
-@@ -3913,11 +3945,13 @@ function ipred_filter_8bpc_neon, export=1
- 9:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_filter_tbl):
--        .hword L(ipred_filter_tbl) - 320b
--        .hword L(ipred_filter_tbl) - 160b
--        .hword L(ipred_filter_tbl) -  80b
--        .hword L(ipred_filter_tbl) -  40b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- // void pal_pred_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -3926,11 +3960,11 @@ endfunc
- function pal_pred_8bpc_neon, export=1
-         ld1             {v0.8h}, [x2]
-         clz             w9,  w4
--        adr             x6,  L(pal_pred_tbl)
-+        adrp            x6,  L(pal_pred_tbl)
-+        add             x6,  x6, :lo12: L(pal_pred_tbl)
-         sub             w9,  w9,  #25
--        ldrh            w9,  [x6, w9, uxtw #1]
-+        ldr             x6,  [x6, w9, uxtw #3]
-         xtn             v0.8b,  v0.8h
--        sub             x6,  x6,  w9, uxtw
-         add             x2,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x6
-@@ -4008,12 +4042,14 @@ function pal_pred_8bpc_neon, export=1
-         b.gt            64b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(pal_pred_tbl):
--        .hword L(pal_pred_tbl) - 64b
--        .hword L(pal_pred_tbl) - 32b
--        .hword L(pal_pred_tbl) - 16b
--        .hword L(pal_pred_tbl) -  8b
--        .hword L(pal_pred_tbl) -  4b
-+        .xword 64b
-+        .xword 32b
-+        .xword 16b
-+        .xword 8b
-+        .xword 4b
-+	.popsection
- endfunc
- 
- // void ipred_cfl_128_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4022,12 +4058,12 @@ endfunc
- //                              const int16_t *ac, const int alpha);
- function ipred_cfl_128_8bpc_neon, export=1
-         clz             w9,  w3
--        adr             x7,  L(ipred_cfl_128_tbl)
-+        adrp            x7,  L(ipred_cfl_128_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_128_tbl)
-         sub             w9,  w9,  #26
--        ldrh            w9,  [x7, w9, uxtw #1]
-+        ldr             x7,  [x7, w9, uxtw #3]
-         movi            v0.8h,   #128 // dc
-         dup             v1.8h,   w6   // alpha
--        sub             x7,  x7,  w9, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x7
-@@ -4132,12 +4168,14 @@ L(ipred_cfl_splat_w16):
-         b.gt            1b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_cfl_128_tbl):
- L(ipred_cfl_splat_tbl):
--        .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w16)
--        .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w16)
--        .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w8)
--        .hword L(ipred_cfl_128_tbl) - L(ipred_cfl_splat_w4)
-+        .xword L(ipred_cfl_splat_w16)
-+        .xword L(ipred_cfl_splat_w16)
-+        .xword L(ipred_cfl_splat_w8)
-+        .xword L(ipred_cfl_splat_w4)
-+	.popsection
- endfunc
- 
- // void ipred_cfl_top_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4146,12 +4184,12 @@ endfunc
- //                              const int16_t *ac, const int alpha);
- function ipred_cfl_top_8bpc_neon, export=1
-         clz             w9,  w3
--        adr             x7,  L(ipred_cfl_top_tbl)
-+        adrp            x7,  L(ipred_cfl_top_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_top_tbl)
-         sub             w9,  w9,  #26
--        ldrh            w9,  [x7, w9, uxtw #1]
-+        ldr             x7,  [x7, w9, uxtw #3]
-         dup             v1.8h,   w6   // alpha
-         add             x2,  x2,  #1
--        sub             x7,  x7,  w9, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x7
-@@ -4186,11 +4224,13 @@ function ipred_cfl_top_8bpc_neon, export=1
-         dup             v0.8h,   v2.h[0]
-         b               L(ipred_cfl_splat_w16)
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_cfl_top_tbl):
--        .hword L(ipred_cfl_top_tbl) - 32b
--        .hword L(ipred_cfl_top_tbl) - 16b
--        .hword L(ipred_cfl_top_tbl) -  8b
--        .hword L(ipred_cfl_top_tbl) -  4b
-+        .xword 32b
-+        .xword 16b
-+        .xword 8b
-+        .xword 4b
-+	.popsection
- endfunc
- 
- // void ipred_cfl_left_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4201,15 +4241,15 @@ function ipred_cfl_left_8bpc_neon, export=1
-         sub             x2,  x2,  w4, uxtw
-         clz             w9,  w3
-         clz             w8,  w4
--        adr             x10, L(ipred_cfl_splat_tbl)
--        adr             x7,  L(ipred_cfl_left_tbl)
-+        adrp            x10, L(ipred_cfl_splat_tbl)
-+        add             x10, x10, :lo12: L(ipred_cfl_splat_tbl)
-+        adrp            x7,  L(ipred_cfl_left_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_left_tbl)
-         sub             w9,  w9,  #26
-         sub             w8,  w8,  #26
--        ldrh            w9,  [x10, w9, uxtw #1]
--        ldrh            w8,  [x7,  w8, uxtw #1]
-+        ldr             x9,  [x10, w9, uxtw #3]
-+        ldr             x7,  [x7,  w8, uxtw #3]
-         dup             v1.8h,   w6   // alpha
--        sub             x9,  x10, w9, uxtw
--        sub             x7,  x7,  w8, uxtw
-         add             x6,  x0,  x1
-         lsl             x1,  x1,  #1
-         br              x7
-@@ -4248,11 +4288,13 @@ L(ipred_cfl_left_h32):
-         dup             v0.8h,   v2.h[0]
-         br              x9
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_cfl_left_tbl):
--        .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h32)
--        .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h16)
--        .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h8)
--        .hword L(ipred_cfl_left_tbl) - L(ipred_cfl_left_h4)
-+        .xword L(ipred_cfl_left_h32)
-+        .xword L(ipred_cfl_left_h16)
-+        .xword L(ipred_cfl_left_h8)
-+        .xword L(ipred_cfl_left_h4)
-+	.popsection
- endfunc
- 
- // void ipred_cfl_8bpc_neon(pixel *dst, const ptrdiff_t stride,
-@@ -4266,16 +4308,15 @@ function ipred_cfl_8bpc_neon, export=1
-         clz             w9,  w3
-         clz             w6,  w4
-         dup             v16.8h, w8               // width + height
--        adr             x7,  L(ipred_cfl_tbl)
-+        adrp            x7,  L(ipred_cfl_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_tbl)
-         rbit            w8,  w8                  // rbit(width + height)
-         sub             w9,  w9,  #22            // 26 leading bits, minus table offset 4
-         sub             w6,  w6,  #26
-         clz             w8,  w8                  // ctz(width + height)
--        ldrh            w9,  [x7, w9, uxtw #1]
--        ldrh            w6,  [x7, w6, uxtw #1]
-+        ldr             x9,  [x7, w9, uxtw #3]
-+        ldr             x7,  [x7, w6, uxtw #3]
-         neg             w8,  w8                  // -ctz(width + height)
--        sub             x9,  x7,  w9, uxtw
--        sub             x7,  x7,  w6, uxtw
-         ushr            v16.8h,  v16.8h,  #1     // (width + height) >> 1
-         dup             v17.8h,  w8              // -ctz(width + height)
-         add             x6,  x0,  x1
-@@ -4392,15 +4433,17 @@ L(ipred_cfl_w32):
-         dup             v0.8h,   v0.h[0]
-         b               L(ipred_cfl_splat_w16)
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_cfl_tbl):
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_h32)
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_h16)
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_h8)
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_h4)
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_w32)
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_w16)
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_w8)
--        .hword L(ipred_cfl_tbl) - L(ipred_cfl_w4)
-+        .xword L(ipred_cfl_h32)
-+        .xword L(ipred_cfl_h16)
-+        .xword L(ipred_cfl_h8)
-+        .xword L(ipred_cfl_h4)
-+        .xword L(ipred_cfl_w32)
-+        .xword L(ipred_cfl_w16)
-+        .xword L(ipred_cfl_w8)
-+        .xword L(ipred_cfl_w4)
-+	.popsection
- endfunc
- 
- // void cfl_ac_420_8bpc_neon(int16_t *const ac, const pixel *const ypx,
-@@ -4409,14 +4452,14 @@ endfunc
- function ipred_cfl_ac_420_8bpc_neon, export=1
-         clz             w8,  w5
-         lsl             w4,  w4,  #2
--        adr             x7,  L(ipred_cfl_ac_420_tbl)
-+        adrp            x7,  L(ipred_cfl_ac_420_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_ac_420_tbl)
-         sub             w8,  w8,  #27
--        ldrh            w8,  [x7, w8, uxtw #1]
-+        ldr             x7,  [x7, w8, uxtw #3]
-         movi            v16.8h,  #0
-         movi            v17.8h,  #0
-         movi            v18.8h,  #0
-         movi            v19.8h,  #0
--        sub             x7,  x7,  w8, uxtw
-         sub             w8,  w6,  w4         // height - h_pad
-         rbit            w9,  w5              // rbit(width)
-         rbit            w10, w6              // rbit(height)
-@@ -4555,9 +4598,9 @@ L(ipred_cfl_ac_420_w8_subtract_dc):
- 
- L(ipred_cfl_ac_420_w16):
-         AARCH64_VALID_JUMP_TARGET
--        adr             x7,  L(ipred_cfl_ac_420_w16_tbl)
--        ldrh            w3,  [x7, w3, uxtw #1]
--        sub             x7,  x7,  w3, uxtw
-+        adrp            x7,  L(ipred_cfl_ac_420_w16_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_ac_420_w16_tbl)
-+        ldr             x7,  [x7, w3, uxtw #3]
-         br              x7
- 
- L(ipred_cfl_ac_420_w16_wpad0):
-@@ -4714,17 +4757,19 @@ L(ipred_cfl_ac_420_w16_hpad):
-         lsl             w6,  w6,  #1
-         b               L(ipred_cfl_ac_420_w8_calc_subtract_dc)
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_cfl_ac_420_tbl):
--        .hword L(ipred_cfl_ac_420_tbl) - L(ipred_cfl_ac_420_w16)
--        .hword L(ipred_cfl_ac_420_tbl) - L(ipred_cfl_ac_420_w8)
--        .hword L(ipred_cfl_ac_420_tbl) - L(ipred_cfl_ac_420_w4)
--        .hword 0
-+        .xword L(ipred_cfl_ac_420_w16)
-+        .xword L(ipred_cfl_ac_420_w8)
-+        .xword L(ipred_cfl_ac_420_w4)
-+        .xword 0
- 
- L(ipred_cfl_ac_420_w16_tbl):
--        .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad0)
--        .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad1)
--        .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad2)
--        .hword L(ipred_cfl_ac_420_w16_tbl) - L(ipred_cfl_ac_420_w16_wpad3)
-+        .xword L(ipred_cfl_ac_420_w16_wpad0)
-+        .xword L(ipred_cfl_ac_420_w16_wpad1)
-+        .xword L(ipred_cfl_ac_420_w16_wpad2)
-+        .xword L(ipred_cfl_ac_420_w16_wpad3)
-+	.popsection
- endfunc
- 
- // void cfl_ac_422_8bpc_neon(int16_t *const ac, const pixel *const ypx,
-@@ -4733,14 +4778,14 @@ endfunc
- function ipred_cfl_ac_422_8bpc_neon, export=1
-         clz             w8,  w5
-         lsl             w4,  w4,  #2
--        adr             x7,  L(ipred_cfl_ac_422_tbl)
-+        adrp            x7,  L(ipred_cfl_ac_422_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_ac_422_tbl)
-         sub             w8,  w8,  #27
--        ldrh            w8,  [x7, w8, uxtw #1]
-+        ldr             x7,  [x7, w8, uxtw #3]
-         movi            v16.8h,  #0
-         movi            v17.8h,  #0
-         movi            v18.8h,  #0
-         movi            v19.8h,  #0
--        sub             x7,  x7,  w8, uxtw
-         sub             w8,  w6,  w4         // height - h_pad
-         rbit            w9,  w5              // rbit(width)
-         rbit            w10, w6              // rbit(height)
-@@ -4831,9 +4876,9 @@ L(ipred_cfl_ac_422_w8_wpad):
- 
- L(ipred_cfl_ac_422_w16):
-         AARCH64_VALID_JUMP_TARGET
--        adr             x7,  L(ipred_cfl_ac_422_w16_tbl)
--        ldrh            w3,  [x7, w3, uxtw #1]
--        sub             x7,  x7,  w3, uxtw
-+        adrp            x7,  L(ipred_cfl_ac_422_w16_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_ac_422_w16_tbl)
-+        ldr             x7,  [x7, w3, uxtw #3]
-         br              x7
- 
- L(ipred_cfl_ac_422_w16_wpad0):
-@@ -4936,17 +4981,19 @@ L(ipred_cfl_ac_422_w16_wpad3):
-         mov             v1.16b,  v3.16b
-         b               L(ipred_cfl_ac_420_w16_hpad)
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_cfl_ac_422_tbl):
--        .hword L(ipred_cfl_ac_422_tbl) - L(ipred_cfl_ac_422_w16)
--        .hword L(ipred_cfl_ac_422_tbl) - L(ipred_cfl_ac_422_w8)
--        .hword L(ipred_cfl_ac_422_tbl) - L(ipred_cfl_ac_422_w4)
--        .hword 0
-+        .xword L(ipred_cfl_ac_422_w16)
-+        .xword L(ipred_cfl_ac_422_w8)
-+        .xword L(ipred_cfl_ac_422_w4)
-+        .xword 0
- 
- L(ipred_cfl_ac_422_w16_tbl):
--        .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad0)
--        .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad1)
--        .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad2)
--        .hword L(ipred_cfl_ac_422_w16_tbl) - L(ipred_cfl_ac_422_w16_wpad3)
-+        .xword L(ipred_cfl_ac_422_w16_wpad0)
-+        .xword L(ipred_cfl_ac_422_w16_wpad1)
-+        .xword L(ipred_cfl_ac_422_w16_wpad2)
-+        .xword L(ipred_cfl_ac_422_w16_wpad3)
-+	.popsection
- endfunc
- 
- // void cfl_ac_444_8bpc_neon(int16_t *const ac, const pixel *const ypx,
-@@ -4955,14 +5002,14 @@ endfunc
- function ipred_cfl_ac_444_8bpc_neon, export=1
-         clz             w8,  w5
-         lsl             w4,  w4,  #2
--        adr             x7,  L(ipred_cfl_ac_444_tbl)
-+        adrp            x7,  L(ipred_cfl_ac_444_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_ac_444_tbl)
-         sub             w8,  w8,  #26
--        ldrh            w8,  [x7, w8, uxtw #1]
-+        ldr             x7,  [x7, w8, uxtw #3]
-         movi            v16.8h,  #0
-         movi            v17.8h,  #0
-         movi            v18.8h,  #0
-         movi            v19.8h,  #0
--        sub             x7,  x7,  w8, uxtw
-         sub             w8,  w6,  w4         // height - h_pad
-         rbit            w9,  w5              // rbit(width)
-         rbit            w10, w6              // rbit(height)
-@@ -5083,9 +5130,10 @@ L(ipred_cfl_ac_444_w16_wpad):
- 
- L(ipred_cfl_ac_444_w32):
-         AARCH64_VALID_JUMP_TARGET
--        adr             x7,  L(ipred_cfl_ac_444_w32_tbl)
--        ldrh            w3,  [x7, w3, uxtw] // (w3>>1) << 1
--        sub             x7,  x7,  w3, uxtw
-+        adrp            x7,  L(ipred_cfl_ac_444_w32_tbl)
-+        add             x7,  x7, :lo12: L(ipred_cfl_ac_444_w32_tbl)
-+        lsr             w3,  w3, #1
-+        ldr             x7,  [x7, w3, uxtw #3] // (w3>>1) << 3
-         br              x7
- 
- L(ipred_cfl_ac_444_w32_wpad0):
-@@ -5231,15 +5279,17 @@ L(ipred_cfl_ac_444_w32_hpad):
-         dup             v4.8h,   v4.h[0]
-         b               L(ipred_cfl_ac_420_w8_subtract_dc)
- 
-+	.pushsection .data.rel.ro, "aw"
- L(ipred_cfl_ac_444_tbl):
--        .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w32)
--        .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w16)
--        .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w8)
--        .hword L(ipred_cfl_ac_444_tbl) - L(ipred_cfl_ac_444_w4)
-+        .xword L(ipred_cfl_ac_444_w32)
-+        .xword L(ipred_cfl_ac_444_w16)
-+        .xword L(ipred_cfl_ac_444_w8)
-+        .xword L(ipred_cfl_ac_444_w4)
- 
- L(ipred_cfl_ac_444_w32_tbl):
--        .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad0)
--        .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad2)
--        .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad4)
--        .hword L(ipred_cfl_ac_444_w32_tbl) - L(ipred_cfl_ac_444_w32_wpad6)
-+        .xword L(ipred_cfl_ac_444_w32_wpad0)
-+        .xword L(ipred_cfl_ac_444_w32_wpad2)
-+        .xword L(ipred_cfl_ac_444_w32_wpad4)
-+        .xword L(ipred_cfl_ac_444_w32_wpad6)
-+	.popsection
- endfunc
Index: patches/patch-src_arm_64_mc16_S
===================================================================
RCS file: patches/patch-src_arm_64_mc16_S
diff -N patches/patch-src_arm_64_mc16_S
--- patches/patch-src_arm_64_mc16_S	24 Apr 2023 21:06:59 -0000	1.1
+++ /dev/null	1 Jan 1970 00:00:00 -0000
@@ -1,523 +0,0 @@
-Index: src/arm/64/mc16.S
---- src/arm/64/mc16.S.orig
-+++ src/arm/64/mc16.S
-@@ -145,11 +145,11 @@ function \type\()_16bpc_neon, export=1
-         dup             v27.4s,  w6
-         neg             v27.4s,  v27.4s
- .endif
--        adr             x7,  L(\type\()_tbl)
-+        adrp            x7,  L(\type\()_tbl)
-+        add             x7,  x7, :lo12: L(\type\()_tbl)
-         sub             w4,  w4,  #24
-         \type           v4,  v5,  v0,  v1,  v2,  v3
--        ldrh            w4,  [x7, x4, lsl #1]
--        sub             x7,  x7,  w4, uxtw
-+        ldr             x7,  [x7, x4, lsl #3]
-         br              x7
- 40:
-         AARCH64_VALID_JUMP_TARGET
-@@ -228,13 +228,15 @@ function \type\()_16bpc_neon, export=1
-         b               128b
- 0:
-         ret
-+	.pushsection .data.rel.ro, "aw"
- L(\type\()_tbl):
--        .hword L(\type\()_tbl) - 1280b
--        .hword L(\type\()_tbl) -  640b
--        .hword L(\type\()_tbl) -   32b
--        .hword L(\type\()_tbl) -   16b
--        .hword L(\type\()_tbl) -   80b
--        .hword L(\type\()_tbl) -   40b
-+        .xword 1280b
-+        .xword  640b
-+        .xword   32b
-+        .xword   16b
-+        .xword   80b
-+        .xword   40b
-+	.popsection
- endfunc
- .endm
- 
-@@ -247,12 +249,12 @@ bidir_fn mask, w7
- function w_mask_\type\()_16bpc_neon, export=1
-         ldr             w8,  [sp]
-         clz             w9,  w4
--        adr             x10, L(w_mask_\type\()_tbl)
-+        adrp            x10, L(w_mask_\type\()_tbl)
-+        add             x10, x10, :lo12: L(w_mask_\type\()_tbl)
-         dup             v31.8h,  w8   // bitdepth_max
-         sub             w9,  w9,  #24
-         clz             w8,  w8       // clz(bitdepth_max)
--        ldrh            w9,  [x10,  x9,  lsl #1]
--        sub             x10, x10, w9,  uxtw
-+        ldr             x10, [x10,  x9,  lsl #3]
-         sub             w8,  w8,  #12 // sh = intermediate_bits + 6 = clz(bitdepth_max) - 12
-         mov             w9,  #PREP_BIAS*64
-         neg             w8,  w8       // -sh
-@@ -541,13 +543,15 @@ function w_mask_\type\()_16bpc_neon, export=1
-         add             x12, x12, x1
-         b.gt            161b
-         ret
-+	.pushsection .data.rel.ro, "aw"
- L(w_mask_\type\()_tbl):
--        .hword L(w_mask_\type\()_tbl) - 1280b
--        .hword L(w_mask_\type\()_tbl) -  640b
--        .hword L(w_mask_\type\()_tbl) -  320b
--        .hword L(w_mask_\type\()_tbl) -  160b
--        .hword L(w_mask_\type\()_tbl) -    8b
--        .hword L(w_mask_\type\()_tbl) -    4b
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword  160b
-+        .xword    8b
-+        .xword    4b
-+	.popsection
- endfunc
- .endm
- 
-@@ -557,11 +561,11 @@ w_mask_fn 420
- 
- 
- function blend_16bpc_neon, export=1
--        adr             x6,  L(blend_tbl)
-+        adrp            x6,  L(blend_tbl)
-+        add             x6,  x6, :lo12: L(blend_tbl)
-         clz             w3,  w3
-         sub             w3,  w3,  #26
--        ldrh            w3,  [x6,  x3,  lsl #1]
--        sub             x6,  x6,  w3,  uxtw
-+        ldr             x6,  [x6,  x3,  lsl #3]
-         add             x8,  x0,  x1
-         br              x6
- 40:
-@@ -673,15 +677,18 @@ function blend_16bpc_neon, export=1
-         st1             {v0.8h, v1.8h, v2.8h, v3.8h}, [x0], x1
-         b.gt            32b
-         ret
-+	.pushsection .data.rel.ro, "aw"
- L(blend_tbl):
--        .hword L(blend_tbl) -  32b
--        .hword L(blend_tbl) - 160b
--        .hword L(blend_tbl) -  80b
--        .hword L(blend_tbl) -  40b
-+        .xword  32b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- function blend_h_16bpc_neon, export=1
--        adr             x6,  L(blend_h_tbl)
-+        adrp            x6,  L(blend_h_tbl)
-+        add             x6,  x6, :lo12: L(blend_h_tbl)
-         movrel          x5,  X(obmc_masks)
-         add             x5,  x5,  w4,  uxtw
-         sub             w4,  w4,  w4,  lsr #2
-@@ -689,8 +696,7 @@ function blend_h_16bpc_neon, export=1
-         add             x8,  x0,  x1
-         lsl             x1,  x1,  #1
-         sub             w7,  w7,  #24
--        ldrh            w7,  [x6,  x7,  lsl #1]
--        sub             x6,  x6,  w7, uxtw
-+        ldr             x6,  [x6,  x7,  lsl #3]
-         br              x6
- 2:
-         AARCH64_VALID_JUMP_TARGET
-@@ -835,26 +841,28 @@ function blend_h_16bpc_neon, export=1
-         add             x7,  x7,  w3,  uxtw #1
-         b.gt            321b
-         ret
-+	.pushsection .data.rel.ro, "aw"
- L(blend_h_tbl):
--        .hword L(blend_h_tbl) - 1280b
--        .hword L(blend_h_tbl) -  640b
--        .hword L(blend_h_tbl) -  320b
--        .hword L(blend_h_tbl) -   16b
--        .hword L(blend_h_tbl) -    8b
--        .hword L(blend_h_tbl) -    4b
--        .hword L(blend_h_tbl) -    2b
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword   16b
-+        .xword    8b
-+        .xword    4b
-+        .xword    2b
-+	.popsection
- endfunc
- 
- function blend_v_16bpc_neon, export=1
--        adr             x6,  L(blend_v_tbl)
-+        adrp            x6,  L(blend_v_tbl)
-+        add             x6,  x6, :lo12: L(blend_v_tbl)
-         movrel          x5,  X(obmc_masks)
-         add             x5,  x5,  w3,  uxtw
-         clz             w3,  w3
-         add             x8,  x0,  x1
-         lsl             x1,  x1,  #1
-         sub             w3,  w3,  #26
--        ldrh            w3,  [x6,  x3,  lsl #1]
--        sub             x6,  x6,  w3,  uxtw
-+        ldr             x6,  [x6,  x3,  lsl #3]
-         br              x6
- 20:
-         AARCH64_VALID_JUMP_TARGET
-@@ -992,21 +1000,23 @@ function blend_v_16bpc_neon, export=1
-         st1             {v4.8h, v5.8h, v6.8h}, [x8], x1
-         b.gt            32b
-         ret
-+	.pushsection .data.rel.ro, "aw"
- L(blend_v_tbl):
--        .hword L(blend_v_tbl) - 320b
--        .hword L(blend_v_tbl) - 160b
--        .hword L(blend_v_tbl) -  80b
--        .hword L(blend_v_tbl) -  40b
--        .hword L(blend_v_tbl) -  20b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+        .xword  20b
-+	.popsection
- endfunc
- 
- 
- // This has got the same signature as the put_8tap functions,
- // and assumes that x9 is set to (clz(w)-24).
- function put_neon
--        adr             x10, L(put_tbl)
--        ldrh            w9, [x10, x9, lsl #1]
--        sub             x10, x10, w9, uxtw
-+        adrp            x10, L(put_tbl)
-+        add             x10, x10, :lo12: L(put_tbl)
-+        ldr             x10, [x10, x9, lsl #3]
-         br              x10
- 
- 2:
-@@ -1106,14 +1116,16 @@ function put_neon
-         b.gt            128b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(put_tbl):
--        .hword L(put_tbl) - 128b
--        .hword L(put_tbl) -  64b
--        .hword L(put_tbl) -  32b
--        .hword L(put_tbl) -  16b
--        .hword L(put_tbl) -  80b
--        .hword L(put_tbl) -   4b
--        .hword L(put_tbl) -   2b
-+        .xword 128b
-+        .xword  64b
-+        .xword  32b
-+        .xword  16b
-+        .xword  80b
-+        .xword   4b
-+        .xword   2b
-+	.popsection
- endfunc
- 
- 
-@@ -1121,11 +1133,11 @@ endfunc
- // and assumes that x9 is set to (clz(w)-24), w7 to intermediate_bits and
- // x8 to w*2.
- function prep_neon
--        adr             x10, L(prep_tbl)
--        ldrh            w9, [x10, x9, lsl #1]
-+        adrp            x10, L(prep_tbl)
-+        add             x10, x10, :lo12: L(prep_tbl)
-+        ldr             x10, [x10, x9, lsl #3]
-         dup             v31.8h,  w7   // intermediate_bits
-         movi            v30.8h,  #(PREP_BIAS >> 8), lsl #8
--        sub             x10, x10, w9, uxtw
-         br              x10
- 
- 40:
-@@ -1278,13 +1290,15 @@ function prep_neon
-         b.gt            128b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(prep_tbl):
--        .hword L(prep_tbl) - 128b
--        .hword L(prep_tbl) -  64b
--        .hword L(prep_tbl) -  32b
--        .hword L(prep_tbl) -  16b
--        .hword L(prep_tbl) -  80b
--        .hword L(prep_tbl) -  40b
-+        .xword 128b
-+        .xword  64b
-+        .xword  32b
-+        .xword  16b
-+        .xword  80b
-+        .xword  40b
-+	.popsection
- endfunc
- 
- 
-@@ -1563,16 +1577,16 @@ L(\type\()_8tap_h):
-         add             \xmx, x11, \mx, uxtw #3
-         b.ne            L(\type\()_8tap_hv)
- 
--        adr             x10, L(\type\()_8tap_h_tbl)
-+        adrp            x10, L(\type\()_8tap_h_tbl)
-+        add             x10, x10, :lo12: L(\type\()_8tap_h_tbl)
-         dup             v30.4s,  w12           // 6 - intermediate_bits
--        ldrh            w9,  [x10, x9, lsl #1]
-+        ldr             x10, [x10, x9, lsl #3]
-         neg             v30.4s,  v30.4s        // -(6-intermediate_bits)
- .ifc \type, put
-         dup             v29.8h,  \bdmax        // intermediate_bits
- .else
-         movi            v28.8h,  #(PREP_BIAS >> 8), lsl #8
- .endif
--        sub             x10, x10, w9, uxtw
- .ifc \type, put
-         neg             v29.8h,  v29.8h        // -intermediate_bits
- .endif
-@@ -1734,15 +1748,17 @@ L(\type\()_8tap_h):
-         b.gt            81b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(\type\()_8tap_h_tbl):
--        .hword L(\type\()_8tap_h_tbl) - 1280b
--        .hword L(\type\()_8tap_h_tbl) -  640b
--        .hword L(\type\()_8tap_h_tbl) -  320b
--        .hword L(\type\()_8tap_h_tbl) -  160b
--        .hword L(\type\()_8tap_h_tbl) -   80b
--        .hword L(\type\()_8tap_h_tbl) -   40b
--        .hword L(\type\()_8tap_h_tbl) -   20b
--        .hword 0
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword  160b
-+        .xword   80b
-+        .xword   40b
-+        .xword   20b
-+        .xword 0
-+	.popsection
- 
- 
- L(\type\()_8tap_v):
-@@ -1758,12 +1774,12 @@ L(\type\()_8tap_v):
-         dup             v30.4s,  w12           // 6 - intermediate_bits
-         movi            v29.8h,  #(PREP_BIAS >> 8), lsl #8
- .endif
--        adr             x10, L(\type\()_8tap_v_tbl)
--        ldrh            w9,  [x10, x9, lsl #1]
-+        adrp            x10, L(\type\()_8tap_v_tbl)
-+        add             x10, x10, :lo12: L(\type\()_8tap_v_tbl)
-+        ldr             x10, [x10, x9, lsl #3]
- .ifc \type, prep
-         neg             v30.4s,  v30.4s        // -(6-intermediate_bits)
- .endif
--        sub             x10, x10, w9, uxtw
-         br              x10
- 
- 20:     // 2xN v
-@@ -2029,15 +2045,17 @@ L(\type\()_8tap_v):
- 0:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(\type\()_8tap_v_tbl):
--        .hword L(\type\()_8tap_v_tbl) - 1280b
--        .hword L(\type\()_8tap_v_tbl) -  640b
--        .hword L(\type\()_8tap_v_tbl) -  320b
--        .hword L(\type\()_8tap_v_tbl) -  160b
--        .hword L(\type\()_8tap_v_tbl) -   80b
--        .hword L(\type\()_8tap_v_tbl) -   40b
--        .hword L(\type\()_8tap_v_tbl) -   20b
--        .hword 0
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword  160b
-+        .xword   80b
-+        .xword   40b
-+        .xword   20b
-+        .xword 0
-+	.popsection
- 
- L(\type\()_8tap_hv):
-         cmp             \h,  #4
-@@ -2048,16 +2066,16 @@ L(\type\()_8tap_hv):
- 4:
-         add             \xmy, x11, \my, uxtw #3
- 
--        adr             x10, L(\type\()_8tap_hv_tbl)
-+        adrp            x10, L(\type\()_8tap_hv_tbl)
-+        add             x10, x10, :lo12: L(\type\()_8tap_hv_tbl)
-         dup             v30.4s,  w12           // 6 - intermediate_bits
--        ldrh            w9,  [x10, x9, lsl #1]
-+        ldr             x10, [x10, x9, lsl #3]
-         neg             v30.4s,  v30.4s        // -(6-intermediate_bits)
- .ifc \type, put
-         dup             v29.4s,  w13           // 6 + intermediate_bits
- .else
-         movi            v29.8h,  #(PREP_BIAS >> 8), lsl #8
- .endif
--        sub             x10, x10, w9, uxtw
- .ifc \type, put
-         neg             v29.4s,  v29.4s        // -(6+intermediate_bits)
- .endif
-@@ -2623,15 +2641,17 @@ L(\type\()_8tap_filter_8):
-         uzp1            v24.8h,  v27.8h,  v28.8h // Ditto
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(\type\()_8tap_hv_tbl):
--        .hword L(\type\()_8tap_hv_tbl) - 1280b
--        .hword L(\type\()_8tap_hv_tbl) -  640b
--        .hword L(\type\()_8tap_hv_tbl) -  320b
--        .hword L(\type\()_8tap_hv_tbl) -  160b
--        .hword L(\type\()_8tap_hv_tbl) -   80b
--        .hword L(\type\()_8tap_hv_tbl) -   40b
--        .hword L(\type\()_8tap_hv_tbl) -   20b
--        .hword 0
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword  160b
-+        .xword   80b
-+        .xword   40b
-+        .xword   20b
-+        .xword 0
-+	.popsection
- endfunc
- 
- 
-@@ -2665,16 +2685,16 @@ function \type\()_bilin_16bpc_neon, export=1
- L(\type\()_bilin_h):
-         cbnz            \my, L(\type\()_bilin_hv)
- 
--        adr             x10, L(\type\()_bilin_h_tbl)
-+        adrp            x10, L(\type\()_bilin_h_tbl)
-+        add             x10, x10, :lo12: L(\type\()_bilin_h_tbl)
-         dup             v31.8h,  w11      // 4 - intermediate_bits
--        ldrh            w9,  [x10, x9, lsl #1]
-+        ldr             x10, [x10, x9, lsl #3]
-         neg             v31.8h,  v31.8h   // -(4-intermediate_bits)
- .ifc \type, put
-         dup             v30.8h,  \bdmax   // intermediate_bits
- .else
-         movi            v29.8h,  #(PREP_BIAS >> 8), lsl #8
- .endif
--        sub             x10, x10, w9, uxtw
- .ifc \type, put
-         neg             v30.8h,  v30.8h   // -intermediate_bits
- .endif
-@@ -2832,29 +2852,31 @@ L(\type\()_bilin_h):
-         b.gt            161b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(\type\()_bilin_h_tbl):
--        .hword L(\type\()_bilin_h_tbl) - 1280b
--        .hword L(\type\()_bilin_h_tbl) -  640b
--        .hword L(\type\()_bilin_h_tbl) -  320b
--        .hword L(\type\()_bilin_h_tbl) -  160b
--        .hword L(\type\()_bilin_h_tbl) -   80b
--        .hword L(\type\()_bilin_h_tbl) -   40b
--        .hword L(\type\()_bilin_h_tbl) -   20b
--        .hword 0
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword  160b
-+        .xword   80b
-+        .xword   40b
-+        .xword   20b
-+        .xword 0
-+	.popsection
- 
- 
- L(\type\()_bilin_v):
-         cmp             \h,  #4
--        adr             x10, L(\type\()_bilin_v_tbl)
-+        adrp            x10, L(\type\()_bilin_v_tbl)
-+        add             x10, x10, :lo12: L(\type\()_bilin_v_tbl)
- .ifc \type, prep
-         dup             v31.8h,  w11      // 4 - intermediate_bits
- .endif
--        ldrh            w9,  [x10, x9, lsl #1]
-+        ldr             x10, [x10, x9, lsl #3]
- .ifc \type, prep
-         movi            v29.8h,  #(PREP_BIAS >> 8), lsl #8
-         neg             v31.8h,  v31.8h   // -(4-intermediate_bits)
- .endif
--        sub             x10, x10, w9, uxtw
-         br              x10
- 
- 20:     // 2xN v
-@@ -3030,27 +3052,29 @@ L(\type\()_bilin_v):
- 0:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(\type\()_bilin_v_tbl):
--        .hword L(\type\()_bilin_v_tbl) - 1280b
--        .hword L(\type\()_bilin_v_tbl) -  640b
--        .hword L(\type\()_bilin_v_tbl) -  320b
--        .hword L(\type\()_bilin_v_tbl) -  160b
--        .hword L(\type\()_bilin_v_tbl) -   80b
--        .hword L(\type\()_bilin_v_tbl) -   40b
--        .hword L(\type\()_bilin_v_tbl) -   20b
--        .hword 0
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword  160b
-+        .xword   80b
-+        .xword   40b
-+        .xword   20b
-+        .xword 0
-+	.popsection
- 
- L(\type\()_bilin_hv):
--        adr             x10, L(\type\()_bilin_hv_tbl)
-+        adrp            x10, L(\type\()_bilin_hv_tbl)
-+        add             x10, x10, :lo12: L(\type\()_bilin_hv_tbl)
-         dup             v31.8h,  w11      // 4 - intermediate_bits
--        ldrh            w9,  [x10, x9, lsl #1]
-+        ldr             x10, [x10, x9, lsl #3]
-         neg             v31.8h,  v31.8h   // -(4-intermediate_bits)
- .ifc \type, put
-         dup             v30.4s,  w12      // 4 + intermediate_bits
- .else
-         movi            v29.8h,  #(PREP_BIAS >> 8), lsl #8
- .endif
--        sub             x10, x10, w9, uxtw
- .ifc \type, put
-         neg             v30.4s,  v30.4s   // -(4+intermediate_bits)
- .endif
-@@ -3224,15 +3248,17 @@ L(\type\()_bilin_hv):
- 0:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(\type\()_bilin_hv_tbl):
--        .hword L(\type\()_bilin_hv_tbl) - 1280b
--        .hword L(\type\()_bilin_hv_tbl) -  640b
--        .hword L(\type\()_bilin_hv_tbl) -  320b
--        .hword L(\type\()_bilin_hv_tbl) -  160b
--        .hword L(\type\()_bilin_hv_tbl) -   80b
--        .hword L(\type\()_bilin_hv_tbl) -   40b
--        .hword L(\type\()_bilin_hv_tbl) -   20b
--        .hword 0
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword  160b
-+        .xword   80b
-+        .xword   40b
-+        .xword   20b
-+        .xword 0
-+	.popsection
- endfunc
- .endm
- 
Index: patches/patch-src_arm_64_mc_S
===================================================================
RCS file: patches/patch-src_arm_64_mc_S
diff -N patches/patch-src_arm_64_mc_S
--- patches/patch-src_arm_64_mc_S	24 Apr 2023 21:06:59 -0000	1.1
+++ /dev/null	1 Jan 1970 00:00:00 -0000
@@ -1,483 +0,0 @@
-Index: src/arm/64/mc.S
---- src/arm/64/mc.S.orig
-+++ src/arm/64/mc.S
-@@ -79,11 +79,11 @@ function \type\()_8bpc_neon, export=1
- .ifc \type, mask
-         movi            v31.16b, #256-2
- .endif
--        adr             x7,  L(\type\()_tbl)
-+        adrp            x7,  L(\type\()_tbl)
-+        add             x7,  x7, :lo12: L(\type\()_tbl)
-         sub             w4,  w4,  #24
--        ldrh            w4,  [x7, x4, lsl #1]
-+        ldr             x7,  [x7, x4, lsl #3]
-         \type           v4,  v0,  v1,  v2,  v3
--        sub             x7,  x7,  w4, uxtw
-         br              x7
- 40:
-         AARCH64_VALID_JUMP_TARGET
-@@ -192,13 +192,15 @@ function \type\()_8bpc_neon, export=1
-         b               128b
- 0:
-         ret
-+	.pushsection .data.rel.ro, "aw"
- L(\type\()_tbl):
--        .hword L(\type\()_tbl) - 1280b
--        .hword L(\type\()_tbl) -  640b
--        .hword L(\type\()_tbl) -  320b
--        .hword L(\type\()_tbl) -   16b
--        .hword L(\type\()_tbl) -   80b
--        .hword L(\type\()_tbl) -   40b
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword   16b
-+        .xword   80b
-+        .xword   40b
-+	.popsection
- endfunc
- .endm
- 
-@@ -210,10 +212,10 @@ bidir_fn mask
- .macro w_mask_fn type
- function w_mask_\type\()_8bpc_neon, export=1
-         clz             w8,  w4
--        adr             x9,  L(w_mask_\type\()_tbl)
-+        adrp            x9,  L(w_mask_\type\()_tbl)
-+        add             x9,  x9, :lo12: L(w_mask_\type\()_tbl)
-         sub             w8,  w8,  #24
--        ldrh            w8,  [x9,  x8,  lsl #1]
--        sub             x9,  x9,  w8,  uxtw
-+        ldr             x9,  [x9,  x8,  lsl #3]
-         mov             w10, #6903
-         dup             v0.8h,   w10
- .if \type == 444
-@@ -413,13 +415,15 @@ function w_mask_\type\()_8bpc_neon, export=1
-         add             x12, x12, x1
-         b.gt            161b
-         ret
-+	.pushsection .data.rel.ro, "aw"
- L(w_mask_\type\()_tbl):
--        .hword L(w_mask_\type\()_tbl) - 1280b
--        .hword L(w_mask_\type\()_tbl) -  640b
--        .hword L(w_mask_\type\()_tbl) -  320b
--        .hword L(w_mask_\type\()_tbl) -  160b
--        .hword L(w_mask_\type\()_tbl) -    8b
--        .hword L(w_mask_\type\()_tbl) -    4b
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword  160b
-+        .xword    8b
-+        .xword    4b
-+	.popsection
- endfunc
- .endm
- 
-@@ -429,11 +433,11 @@ w_mask_fn 420
- 
- 
- function blend_8bpc_neon, export=1
--        adr             x6,  L(blend_tbl)
-+        adrp            x6,  L(blend_tbl)
-+        add             x6,  x6, :lo12: L(blend_tbl)
-         clz             w3,  w3
-         sub             w3,  w3,  #26
--        ldrh            w3,  [x6,  x3,  lsl #1]
--        sub             x6,  x6,  w3,  uxtw
-+        ldr             x6,  [x6,  x3,  lsl #3]
-         movi            v4.16b,  #64
-         add             x8,  x0,  x1
-         lsl             x1,  x1,  #1
-@@ -535,15 +539,18 @@ function blend_8bpc_neon, export=1
-         st1             {v27.16b, v28.16b}, [x8],  x1
-         b.gt            32b
-         ret
-+	.pushsection .data.rel.ro, "aw"
- L(blend_tbl):
--        .hword L(blend_tbl) - 32b
--        .hword L(blend_tbl) - 16b
--        .hword L(blend_tbl) -  8b
--        .hword L(blend_tbl) -  4b
-+        .xword 32b
-+        .xword 16b
-+        .xword  8b
-+        .xword  4b
-+	.popsection
- endfunc
- 
- function blend_h_8bpc_neon, export=1
--        adr             x6,  L(blend_h_tbl)
-+        adrp            x6,  L(blend_h_tbl)
-+        add             x6,  x6, :lo12: L(blend_h_tbl)
-         movrel          x5,  X(obmc_masks)
-         add             x5,  x5,  w4,  uxtw
-         sub             w4,  w4,  w4,  lsr #2
-@@ -552,8 +559,7 @@ function blend_h_8bpc_neon, export=1
-         add             x8,  x0,  x1
-         lsl             x1,  x1,  #1
-         sub             w7,  w7,  #24
--        ldrh            w7,  [x6,  x7,  lsl #1]
--        sub             x6,  x6,  w7, uxtw
-+        ldr             x6,  [x6,  x7,  lsl #3]
-         br              x6
- 2:
-         AARCH64_VALID_JUMP_TARGET
-@@ -682,18 +688,21 @@ function blend_h_8bpc_neon, export=1
-         add             x7,  x7,  w3,  uxtw
-         b.gt            321b
-         ret
-+	.pushsection .data.rel.ro, "aw"
- L(blend_h_tbl):
--        .hword L(blend_h_tbl) - 1280b
--        .hword L(blend_h_tbl) -  640b
--        .hword L(blend_h_tbl) -  320b
--        .hword L(blend_h_tbl) -   16b
--        .hword L(blend_h_tbl) -    8b
--        .hword L(blend_h_tbl) -    4b
--        .hword L(blend_h_tbl) -    2b
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword   16b
-+        .xword    8b
-+        .xword    4b
-+        .xword    2b
-+	.popsection
- endfunc
- 
- function blend_v_8bpc_neon, export=1
--        adr             x6,  L(blend_v_tbl)
-+        adrp            x6,  L(blend_v_tbl)
-+        add             x6,  x6, :lo12: L(blend_v_tbl)
-         movrel          x5,  X(obmc_masks)
-         add             x5,  x5,  w3,  uxtw
-         clz             w3,  w3
-@@ -701,8 +710,7 @@ function blend_v_8bpc_neon, export=1
-         add             x8,  x0,  x1
-         lsl             x1,  x1,  #1
-         sub             w3,  w3,  #26
--        ldrh            w3,  [x6,  x3,  lsl #1]
--        sub             x6,  x6,  w3,  uxtw
-+        ldr             x6,  [x6,  x3,  lsl #3]
-         br              x6
- 20:
-         AARCH64_VALID_JUMP_TARGET
-@@ -826,21 +834,23 @@ function blend_v_8bpc_neon, export=1
-         st1             {v27.8b},  [x8],  x1
-         b.gt            32b
-         ret
-+	.pushsection .data.rel.ro, "aw"
- L(blend_v_tbl):
--        .hword L(blend_v_tbl) - 320b
--        .hword L(blend_v_tbl) - 160b
--        .hword L(blend_v_tbl) -  80b
--        .hword L(blend_v_tbl) -  40b
--        .hword L(blend_v_tbl) -  20b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+        .xword  20b
-+	.popsection
- endfunc
- 
- 
- // This has got the same signature as the put_8tap functions,
- // and assumes that x8 is set to (clz(w)-24).
- function put_neon
--        adr             x9,  L(put_tbl)
--        ldrh            w8,  [x9, x8, lsl #1]
--        sub             x9,  x9,  w8, uxtw
-+        adrp            x9,  L(put_tbl)
-+        add             x9,  x9, :lo12: L(put_tbl)
-+        ldr             x9,  [x9, x8, lsl #3]
-         br              x9
- 
- 2:
-@@ -926,23 +936,25 @@ function put_neon
-         b.gt            128b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(put_tbl):
--        .hword L(put_tbl) - 128b
--        .hword L(put_tbl) -  64b
--        .hword L(put_tbl) -  32b
--        .hword L(put_tbl) - 160b
--        .hword L(put_tbl) -   8b
--        .hword L(put_tbl) -   4b
--        .hword L(put_tbl) -   2b
-+        .xword 128b
-+        .xword  64b
-+        .xword  32b
-+        .xword 160b
-+        .xword   8b
-+        .xword   4b
-+        .xword   2b
-+	.popsection
- endfunc
- 
- 
- // This has got the same signature as the prep_8tap functions,
- // and assumes that x8 is set to (clz(w)-24), and x7 to w*2.
- function prep_neon
--        adr             x9,  L(prep_tbl)
--        ldrh            w8,  [x9, x8, lsl #1]
--        sub             x9,  x9,  w8, uxtw
-+        adrp            x9,  L(prep_tbl)
-+        add             x9,  x9, :lo12: L(prep_tbl)
-+        ldr             x9,  [x9, x8, lsl #3]
-         br              x9
- 
- 4:
-@@ -1058,13 +1070,15 @@ function prep_neon
-         b.gt            128b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(prep_tbl):
--        .hword L(prep_tbl) - 1280b
--        .hword L(prep_tbl) -  640b
--        .hword L(prep_tbl) -  320b
--        .hword L(prep_tbl) -  160b
--        .hword L(prep_tbl) -    8b
--        .hword L(prep_tbl) -    4b
-+        .xword 1280b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword 8b
-+        .xword 4b
-+	.popsection
- endfunc
- 
- 
-@@ -1370,9 +1384,9 @@ L(\type\()_8tap_h):
-         add             \xmx, x10, \mx, uxtw #3
-         b.ne            L(\type\()_8tap_hv)
- 
--        adr             x9,  L(\type\()_8tap_h_tbl)
--        ldrh            w8,  [x9, x8, lsl #1]
--        sub             x9,  x9,  w8, uxtw
-+        adrp            x9,  L(\type\()_8tap_h_tbl)
-+        add             x9,  x9, :lo12: L(\type\()_8tap_h_tbl)
-+        ldr             x9,  [x9, x8, lsl #3]
-         br              x9
- 
- 20:     // 2xN h
-@@ -1575,15 +1589,17 @@ L(\type\()_8tap_h):
-         b.gt            161b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(\type\()_8tap_h_tbl):
--        .hword L(\type\()_8tap_h_tbl) - 1280b
--        .hword L(\type\()_8tap_h_tbl) -  640b
--        .hword L(\type\()_8tap_h_tbl) -  320b
--        .hword L(\type\()_8tap_h_tbl) -  160b
--        .hword L(\type\()_8tap_h_tbl) -   80b
--        .hword L(\type\()_8tap_h_tbl) -   40b
--        .hword L(\type\()_8tap_h_tbl) -   20b
--        .hword 0
-+        .xword 1280b
-+        .xword 640b
-+        .xword 320b
-+        .xword 160b
-+        .xword 80b
-+        .xword 40b
-+        .xword 20b
-+        .xword 0
-+	.popsection
- 
- 
- L(\type\()_8tap_v):
-@@ -1595,9 +1611,9 @@ L(\type\()_8tap_v):
- 4:
-         add             \xmy, x10, \my, uxtw #3
- 
--        adr             x9,  L(\type\()_8tap_v_tbl)
--        ldrh            w8,  [x9, x8, lsl #1]
--        sub             x9,  x9,  w8, uxtw
-+        adrp            x9,  L(\type\()_8tap_v_tbl)
-+        add             x9,  x9, :lo12: L(\type\()_8tap_v_tbl)
-+        ldr             x9,  [x9, x8, lsl #3]
-         br              x9
- 
- 20:     // 2xN v
-@@ -1901,15 +1917,17 @@ L(\type\()_8tap_v):
- 0:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(\type\()_8tap_v_tbl):
--        .hword L(\type\()_8tap_v_tbl) - 1280b
--        .hword L(\type\()_8tap_v_tbl) -  640b
--        .hword L(\type\()_8tap_v_tbl) -  320b
--        .hword L(\type\()_8tap_v_tbl) -  160b
--        .hword L(\type\()_8tap_v_tbl) -   80b
--        .hword L(\type\()_8tap_v_tbl) -   40b
--        .hword L(\type\()_8tap_v_tbl) -   20b
--        .hword 0
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword  160b
-+        .xword   80b
-+        .xword   40b
-+        .xword   20b
-+        .xword 0
-+	.popsection
- 
- L(\type\()_8tap_hv):
-         cmp             \h,  #4
-@@ -1920,9 +1938,9 @@ L(\type\()_8tap_hv):
- 4:
-         add             \xmy,  x10, \my, uxtw #3
- 
--        adr             x9,  L(\type\()_8tap_hv_tbl)
--        ldrh            w8,  [x9, x8, lsl #1]
--        sub             x9,  x9,  w8, uxtw
-+        adrp            x9,  L(\type\()_8tap_hv_tbl)
-+        add             x9,  x9, :lo12: L(\type\()_8tap_hv_tbl)
-+        ldr             x9,  [x9, x8, lsl #3]
-         br              x9
- 
- 20:
-@@ -2444,15 +2462,17 @@ L(\type\()_8tap_filter_8):
-         srshr           v25.8h,  v25.8h, #2
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(\type\()_8tap_hv_tbl):
--        .hword L(\type\()_8tap_hv_tbl) - 1280b
--        .hword L(\type\()_8tap_hv_tbl) -  640b
--        .hword L(\type\()_8tap_hv_tbl) -  320b
--        .hword L(\type\()_8tap_hv_tbl) -  160b
--        .hword L(\type\()_8tap_hv_tbl) -   80b
--        .hword L(\type\()_8tap_hv_tbl) -   40b
--        .hword L(\type\()_8tap_hv_tbl) -   20b
--        .hword 0
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword  160b
-+        .xword   80b
-+        .xword   40b
-+        .xword   20b
-+        .xword 0
-+	.popsection
- endfunc
- 
- 
-@@ -2478,9 +2498,9 @@ function \type\()_bilin_8bpc_neon, export=1
- L(\type\()_bilin_h):
-         cbnz            \my, L(\type\()_bilin_hv)
- 
--        adr             x9,  L(\type\()_bilin_h_tbl)
--        ldrh            w8,  [x9, x8, lsl #1]
--        sub             x9,  x9,  w8, uxtw
-+        adrp            x9,  L(\type\()_bilin_h_tbl)
-+        add             x9,  x9, :lo12: L(\type\()_bilin_h_tbl)
-+        ldr             x9,  [x9, x8, lsl #3]
-         br              x9
- 
- 20:     // 2xN h
-@@ -2624,22 +2644,24 @@ L(\type\()_bilin_h):
-         b.gt            161b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(\type\()_bilin_h_tbl):
--        .hword L(\type\()_bilin_h_tbl) - 1280b
--        .hword L(\type\()_bilin_h_tbl) -  640b
--        .hword L(\type\()_bilin_h_tbl) -  320b
--        .hword L(\type\()_bilin_h_tbl) -  160b
--        .hword L(\type\()_bilin_h_tbl) -   80b
--        .hword L(\type\()_bilin_h_tbl) -   40b
--        .hword L(\type\()_bilin_h_tbl) -   20b
--        .hword 0
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword  160b
-+        .xword   80b
-+        .xword   40b
-+        .xword   20b
-+        .xword 0
-+	.popsection
- 
- 
- L(\type\()_bilin_v):
-         cmp             \h,  #4
--        adr             x9,  L(\type\()_bilin_v_tbl)
--        ldrh            w8,  [x9, x8, lsl #1]
--        sub             x9,  x9,  w8, uxtw
-+        adrp            x9,  L(\type\()_bilin_v_tbl)
-+        add             x9,  x9, :lo12: L(\type\()_bilin_v_tbl)
-+        ldr             x9,  [x9, x8, lsl #3]
-         br              x9
- 
- 20:     // 2xN v
-@@ -2810,22 +2832,24 @@ L(\type\()_bilin_v):
- 0:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(\type\()_bilin_v_tbl):
--        .hword L(\type\()_bilin_v_tbl) - 1280b
--        .hword L(\type\()_bilin_v_tbl) -  640b
--        .hword L(\type\()_bilin_v_tbl) -  320b
--        .hword L(\type\()_bilin_v_tbl) -  160b
--        .hword L(\type\()_bilin_v_tbl) -   80b
--        .hword L(\type\()_bilin_v_tbl) -   40b
--        .hword L(\type\()_bilin_v_tbl) -   20b
--        .hword 0
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword  160b
-+        .xword   80b
-+        .xword   40b
-+        .xword   20b
-+        .xword 0
-+	.popsection
- 
- L(\type\()_bilin_hv):
-         uxtl            v2.8h, v2.8b
-         uxtl            v3.8h, v3.8b
--        adr             x9,  L(\type\()_bilin_hv_tbl)
--        ldrh            w8,  [x9, x8, lsl #1]
--        sub             x9,  x9,  w8, uxtw
-+        adrp            x9,  L(\type\()_bilin_hv_tbl)
-+        add             x9,  x9, :lo12: L(\type\()_bilin_hv_tbl)
-+        ldr             x9,  [x9, x8, lsl #3]
-         br              x9
- 
- 20:     // 2xN hv
-@@ -2975,15 +2999,17 @@ L(\type\()_bilin_hv):
- 0:
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(\type\()_bilin_hv_tbl):
--        .hword L(\type\()_bilin_hv_tbl) - 1280b
--        .hword L(\type\()_bilin_hv_tbl) -  640b
--        .hword L(\type\()_bilin_hv_tbl) -  320b
--        .hword L(\type\()_bilin_hv_tbl) -  160b
--        .hword L(\type\()_bilin_hv_tbl) -   80b
--        .hword L(\type\()_bilin_hv_tbl) -   40b
--        .hword L(\type\()_bilin_hv_tbl) -   20b
--        .hword 0
-+        .xword 1280b
-+        .xword  640b
-+        .xword  320b
-+        .xword  160b
-+        .xword   80b
-+        .xword   40b
-+        .xword   20b
-+        .xword 0
-+	.popsection
- endfunc
- .endm
- 
Index: patches/patch-src_arm_64_refmvs_S
===================================================================
RCS file: patches/patch-src_arm_64_refmvs_S
diff -N patches/patch-src_arm_64_refmvs_S
--- patches/patch-src_arm_64_refmvs_S	24 Apr 2023 21:06:59 -0000	1.1
+++ /dev/null	1 Jan 1970 00:00:00 -0000
@@ -1,40 +0,0 @@
-Index: src/arm/64/refmvs.S
---- src/arm/64/refmvs.S.orig
-+++ src/arm/64/refmvs.S
-@@ -34,13 +34,13 @@
- function splat_mv_neon, export=1
-         ld1             {v3.16b},  [x1]
-         clz             w3,  w3
--        adr             x5,  L(splat_tbl)
-+        adrp            x5,  L(splat_tbl)
-+        add             x5,  x5, :lo12: L(splat_tbl)
-         sub             w3,  w3,  #26
-         ext             v2.16b,  v3.16b,  v3.16b,  #12
--        ldrh            w3,  [x5, w3, uxtw #1]
-+        ldr             x3,  [x5, w3, uxtw #3]
-         add             w2,  w2,  w2,  lsl #1
-         ext             v0.16b,  v2.16b,  v3.16b,  #4
--        sub             x3,  x5,  w3, uxtw
-         ext             v1.16b,  v2.16b,  v3.16b,  #8
-         lsl             w2,  w2,  #2
-         ext             v2.16b,  v2.16b,  v3.16b,  #12
-@@ -81,11 +81,13 @@ function splat_mv_neon, export=1
-         b.gt            1b
-         ret
- 
-+	.pushsection .data.rel.ro, "aw"
- L(splat_tbl):
--        .hword L(splat_tbl) -  320b
--        .hword L(splat_tbl) -  160b
--        .hword L(splat_tbl) -   80b
--        .hword L(splat_tbl) -   40b
--        .hword L(splat_tbl) -   20b
--        .hword L(splat_tbl) -   10b
-+        .xword 320b
-+        .xword 160b
-+        .xword  80b
-+        .xword  40b
-+        .xword  20b
-+        .xword  10b
-+	.popsection
- endfunc
Index: patches/patch-src_arm_cpu_c
===================================================================
RCS file: patches/patch-src_arm_cpu_c
diff -N patches/patch-src_arm_cpu_c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ patches/patch-src_arm_cpu_c	1 Dec 2024 08:45:12 -0000
@@ -0,0 +1,38 @@
+Provide dav1d_getauxval() wrapper for getauxval() and elf_aux_info()
+93f12c117a4e1c0cc2b129dcc52e84dbd9b84200
+
+Index: src/arm/cpu.c
+--- src/arm/cpu.c.orig
++++ src/arm/cpu.c
+@@ -43,15 +43,8 @@
+ #define HWCAP2_AARCH64_I8MM   (1 << 13)
+ 
+ COLD unsigned dav1d_get_cpu_flags_arm(void) {
+-#if HAVE_GETAUXVAL
+-    unsigned long hw_cap = getauxval(AT_HWCAP);
+-    unsigned long hw_cap2 = getauxval(AT_HWCAP2);
+-#else
+-    unsigned long hw_cap = 0;
+-    unsigned long hw_cap2 = 0;
+-    elf_aux_info(AT_HWCAP, &hw_cap, sizeof(hw_cap));
+-    elf_aux_info(AT_HWCAP2, &hw_cap2, sizeof(hw_cap2));
+-#endif
++    unsigned long hw_cap = dav1d_getauxval(AT_HWCAP);
++    unsigned long hw_cap2 = dav1d_getauxval(AT_HWCAP2);
+ 
+     unsigned flags = dav1d_get_default_cpu_flags();
+     flags |= (hw_cap & HWCAP_AARCH64_ASIMDDP) ? DAV1D_ARM_CPU_FLAG_DOTPROD : 0;
+@@ -69,12 +62,7 @@ COLD unsigned dav1d_get_cpu_flags_arm(void) {
+ #define HWCAP_ARM_I8MM    (1 << 27)
+ 
+ COLD unsigned dav1d_get_cpu_flags_arm(void) {
+-#if HAVE_GETAUXVAL
+-    unsigned long hw_cap = getauxval(AT_HWCAP);
+-#else
+-    unsigned long hw_cap = 0;
+-    elf_aux_info(AT_HWCAP, &hw_cap, sizeof(hw_cap));
+-#endif
++    unsigned long hw_cap = dav1d_getauxval(AT_HWCAP);
+ 
+     unsigned flags = dav1d_get_default_cpu_flags();
+     flags |= (hw_cap & HWCAP_ARM_NEON) ? DAV1D_ARM_CPU_FLAG_NEON : 0;
Index: patches/patch-src_cpu_c
===================================================================
RCS file: patches/patch-src_cpu_c
diff -N patches/patch-src_cpu_c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ patches/patch-src_cpu_c	1 Dec 2024 08:45:12 -0000
@@ -0,0 +1,37 @@
+Provide dav1d_getauxval() wrapper for getauxval() and elf_aux_info()
+93f12c117a4e1c0cc2b129dcc52e84dbd9b84200
+
+Index: src/cpu.c
+--- src/cpu.c.orig
++++ src/cpu.c
+@@ -52,6 +52,10 @@
+ #endif
+ #endif
+ 
++#if HAVE_GETAUXVAL || HAVE_ELF_AUX_INFO
++#include <sys/auxv.h>
++#endif
++
+ unsigned dav1d_cpu_flags = 0U;
+ unsigned dav1d_cpu_flags_mask = ~0U;
+ 
+@@ -106,4 +110,19 @@ COLD int dav1d_num_logical_processors(Dav1dContext *co
+     if (c)
+         dav1d_log(c, "Unable to detect thread count, defaulting to single-threaded mode\n");
+     return 1;
++}
++
++COLD unsigned long dav1d_getauxval(unsigned long type) {
++#if HAVE_GETAUXVAL
++    return getauxval(type);
++#elif HAVE_ELF_AUX_INFO
++    unsigned long aux = 0;
++    int ret = elf_aux_info(type, &aux, sizeof(aux));
++    if (ret != 0)
++        errno = ret;
++    return aux;
++#else
++    errno = ENOSYS;
++    return 0;
++#endif
+ }
Index: patches/patch-src_cpu_h
===================================================================
RCS file: patches/patch-src_cpu_h
diff -N patches/patch-src_cpu_h
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ patches/patch-src_cpu_h	1 Dec 2024 08:45:12 -0000
@@ -0,0 +1,14 @@
+Provide dav1d_getauxval() wrapper for getauxval() and elf_aux_info()
+93f12c117a4e1c0cc2b129dcc52e84dbd9b84200
+
+Index: src/cpu.h
+--- src/cpu.h.orig
++++ src/cpu.h
+@@ -53,6 +53,7 @@ EXTERN unsigned dav1d_cpu_flags_mask;
+ void dav1d_init_cpu(void);
+ DAV1D_API void dav1d_set_cpu_flags_mask(unsigned mask);
+ int dav1d_num_logical_processors(Dav1dContext *c);
++unsigned long dav1d_getauxval(unsigned long);
+ 
+ static ALWAYS_INLINE unsigned dav1d_get_default_cpu_flags(void) {
+     unsigned flags = 0;
Index: patches/patch-src_decode_c
===================================================================
RCS file: patches/patch-src_decode_c
diff -N patches/patch-src_decode_c
--- patches/patch-src_decode_c	29 Feb 2024 14:33:39 -0000	1.3
+++ /dev/null	1 Jan 1970 00:00:00 -0000
@@ -1,40 +0,0 @@
-Fix tile_start_off calculations for extremely large frame sizes
-
-2b475307dc11be9a1c3cc4358102c76a7f386a51
-
-CVE-2024-1580
-
-Index: src/decode.c
---- src/decode.c.orig
-+++ src/decode.c
-@@ -2630,7 +2630,7 @@ static void setup_tile(Dav1dTileState *const ts,
-                        const Dav1dFrameContext *const f,
-                        const uint8_t *const data, const size_t sz,
-                        const int tile_row, const int tile_col,
--                       const int tile_start_off)
-+                       const unsigned tile_start_off)
- {
-     const int col_sb_start = f->frame_hdr->tiling.col_start_sb[tile_col];
-     const int col_sb128_start = col_sb_start >> !f->seq_hdr->sb128;
-@@ -2981,15 +2981,16 @@ int dav1d_decode_frame_init(Dav1dFrameContext *const f
-     const uint8_t *const size_mul = ss_size_mul[f->cur.p.layout];
-     const int hbd = !!f->seq_hdr->hbd;
-     if (c->n_fc > 1) {
-+        const unsigned sb_step4 = f->sb_step * 4;
-         int tile_idx = 0;
-         for (int tile_row = 0; tile_row < f->frame_hdr->tiling.rows; tile_row++) {
--            int row_off = f->frame_hdr->tiling.row_start_sb[tile_row] *
--                          f->sb_step * 4 * f->sb128w * 128;
--            int b_diff = (f->frame_hdr->tiling.row_start_sb[tile_row + 1] -
--                          f->frame_hdr->tiling.row_start_sb[tile_row]) * f->sb_step * 4;
-+            const unsigned row_off = f->frame_hdr->tiling.row_start_sb[tile_row] *
-+                                     sb_step4 * f->sb128w * 128;
-+            const unsigned b_diff = (f->frame_hdr->tiling.row_start_sb[tile_row + 1] -
-+                                     f->frame_hdr->tiling.row_start_sb[tile_row]) * sb_step4;
-             for (int tile_col = 0; tile_col < f->frame_hdr->tiling.cols; tile_col++) {
-                 f->frame_thread.tile_start_off[tile_idx++] = row_off + b_diff *
--                    f->frame_hdr->tiling.col_start_sb[tile_col] * f->sb_step * 4;
-+                    f->frame_hdr->tiling.col_start_sb[tile_col] * sb_step4;
-             }
-         }
- 
Index: patches/patch-src_ext_x86_x86inc_asm
===================================================================
RCS file: /cvs/ports/multimedia/dav1d/patches/patch-src_ext_x86_x86inc_asm,v
retrieving revision 1.1
diff -u -p -u -p -r1.1 patch-src_ext_x86_x86inc_asm
--- patches/patch-src_ext_x86_x86inc_asm	13 Jul 2023 12:36:36 -0000	1.1
+++ patches/patch-src_ext_x86_x86inc_asm	1 Dec 2024 08:45:12 -0000
@@ -14,7 +14,7 @@ Index: src/ext/x86/x86inc.asm
  %macro LEA 2
  %if ARCH_X86_64
      lea %1, [%2]
-@@ -795,6 +801,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg,
+@@ -839,6 +845,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg,
      %endif
      align function_align
      %2:
@@ -22,7 +22,7 @@ Index: src/ext/x86/x86inc.asm
      RESET_MM_PERMUTATION        ; needed for x86-64, also makes disassembly somewhat nicer
      %xdefine rstk rsp           ; copy of the original stack pointer, used when greater alignment than the known stack alignment is required
      %assign stack_offset 0      ; stack pointer offset relative to the return address
-@@ -816,6 +823,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg,
+@@ -860,6 +867,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg,
          global current_function %+ %1
      %endif
      %1:
Index: patches/patch-src_internal_h
===================================================================
RCS file: patches/patch-src_internal_h
diff -N patches/patch-src_internal_h
--- patches/patch-src_internal_h	29 Feb 2024 14:33:39 -0000	1.1
+++ /dev/null	1 Jan 1970 00:00:00 -0000
@@ -1,18 +0,0 @@
-Fix tile_start_off calculations for extremely large frame sizes
-
-2b475307dc11be9a1c3cc4358102c76a7f386a51
-
-CVE-2024-1580
-
-Index: src/internal.h
---- src/internal.h.orig
-+++ src/internal.h
-@@ -286,7 +286,7 @@ struct Dav1dFrameContext {
-         int prog_sz;
-         int pal_sz, pal_idx_sz, cf_sz;
-         // start offsets per tile
--        int *tile_start_off;
-+        unsigned *tile_start_off;
-     } frame_thread;
- 
-     // loopfilter
Index: patches/patch-src_loongarch_cpu_c
===================================================================
RCS file: patches/patch-src_loongarch_cpu_c
diff -N patches/patch-src_loongarch_cpu_c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ patches/patch-src_loongarch_cpu_c	1 Dec 2024 08:45:12 -0000
@@ -0,0 +1,15 @@
+Provide dav1d_getauxval() wrapper for getauxval() and elf_aux_info()
+93f12c117a4e1c0cc2b129dcc52e84dbd9b84200
+
+Index: src/loongarch/cpu.c
+--- src/loongarch/cpu.c.orig
++++ src/loongarch/cpu.c
+@@ -40,7 +40,7 @@
+ COLD unsigned dav1d_get_cpu_flags_loongarch(void) {
+     unsigned flags = dav1d_get_default_cpu_flags();
+ #if HAVE_GETAUXVAL
+-    unsigned long hw_cap = getauxval(AT_HWCAP);
++    unsigned long hw_cap = dav1d_getauxval(AT_HWCAP);
+     flags |= (hw_cap & LA_HWCAP_LSX) ? DAV1D_LOONGARCH_CPU_FLAG_LSX : 0;
+     flags |= (hw_cap & LA_HWCAP_LASX) ? DAV1D_LOONGARCH_CPU_FLAG_LASX : 0;
+ #endif
Index: patches/patch-src_ppc_cpu_c
===================================================================
RCS file: patches/patch-src_ppc_cpu_c
diff -N patches/patch-src_ppc_cpu_c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ patches/patch-src_ppc_cpu_c	1 Dec 2024 08:45:12 -0000
@@ -0,0 +1,25 @@
+Provide dav1d_getauxval() wrapper for getauxval() and elf_aux_info()
+93f12c117a4e1c0cc2b129dcc52e84dbd9b84200
+
+Index: src/ppc/cpu.c
+--- src/ppc/cpu.c.orig
++++ src/ppc/cpu.c
+@@ -39,16 +39,9 @@
+ 
+ COLD unsigned dav1d_get_cpu_flags_ppc(void) {
+     unsigned flags = dav1d_get_default_cpu_flags();
+-#if HAVE_GETAUXVAL && ARCH_PPC64LE
+-    unsigned long hw_cap = getauxval(AT_HWCAP);
+-    unsigned long hw_cap2 = getauxval(AT_HWCAP2);
+-#elif HAVE_ELF_AUX_INFO && ARCH_PPC64LE
+-    unsigned long hw_cap = 0;
+-    unsigned long hw_cap2 = 0;
+-    elf_aux_info(AT_HWCAP, &hw_cap, sizeof(hw_cap));
+-    elf_aux_info(AT_HWCAP2, &hw_cap2, sizeof(hw_cap2));
+-#endif
+ #if HAVE_AUX
++    unsigned long hw_cap = dav1d_getauxval(AT_HWCAP);
++    unsigned long hw_cap2 = dav1d_getauxval(AT_HWCAP2);
+     flags |= (hw_cap & PPC_FEATURE_HAS_VSX) ? DAV1D_PPC_CPU_FLAG_VSX : 0;
+     flags |= (hw_cap2 & PPC_FEATURE2_ARCH_3_00) ? DAV1D_PPC_CPU_FLAG_PWR9 : 0;
+ #endif
Index: patches/patch-src_riscv_cpu_c
===================================================================
RCS file: patches/patch-src_riscv_cpu_c
diff -N patches/patch-src_riscv_cpu_c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ patches/patch-src_riscv_cpu_c	1 Dec 2024 08:45:12 -0000
@@ -0,0 +1,31 @@
+- Provide dav1d_getauxval() wrapper for getauxval() and elf_aux_info()
+  93f12c117a4e1c0cc2b129dcc52e84dbd9b84200
+- riscv: Enable FreeBSD / OpenBSD elf_aux_info() support
+  f15666b7031fa6b50f0db516d78e966acd18f5ae
+
+Index: src/riscv/cpu.c
+--- src/riscv/cpu.c.orig
++++ src/riscv/cpu.c
+@@ -32,19 +32,17 @@
+ #include "src/cpu.h"
+ #include "src/riscv/cpu.h"
+ 
+-#if HAVE_GETAUXVAL
++#if HAVE_GETAUXVAL || HAVE_ELF_AUX_INFO
+ #include <sys/auxv.h>
+-
+ #define HWCAP_RVV (1 << ('v' - 'a'))
+-
+ #endif
+ 
+ int dav1d_has_compliant_rvv(void);
+ 
+ COLD unsigned dav1d_get_cpu_flags_riscv(void) {
+     unsigned flags = dav1d_get_default_cpu_flags();
+-#if HAVE_GETAUXVAL
+-    unsigned long hw_cap = getauxval(AT_HWCAP);
++#if HAVE_GETAUXVAL || HAVE_ELF_AUX_INFO
++    unsigned long hw_cap = dav1d_getauxval(AT_HWCAP);
+     flags |= (hw_cap & HWCAP_RVV) && dav1d_has_compliant_rvv() ? DAV1D_RISCV_CPU_FLAG_V : 0;
+ #endif
+