-current Haskell ports aborting with SIGILL
On Fri, Apr 19, 2024 at 12:04:54AM +0000, James Cook wrote:
> On Sun, Feb 18, 2024 at 08:35:26AM -0800, Evan Silberman wrote:
> > Stuart Henderson <stu@spacehopper.org> wrote:
> > > On 2024/02/18 09:02, Stuart Henderson wrote:
> > > > On 2024/02/17 22:08, Greg Steuck wrote:
> > > > > Oh wow, this is becoming eerily similar to the failures aja@ is getting. Do
> > > > > dig more into this!
> > > >
> > > > Antoine, can you send a dmesg from one of the exopi VMs, please?
> > >
> > > - specifically I am wondering if it could be something to do with AVX,
> > > with AVX512 being the most likely to cause problems - Evan's machine
> > > does have this - my intel 11th gen doesn't because it has a mix of
> > > P+E cores, E cores don't implement it, so they disable it on the P
> > > cores too.
> > >
> > > ghc *does* have some code relating to AVX512.
> >
> > Breakthrough, ignore the previous reproducer and any association with
> > template haskell. I can get a crash in GHCI very simply:
> >
> > ~ $ ghci
> > GHCi, version 9.6.4: https://www.haskell.org/ghc/ :? for help
> > ghci> import qualified Data.Text as T
> > ghci> T.take 1 $ T.pack "aa"
> > "Illegal instruction (core dumped)
>
> I'm seeing this on my OpenBSD 7.5 machine with an old AMD cpu (dmesg
> follows signature). I have ghc-9.6.4p2 (which I think is after the
> fix on this thread). Also pandoc doesn't work.
>
> $ pkg_info -I ghc pandoc
> ghc-9.6.4p2 compiler for the functional language Haskell
> pandoc-3.1.12.2 convert between markup and document formats
> $ echo | pandoc -o tmp/test.html
> Illegal instruction (core dumped)
> $ ghci
> GHCi, version 9.6.4: https://www.haskell.org/ghc/ :? for help
> ghci> import qualified Data.Text as T
> ghci> T.take 1 $ T.pack "aa"
> "Illegal instruction (core dumped)
> $
>
> I also saw ghc die with a SIGILL when I tried to build pandoc from
> ports (checked out from cvs with -rOPENBSD_7_5). It happened when
> cabal was trying to build unicode-collation-0.1.3.6.
Here are some results of debugging with lldb.
With cabal-bundler and pandoc, it seems to be the xgetbv instruction
itself:
$ lldb /usr/local/bin/cabal-bundler
(lldb) target create "/usr/local/bin/cabal-bundler"
Current executable set to '/usr/local/bin/cabal-bundler' (x86_64).
(lldb) run
Process 90738 launched: '/usr/local/bin/cabal-bundler' (x86_64)
Process 90738 stopped
* thread #1, stop reason = signal SIGILL
frame #0: 0x00000000004c12ba cabal-bundler`___lldb_unnamed_symbol522 + 90
cabal-bundler`___lldb_unnamed_symbol522:
-> 0x4c12ba <+90>: xgetbv
0x4c12bd <+93>: notl %eax
0x4c12bf <+95>: testb $-0x20, %al
0x4c12c1 <+97>: leaq 0x58(%rip), %rcx ; ___lldb_unnamed_symbol523
$ lldb pandoc /dev/null
(lldb) target create "pandoc"
Current executable set to '/usr/local/bin/pandoc' (x86_64).
(lldb) settings set -- target.run-args "/dev/null"
(lldb) run
Process 25189 launched: '/usr/local/bin/pandoc' (x86_64)
[WARNING] Could not deduce format from file extension
Defaulting to markdown
Process 25189 stopped
* thread #1, stop reason = signal SIGILL
frame #0: 0x00000000057697fa pandoc`___lldb_unnamed_symbol1367 + 90
pandoc`___lldb_unnamed_symbol1367:
-> 0x57697fa <+90>: xgetbv
0x57697fd <+93>: notl %eax
0x57697ff <+95>: testb $-0x20, %al
0x5769801 <+97>: leaq 0x58(%rip), %rcx ; ___lldb_unnamed_symbol1368
(lldb)
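For reference, here is one way to read the three instructions at the
fault site. This is only a sketch in Python, not code from ghc or any
Haskell package, and the function names are made up. As far as I
understand the x86 manuals, xgetbv raises an invalid-opcode fault
(delivered as SIGILL) unless the OS has enabled XSAVE, which CPUID
leaf 1 reports in ECX bit 27 (OSXSAVE). Assuming %ecx was 0 when
xgetbv ran (the disassembly above does not show that), %eax would
hold the low half of XCR0, and "notl %eax; testb $-0x20, %al" then
asks whether XCR0 bits 5-7 (the AVX-512 opmask and ZMM state) are
all set:

```python
from typing import Callable

# Bit positions come from the x86 architecture manuals.
# All names below are hypothetical, chosen for this sketch only.
OSXSAVE_BIT = 1 << 27             # CPUID leaf 1, ECX: OS enabled XSAVE/XGETBV
AVX512_STATE_MASK = -0x20 & 0xFF  # the testb immediate as a byte: 0xE0
                                  # XCR0 bits 5-7: opmask, ZMM_Hi256, Hi16_ZMM


def xgetbv_is_safe(cpuid1_ecx: int) -> bool:
    """True if executing xgetbv should not fault (OSXSAVE is set)."""
    return bool(cpuid1_ecx & OSXSAVE_BIT)


def avx512_state_enabled(xcr0_low: int) -> bool:
    """Mirror `notl %eax; testb $-0x20, %al` on the low byte of XCR0.

    After the NOT, a bit set under the mask means that state component
    is disabled, so the test result must be zero for AVX-512 to be usable.
    """
    inverted_al = ~xcr0_low & 0xFF
    return (inverted_al & AVX512_STATE_MASK) == 0


def may_use_avx512(cpuid1_ecx: int, read_xcr0: Callable[[], int]) -> bool:
    """Only read XCR0 after the OSXSAVE check has passed."""
    if not xgetbv_is_safe(cpuid1_ecx):
        return False              # never reach xgetbv on CPUs without XSAVE
    return avx512_state_enabled(read_xcr0())
```

If that reading is right, a CPU without XSAVE would fault on the
xgetbv itself unless the OSXSAVE check comes first, which would be
consistent with the traces above. I have not confirmed that this is
what is happening here.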
I tried doing the same for Evan's ghci example, but lldb did not
automatically print assembly output as it did for pandoc. I don't
really know how to use lldb. I tried the "di" command but I don't
know if it's doing the right thing:
$ lldb /usr/local/lib/ghc-9.6.4/bin/ghc-9.6.4 -- --interactive
(lldb) target create "/usr/local/lib/ghc-9.6.4/bin/ghc-9.6.4"
Current executable set to '/usr/local/lib/ghc-9.6.4/bin/ghc-9.6.4' (x86_64).
(lldb) settings set -- target.run-args "--interactive"
(lldb) run
Process 14800 launched: '/usr/local/lib/ghc-9.6.4/bin/ghc-9.6.4' (x86_64)
GHCi, version 9.6.4: https://www.haskell.org/ghc/ :? for help
ghci> import qualified Data.Text as T
ghci> T.take 1 $ T.pack "aa"
"Process 14800 stopped
* thread #1, stop reason = signal SIGILL
frame #0: 0x00000002d2a1242b libc.so.99.0`_thread_sys_futex at -:2
warning: This version of LLDB has no plugin for the language "assembler". Inspection of frame variables will be limited.
(lldb) di
libc.so.99.0`_thread_sys_futex:
0x2d2a12410 <+0>: endbr64
0x2d2a12414 <+4>: movq 0x849ad(%rip), %r11 ; __retguard__thread_sys_futex
0x2d2a1241b <+11>: xorq (%rsp), %r11
0x2d2a1241f <+15>: pushq %r11
0x2d2a12421 <+17>: movl $0x53, %eax
0x2d2a12426 <+22>: movq %rcx, %r10
0x2d2a12429 <+25>: syscall
-> 0x2d2a1242b <+27>: jae 0x2d2a1243c ; <+44>
0x2d2a1242d <+29>: movl %eax, %fs:0x20
0x2d2a12435 <+37>: movq $-0x1, %rax
0x2d2a1243c <+44>: popq %r11
0x2d2a1243e <+46>: xorq (%rsp), %r11
0x2d2a12442 <+50>: cmpq 0x8497f(%rip), %r11 ; __retguard__thread_sys_futex
0x2d2a12449 <+57>: je 0x2d2a1245c ; <+76>
0x2d2a1244b <+59>: int3
0x2d2a1244c <+60>: int3
0x2d2a1244d <+61>: int3
0x2d2a1244e <+62>: int3
0x2d2a1244f <+63>: int3
0x2d2a12450 <+64>: int3
0x2d2a12451 <+65>: int3
0x2d2a12452 <+66>: int3
0x2d2a12453 <+67>: int3
0x2d2a12454 <+68>: int3
0x2d2a12455 <+69>: int3
0x2d2a12456 <+70>: int3
0x2d2a12457 <+71>: int3
0x2d2a12458 <+72>: int3
0x2d2a12459 <+73>: int3
0x2d2a1245a <+74>: int3
0x2d2a1245b <+75>: int3
0x2d2a1245c <+76>: retq
Similar result when I tried examining the core file under
/usr/ports/pobj/pandoc-3.1.12.2/unicode-collation-0.1.3.6:
$ doas -u _pbuild lldb -c ghc-9.6.4.core /usr/local/lib/ghc-9.6.4/bin/ghc-9.6.4
(lldb) target create "/usr/local/lib/ghc-9.6.4/bin/ghc-9.6.4" --core "ghc-9.6.4.core"
Core file '/usr/ports/pobj/pandoc-3.1.12.2/unicode-collation-0.1.3.6/ghc-9.6.4.core' (x86_64) was loaded.
warning: This version of LLDB has no plugin for the language "assembler". Inspection of frame variables will be limited.
Could not load history file
.(lldb) di
libc.so.99.0`_thread_sys_futex:
0x2ac52e410 <+0>: endbr64
0x2ac52e414 <+4>: movq 0x849ad(%rip), %r11 ; __retguard__thread_sys_futex
0x2ac52e41b <+11>: xorq (%rsp), %r11
0x2ac52e41f <+15>: pushq %r11
0x2ac52e421 <+17>: movl $0x53, %eax
0x2ac52e426 <+22>: movq %rcx, %r10
0x2ac52e429 <+25>: syscall
-> 0x2ac52e42b <+27>: jae 0x5b43c ; <+44>
0x2ac52e42d <+29>: movl %eax, %fs:0x20
0x2ac52e435 <+37>: movq $-0x1, %rax
0x2ac52e43c <+44>: popq %r11
0x2ac52e43e <+46>: xorq (%rsp), %r11
0x2ac52e442 <+50>: cmpq 0x8497f(%rip), %r11 ; __retguard__thread_sys_futex
0x2ac52e449 <+57>: je 0x5b45c ; <+76>
0x2ac52e44b <+59>: int3
0x2ac52e44c <+60>: int3
0x2ac52e44d <+61>: int3
0x2ac52e44e <+62>: int3
0x2ac52e44f <+63>: int3
0x2ac52e450 <+64>: int3
0x2ac52e451 <+65>: int3
0x2ac52e452 <+66>: int3
0x2ac52e453 <+67>: int3
0x2ac52e454 <+68>: int3
0x2ac52e455 <+69>: int3
0x2ac52e456 <+70>: int3
0x2ac52e457 <+71>: int3
0x2ac52e458 <+72>: int3
0x2ac52e459 <+73>: int3
0x2ac52e45a <+74>: int3
0x2ac52e45b <+75>: int3
0x2ac52e45c <+76>: retq
(lldb)
--
James