(address . bug-guix@gnu.org)
Hello!
On AMD EPYC processors, as found on the build nodes of ci.guix.gnu.org,
childhurd VMs fail to boot when running with ‘qemu-system-i386
-enable-kvm’ (the kvm-amd Linux kernel module is used), with the Hurd
startup process hanging before /hurd/exec has been started:
Toggle snippet (18 lines)
module 0: ext2fs --multiboot-command-line=${kernel-command-line} --host-priv-por
t=${host-port} --device-master-port=${device-port} --exec-server-task=${exec-tas
k} --store-type=typed --x-xattr-translator-records ${root} $(task-create) $(task
-resume)
module 1: exec /gnu/store/99sqiayswrxxb80331pl7jxin18wv28b-hurd-0.9-1.91a5167/hu
rd/exec $(exec-task=task-create)
2 multiboot modules
task loaded: ext2fs --multiboot-command-line=root=device:hd0s1 root=3367134b-cfb
d-1e90-2f38-dfd13367134b gnu.system=/gnu/store/m66ccpdzdbcd3k2fdvyaj8cgmk23lybn-
system gnu.load=/gnu/store/m66ccpdzdbcd3k2fdvyaj8cgmk23lybn-system/boot --host-p
riv-port=1 --device-master-port=2 --exec-server-task=3 --store-type=typed --x-xa
ttr-translator-records device:hd0s1
task loaded: exec /gnu/store/99sqiayswrxxb80331pl7jxin18wv28b-hurd-0.9-1.91a5167
/hurd/exec
start ext2fs: Hurd server bootstrap: ext2fs[device:hd0s1] exec
Kdb shows these two tasks:
Toggle snippet (28 lines)
Stopped at mach
ine_idle+0x26: nop
machine_idle(0,c102f380,f5f7bcd0,c102e3fa)+0x26
idle_thread_continue(f5f7ade0,36366000,f5f73868,f5f73890,0)+0x73
>>>>> user space <<<<<
db> show all threads
TASK THREADS
0 gnumach (f5f7cf00): 7 threads:
0 (f5f7be18) .W..N. 0xc11dac04
1 (f5f7bcd0) R.....
2 (f5f7bb88) .W.ON.(reaper_thread_continue) 0xc12015d4
3 (f5f7ba40) .W.ON.(swapin_thread_continue) 0xc11f8e2c
4 (f5f7b8f8) .W.ON.(sched_thread_continue) 0
5 (f5f7b7b0) .W.ON.(io_done_thread_continue) 0xc1201f74
6 (f5f7b668) .W.ON.(net_thread_continue) 0xc11db0a8
1 ext2fs (f5f7ce40): 6 threads:
0 (f5f7b520) .W.O.F(mach_msg_continue) 0
1 (f5f7b290) .W.O.F(mach_msg_receive_continue) 0
2 (f5f7b148) .W.O..(mach_msg_receive_continue) 0
3 (f5f7b000) .W.O..(mach_msg_continue) 0
4 (f67d3e20) .W.O..(mach_msg_receive_continue) 0
5 (f67d3cd8) .W.O..(mach_msg_continue) 0
db> trace/t 0xf5f7b520
Continuation mach_msg_continue
>>>>> user space <<<<<
0x80ccaec()
For ext2fs.static, that just means thread 0 is here:
Toggle snippet (4 lines)
$ addr2line -e /gnu/store/99sqiayswrxxb80331pl7jxin18wv28b-hurd-0.9-1.91a5167/hurd/ext2fs.static 0x80ccaec
/tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/build/mach/mach_msg_trap.S:2
That doesn’t tell us much.
The same image boots fine on the same CPU without ‘-enable-kvm’.
However, keeping ‘-enable-kvm’ and adding ‘--cpu pentium’ and other
variants of this option don’t make any difference, AFAICS.
Ideas on how to debug this further, and/or ways to work around it
without giving up on KVM?
Thanks,
Ludo’.