'guix substitutes' sometimes hangs on glibc 2.27

Done

Details

4 participants

Ludovic Courtès
Mark H Weaver
Ricardo Wurmus
Andy Wingo

Owner: unassigned

Submitted by: Ludovic Courtès

Severity: serious

Debbugs page

Ludovic Courtès wrote on 21 Jun 2018 04:45

Recipients:(address . bug-guix@gnu.org)

Message-ID:87bmc4748n.fsf@gnu.org

Hello Guix!

When downloading a number of substitutes, ‘guix substitute’ sometimes

hangs for me since the switch to glibc 2.27. Anyone else experiencing

this? It’s relatively frequent for me.

The backtrace shows this:

Toggle snippet (47 lines)(gdb) bt
#0  0x00007fbb34bf794d in __GI___pthread_timedjoin_ex (threadid=140441961314048, thread_return=thread_return@entry=0x0, abstime=abstime@entry=0x0, block=block@entry=true)
    at pthread_join_common.c:89
#1  0x00007fbb34bf773c in __pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at pthread_join.c:24
#2  0x00007fbb350d7548 in stop_finalization_thread () at finalizers.c:265
#3  0x00007fbb350d7759 in scm_i_finalizer_pre_fork () at finalizers.c:290
#4  0x00007fbb3514f256 in scm_fork () at posix.c:1222
#5  0x00007fbb351477fd in vm_regular_engine (thread=0x7fbb313739d0, vp=0x1569f30, registers=0x52de, resume=884963661) at vm-engine.c:784
#6  0x00007fbb3514ae5a in scm_call_n (proc=<error reading variable: ERROR: Cannot access memory at address 0x355c2030>0x7fbb355c2030, argv=argv@entry=0x7fff856ae7f8, nargs=nargs@entry=1)
    at vm.c:1257
#7  0x00007fbb350ceff7 in scm_primitive_eval (exp=exp@entry=<error reading variable: ERROR: Cannot access memory at address 0x31738cf8>0x164fb00) at eval.c:662
#8  0x00007fbb350cf053 in scm_eval (exp=<error reading variable: ERROR: Cannot access memory at address 0x31738cf8>0x164fb00, 
    module_or_state=module_or_state@entry=<error reading variable: ERROR: Cannot access memory at address 0x354750d8>0x162b140) at eval.c:696
#9  0x00007fbb3511a0b0 in scm_shell (argc=10, argv=0x15f0390) at script.c:454
#10 0x00007fbb350e5add in invoke_main_func (body_data=0x7fff856aed10) at init.c:340
#11 0x00007fbb350c82da in c_body (d=0x7fff856aec50) at continuations.c:422
#12 0x00007fbb351477fd in vm_regular_engine (thread=0x7fbb313739d0, vp=0x1569f30, registers=0x52de, resume=884963661) at vm-engine.c:784
#13 0x00007fbb3514ae5a in scm_call_n (proc=proc@entry=#<smob catch-closure 1564200>, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1257
#14 0x00007fbb350cdef9 in scm_call_0 (proc=proc@entry=#<smob catch-closure 1564200>) at eval.c:481
#15 0x00007fbb3513a026 in catch (tag=tag@entry=#t, thunk=#<smob catch-closure 1564200>, handler=<error reading variable: ERROR: Cannot access memory at address 0x0>0x15641e0, 
    pre_unwind_handler=<error reading variable: ERROR: Cannot access memory at address 0x0>0x15641c0) at throw.c:137
#16 0x00007fbb3513a365 in scm_catch_with_pre_unwind_handler (key=key@entry=#t, thunk=<optimized out>, handler=<optimized out>, pre_unwind_handler=<optimized out>) at throw.c:254
#17 0x00007fbb3513a51f in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7fbb350c82d0 <c_body>, body_data=body_data@entry=0x7fff856aec50, handler=handler@entry=0x7fbb350c8560 <c_handler>, 
    handler_data=handler_data@entry=0x7fff856aec50, pre_unwind_handler=pre_unwind_handler@entry=0x7fbb350c83c0 <pre_unwind_handler>, pre_unwind_handler_data=0x1564b60) at throw.c:377
#18 0x00007fbb350c88c0 in scm_i_with_continuation_barrier (body=body@entry=0x7fbb350c82d0 <c_body>, body_data=body_data@entry=0x7fff856aec50, handler=handler@entry=0x7fbb350c8560 <c_handler>, 
    handler_data=handler_data@entry=0x7fff856aec50, pre_unwind_handler=pre_unwind_handler@entry=0x7fbb350c83c0 <pre_unwind_handler>, pre_unwind_handler_data=0x1564b60) at continuations.c:360
#19 0x00007fbb350c8955 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>) at continuations.c:456
#20 0x00007fbb35138c3c in with_guile (base=base@entry=0x7fff856aecb8, data=data@entry=0x7fff856aece0) at threads.c:661
#21 0x00007fbb34e2afb8 in GC_call_with_stack_base (fn=fn@entry=0x7fbb35138bf0 <with_guile>, arg=arg@entry=0x7fff856aece0) at misc.c:1949
#22 0x00007fbb35138fd8 in scm_i_with_guile (dynamic_state=<optimized out>, data=data@entry=0x7fff856aece0, func=func@entry=0x7fbb350e5ac0 <invoke_main_func>) at threads.c:704
#23 scm_with_guile (func=func@entry=0x7fbb350e5ac0 <invoke_main_func>, data=data@entry=0x7fff856aed10) at threads.c:710
#24 0x00007fbb350e5c72 in scm_boot_guile (argc=argc@entry=7, argv=argv@entry=0x7fff856aee68, main_func=main_func@entry=0x400ce0 <inner_main>, closure=closure@entry=0x0) at init.c:323
#25 0x0000000000400b80 in main (argc=7, argv=0x7fff856aee68) at guile.c:101
(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x7fbb355cab80 (LWP 21207) "guix substitute" 0x00007fbb34bf794d in __GI___pthread_timedjoin_ex (threadid=140441961314048, thread_return=thread_return@entry=0x0, 
    abstime=abstime@entry=0x0, block=block@entry=true) at pthread_join_common.c:89
  2    Thread 0x7fbb3342c700 (LWP 21208) ".guix-real" 0x00007fbb34bfc552 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fbb3504f6ec <mark_cv+44>)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  3    Thread 0x7fbb32c2b700 (LWP 21209) ".guix-real" 0x00007fbb34bfc552 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fbb3504f6ec <mark_cv+44>)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  4    Thread 0x7fbb3242a700 (LWP 21210) ".guix-real" 0x00007fbb34bfc552 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fbb3504f6ec <mark_cv+44>)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  5    Thread 0x7fbb2eb78700 (LWP 21212) ".guix-real" 0x00007fbb34bffaac in __libc_read (fd=9, buf=buf@entry=0x7fbb2eb77540, nbytes=nbytes@entry=1) at ../sysdeps/unix/sysv/linux/read.c:27
  6    Thread 0x7fbb31373700 (LWP 21214) "guix substitute" 0x00007fbb34bffaac in __libc_read (fd=5, buf=buf@entry=0x7fbb31372a30, nbytes=nbytes@entry=1) at ../sysdeps/unix/sysv/linux/read.c:27

The finalization thread is itself stuck reading from its pipe:

Toggle snippet (32 lines)(gdb) thread 6
[Switching to thread 6 (Thread 0x7fbb31373700 (LWP 21214))]
#0  0x00007fbb34bffaac in __libc_read (fd=5, buf=buf@entry=0x7fbb31372a30, nbytes=nbytes@entry=1) at ../sysdeps/unix/sysv/linux/read.c:27
27      in ../sysdeps/unix/sysv/linux/read.c
(gdb) bt
#0  0x00007fbb34bffaac in __libc_read (fd=5, buf=buf@entry=0x7fbb31372a30, nbytes=nbytes@entry=1) at ../sysdeps/unix/sysv/linux/read.c:27
#1  0x00007fbb350d74d7 in read_finalization_pipe_data (data=0x7fbb31372a30) at finalizers.c:199
#2  0x00007fbb34e30b63 in GC_do_blocking_inner (data=0x7fbb313729f0 "\300t\r5\273\177", context=<optimized out>) at pthread_support.c:1353
#3  0x00007fbb34e25389 in GC_with_callee_saves_pushed (fn=0x7fbb34e30b20 <GC_do_blocking_inner>, arg=arg@entry=0x7fbb313729f0 "\300t\r5\273\177") at mach_dep.c:322
#4  0x00007fbb34e2afec in GC_do_blocking (fn=fn@entry=0x7fbb350d74c0 <read_finalization_pipe_data>, client_data=client_data@entry=0x7fbb31372a30) at misc.c:2061
#5  0x00007fbb3513902a in scm_without_guile (func=0x7fbb350d74c0 <read_finalization_pipe_data>, data=0x7fbb31372a30) at threads.c:722
#6  0x00007fbb350d7887 in finalization_thread_proc (unused=<optimized out>) at finalizers.c:212
#7  0x00007fbb350c82da in c_body (d=0x7fbb31372e50) at continuations.c:422
#8  0x00007fbb351477fd in vm_regular_engine (thread=0x5, vp=0x172aea0, registers=0x1, resume=884996780) at vm-engine.c:784
#9  0x00007fbb3514ae5a in scm_call_n (proc=proc@entry=#<smob catch-closure 1c1ff60>, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1257
#10 0x00007fbb350cdef9 in scm_call_0 (proc=proc@entry=#<smob catch-closure 1c1ff60>) at eval.c:481
#11 0x00007fbb3513a026 in catch (tag=tag@entry=#t, thunk=#<smob catch-closure 1c1ff60>, handler=<error reading variable: ERROR: Cannot access memory at address 0x0>0x1c1ff40, 
    pre_unwind_handler=<error reading variable: ERROR: Cannot access memory at address 0x0>0x1c1ff00) at throw.c:137
#12 0x00007fbb3513a365 in scm_catch_with_pre_unwind_handler (key=key@entry=#t, thunk=<optimized out>, handler=<optimized out>, pre_unwind_handler=<optimized out>) at throw.c:254
#13 0x00007fbb3513a51f in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7fbb350c82d0 <c_body>, body_data=body_data@entry=0x7fbb31372e50, handler=handler@entry=0x7fbb350c8560 <c_handler>, 
    handler_data=handler_data@entry=0x7fbb31372e50, pre_unwind_handler=pre_unwind_handler@entry=0x7fbb350c83c0 <pre_unwind_handler>, pre_unwind_handler_data=0x1564b60) at throw.c:377
#14 0x00007fbb350c88c0 in scm_i_with_continuation_barrier (body=body@entry=0x7fbb350c82d0 <c_body>, body_data=body_data@entry=0x7fbb31372e50, handler=handler@entry=0x7fbb350c8560 <c_handler>, 
    handler_data=handler_data@entry=0x7fbb31372e50, pre_unwind_handler=pre_unwind_handler@entry=0x7fbb350c83c0 <pre_unwind_handler>, pre_unwind_handler_data=0x1564b60) at continuations.c:360
#15 0x00007fbb350c8955 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>) at continuations.c:456
#16 0x00007fbb35138c3c in with_guile (base=base@entry=0x7fbb31372eb8, data=data@entry=0x7fbb31372ee0) at threads.c:661
#17 0x00007fbb34e2afb8 in GC_call_with_stack_base (fn=fn@entry=0x7fbb35138bf0 <with_guile>, arg=arg@entry=0x7fbb31372ee0) at misc.c:1949
#18 0x00007fbb35138fd8 in scm_i_with_guile (dynamic_state=<optimized out>, data=<optimized out>, func=<optimized out>) at threads.c:704
#19 scm_with_guile (func=<optimized out>, data=<optimized out>) at threads.c:710
#20 0x00007fbb34bf6567 in start_thread (arg=0x7fbb31373700) at pthread_create.c:463
#21 0x00007fbb3351eeaf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The ‘primitive-fork’ call itself presumably comes from

‘decompressed-port’, used in (guix scripts substitute).

Ludo’.

Ricardo Wurmus wrote on 21 Jun 2018 07:10

Recipients:(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 31925@debbugs.gnu.org)

Message-ID:87tvpws00q.fsf@elephly.net

Hi Ludo,

Toggle quote (4 lines)

> When downloading a number of substitutes, ‘guix substitute’ sometimes

> hangs for me since the switch to glibc 2.27. Anyone else experiencing

> this? It’s relatively frequent for me.

It has never happened to me.

Ricardo

Ludovic Courtès wrote on 25 Jun 2018 05:25

control message for bug #31925

Recipients:(address . control@debbugs.gnu.org)

Message-ID:87wounqchx.fsf@gnu.org

severity 31925 serious

Ludovic Courtès wrote on 4 Jul 2018 00:03

Re: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27

Recipients:(address . 31925@debbugs.gnu.org)

Message-ID:874lhffpnq.fsf@gnu.org

ludo@gnu.org (Ludovic Courtès) skribis:

Toggle quote (7 lines)

> #0 0x00007fbb34bf794d in __GI___pthread_timedjoin_ex (threadid=140441961314048, thread_return=thread_return@entry=0x0, abstime=abstime@entry=0x0, block=block@entry=true)

> at pthread_join_common.c:89

> #1 0x00007fbb34bf773c in __pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at pthread_join.c:24

> #2 0x00007fbb350d7548 in stop_finalization_thread () at finalizers.c:265

> #3 0x00007fbb350d7759 in scm_i_finalizer_pre_fork () at finalizers.c:290

> #4 0x00007fbb3514f256 in scm_fork () at posix.c:1222

Here’s a reproducer that works quite well (it hangs within a couple of

minutes):

(use-modules (guix utils) (ice-9 ftw) (ice-9 match) (srfi srfi-1) (srfi srfi-26) (rnrs io ports)) (define infodir (string-append (getenv "HOME") "/.guix-profile/share/info/")) (define files (apply circular-list (map (cut string-append infodir <>) (scandir infodir (lambda (file) (string-suffix? ".gz" file)))))) (sigaction SIGALRM (lambda _ (alarm 1))) (alarm 1) (let loop ((files files) (n 0)) (match files ((file . tail) (call-with-input-file file (lambda (port) (call-with-decompressed-port 'gzip port (lambda (port) (let loop () (unless (eof-object? (get-bytevector-n port 777)) (loop))))))) ;; (pk 'loop n file) (display ".") (loop tail (+ n 1)))))

Ludo’.

Ludovic Courtès wrote on 4 Jul 2018 09:58

Recipients:

Message-ID:87tvpfaqfd.fsf@gnu.org

(+Cc: Andy as the ultimate authority for all these things. :-))

ludo@gnu.org (Ludovic Courtès) skribis:

Toggle quote (15 lines)> (let loop ((files files)
>            (n 0))
>   (match files
>     ((file . tail)
>      (call-with-input-file file
>        (lambda (port)
>          (call-with-decompressed-port 'gzip port
>            (lambda (port)
>              (let loop ()
>                (unless (eof-object? (get-bytevector-n port 777))
>                  (loop)))))))
>      ;; (pk 'loop n file)
>      (display ".")
>      (loop tail (+ n 1)))))

One problem I’ve noticed is that the child process that

‘call-with-decompressed-port’ spawns would be stuck trying to get the

allocation lock:

Toggle snippet (35 lines)(gdb) bt
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f9fd8d5cb25 in __GI___pthread_mutex_lock (mutex=0x7f9fd91b3240 <GC_allocate_ml>) at ../nptl/pthread_mutex_lock.c:78
#2  0x00007f9fd8f8ef8f in GC_call_with_alloc_lock (fn=fn@entry=0x7f9fd92b0420 <do_copy_weak_entry>, client_data=client_data@entry=0x7ffe4b9a0d80) at misc.c:1929
#3  0x00007f9fd92b1270 in copy_weak_entry (dst=0x7ffe4b9a0d70, src=0x759ed0) at weak-set.c:124
#4  weak_set_remove_x (closure=0x8850c0, pred=0x7f9fd92b0440 <eq_predicate>, hash=3944337866184184181, set=0x70cf00) at weak-set.c:615
#5  scm_c_weak_set_remove_x (set=set@entry=#<weak-set 756df0>, raw_hash=<optimized out>, pred=pred@entry=0x7f9fd92b0440 <eq_predicate>, closure=closure@entry=0x8850c0) at weak-set.c:791
#6  0x00007f9fd92b13b0 in scm_weak_set_remove_x (set=#<weak-set 756df0>, obj=obj@entry=#<port 2 8850c0>) at weak-set.c:812
#7  0x00007f9fd926f72f in close_port (port=#<port 2 8850c0>, explicit=<optimized out>) at ports.c:884
#8  0x00007f9fd92ad307 in vm_regular_engine (thread=0x7f9fd91b3240 <GC_allocate_ml>, vp=0x7adf30, registers=0x0, resume=-657049556) at vm-engine.c:786
#9  0x00007f9fd92afb37 in scm_call_n (proc=<error reading variable: ERROR: Cannot access memory at address 0xd959b030>0x7f9fd959b030, argv=argv@entry=0x7ffe4b9a1018, nargs=nargs@entry=1) at vm.c:1257
#10 0x00007f9fd9233017 in scm_primitive_eval (exp=<optimized out>, exp@entry=<error reading variable: ERROR: Cannot access memory at address 0xd5677cf8>0x855280) at eval.c:662
#11 0x00007f9fd9233073 in scm_eval (exp=<error reading variable: ERROR: Cannot access memory at address 0xd5677cf8>0x855280, module_or_state=module_or_state@entry=<error reading variable: ERROR: Cannot access memory at address 0xd95580d8>0x83d140) at eval.c:696
#12 0x00007f9fd927e8d0 in scm_shell (argc=2, argv=0x7ffe4b9a1668) at script.c:454
#13 0x00007f9fd9249a9d in invoke_main_func (body_data=0x7ffe4b9a1510) at init.c:340
#14 0x00007f9fd922c28a in c_body (d=0x7ffe4b9a1450) at continuations.c:422
#15 0x00007f9fd92ad307 in vm_regular_engine (thread=0x7f9fd91b3240 <GC_allocate_ml>, vp=0x7adf30, registers=0x0, resume=-657049556) at vm-engine.c:786
#16 0x00007f9fd92afb37 in scm_call_n (proc=proc@entry=#<smob catch-closure 795120>, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1257
#17 0x00007f9fd9231e69 in scm_call_0 (proc=proc@entry=#<smob catch-closure 795120>) at eval.c:481
#18 0x00007f9fd929e7b2 in catch (tag=tag@entry=#t, thunk=#<smob catch-closure 795120>, handler=<error reading variable: ERROR: Cannot access memory at address 0x400000000>0x7950c0, pre_unwind_handler=<error reading variable: ERROR: Cannot access memory at address 0x400000000>0x7950a0) at throw.c:137
#19 0x00007f9fd929ea95 in scm_catch_with_pre_unwind_handler (key=key@entry=#t, thunk=<optimized out>, handler=<optimized out>, pre_unwind_handler=<optimized out>) at throw.c:254
#20 0x00007f9fd929ec5f in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7f9fd922c280 <c_body>, body_data=body_data@entry=0x7ffe4b9a1450, handler=handler@entry=0x7f9fd922c510 <c_handler>, handler_data=handler_data@entry=0x7ffe4b9a1450, pre_unwind_handler=pre_unwind_handler@entry=0x7f9fd922c370 <pre_unwind_handler>, pre_unwind_handler_data=0x7a9bc0) at throw.c:377
#21 0x00007f9fd922c870 in scm_i_with_continuation_barrier (body=body@entry=0x7f9fd922c280 <c_body>, body_data=body_data@entry=0x7ffe4b9a1450, handler=handler@entry=0x7f9fd922c510 <c_handler>, handler_data=handler_data@entry=0x7ffe4b9a1450, pre_unwind_handler=pre_unwind_handler@entry=0x7f9fd922c370 <pre_unwind_handler>, pre_unwind_handler_data=0x7a9bc0) at continuations.c:360
#22 0x00007f9fd922c905 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>) at continuations.c:456
#23 0x00007f9fd929d3ec in with_guile (base=base@entry=0x7ffe4b9a14b8, data=data@entry=0x7ffe4b9a14e0) at threads.c:661
#24 0x00007f9fd8f8efb8 in GC_call_with_stack_base (fn=fn@entry=0x7f9fd929d3a0 <with_guile>, arg=arg@entry=0x7ffe4b9a14e0) at misc.c:1949
#25 0x00007f9fd929d708 in scm_i_with_guile (dynamic_state=<optimized out>, data=data@entry=0x7ffe4b9a14e0, func=func@entry=0x7f9fd9249a80 <invoke_main_func>) at threads.c:704
#26 scm_with_guile (func=func@entry=0x7f9fd9249a80 <invoke_main_func>, data=data@entry=0x7ffe4b9a1510) at threads.c:710
#27 0x00007f9fd9249c32 in scm_boot_guile (argc=argc@entry=2, argv=argv@entry=0x7ffe4b9a1668, main_func=main_func@entry=0x400cb0 <inner_main>, closure=closure@entry=0x0) at init.c:323
#28 0x0000000000400b70 in main (argc=2, argv=0x7ffe4b9a1668) at guile.c:101
(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x7f9fd972eb80 (LWP 15573) "guile" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

So it seems quite clear that the thing has the alloc lock taken. I

suppose this can happen if one of the libgc threads runs right when we

call fork and takes the alloc lock, right?

If that is correct, the fix would be to call fork within

‘GC_call_with_alloc_lock’.

How does that sound?

As a workaround on the Guix side, we might achieve the same effect by

calling ‘gc-disable’ right before ‘primitive-fork’.

Ludo’.

Mark H Weaver wrote on 4 Jul 2018 20:33

Recipients:(name . Ludovic Courtès)(address . ludo@gnu.org)

Message-ID:87efgil5jz.fsf@netris.org

Hi Ludovic,

ludo@gnu.org (Ludovic Courtès) writes:

Toggle quote (61 lines)> (+Cc: Andy as the ultimate authority for all these things.  :-))
>
> ludo@gnu.org (Ludovic Courtès) skribis:
>
>> (let loop ((files files)
>>            (n 0))
>>   (match files
>>     ((file . tail)
>>      (call-with-input-file file
>>        (lambda (port)
>>          (call-with-decompressed-port 'gzip port
>>            (lambda (port)
>>              (let loop ()
>>                (unless (eof-object? (get-bytevector-n port 777))
>>                  (loop)))))))
>>      ;; (pk 'loop n file)
>>      (display ".")
>>      (loop tail (+ n 1)))))
>
> One problem I’ve noticed is that the child process that
> ‘call-with-decompressed-port’ spawns would be stuck trying to get the
> allocation lock:
>
> (gdb) bt
> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x00007f9fd8d5cb25 in __GI___pthread_mutex_lock (mutex=0x7f9fd91b3240 <GC_allocate_ml>) at ../nptl/pthread_mutex_lock.c:78
> #2  0x00007f9fd8f8ef8f in GC_call_with_alloc_lock (fn=fn@entry=0x7f9fd92b0420 <do_copy_weak_entry>, client_data=client_data@entry=0x7ffe4b9a0d80) at misc.c:1929
> #3  0x00007f9fd92b1270 in copy_weak_entry (dst=0x7ffe4b9a0d70, src=0x759ed0) at weak-set.c:124
> #4  weak_set_remove_x (closure=0x8850c0, pred=0x7f9fd92b0440 <eq_predicate>, hash=3944337866184184181, set=0x70cf00) at weak-set.c:615
> #5  scm_c_weak_set_remove_x (set=set@entry=#<weak-set 756df0>, raw_hash=<optimized out>, pred=pred@entry=0x7f9fd92b0440 <eq_predicate>, closure=closure@entry=0x8850c0) at weak-set.c:791
> #6  0x00007f9fd92b13b0 in scm_weak_set_remove_x (set=#<weak-set 756df0>, obj=obj@entry=#<port 2 8850c0>) at weak-set.c:812
> #7  0x00007f9fd926f72f in close_port (port=#<port 2 8850c0>, explicit=<optimized out>) at ports.c:884
> #8  0x00007f9fd92ad307 in vm_regular_engine (thread=0x7f9fd91b3240 <GC_allocate_ml>, vp=0x7adf30, registers=0x0, resume=-657049556) at vm-engine.c:786
> #9  0x00007f9fd92afb37 in scm_call_n (proc=<error reading variable: ERROR: Cannot access memory at address 0xd959b030>0x7f9fd959b030, argv=argv@entry=0x7ffe4b9a1018, nargs=nargs@entry=1) at vm.c:1257
> #10 0x00007f9fd9233017 in scm_primitive_eval (exp=<optimized out>, exp@entry=<error reading variable: ERROR: Cannot access memory at address 0xd5677cf8>0x855280) at eval.c:662
> #11 0x00007f9fd9233073 in scm_eval (exp=<error reading variable: ERROR: Cannot access memory at address 0xd5677cf8>0x855280, module_or_state=module_or_state@entry=<error reading variable: ERROR: Cannot access memory at address 0xd95580d8>0x83d140) at eval.c:696
> #12 0x00007f9fd927e8d0 in scm_shell (argc=2, argv=0x7ffe4b9a1668) at script.c:454
> #13 0x00007f9fd9249a9d in invoke_main_func (body_data=0x7ffe4b9a1510) at init.c:340
> #14 0x00007f9fd922c28a in c_body (d=0x7ffe4b9a1450) at continuations.c:422
> #15 0x00007f9fd92ad307 in vm_regular_engine (thread=0x7f9fd91b3240 <GC_allocate_ml>, vp=0x7adf30, registers=0x0, resume=-657049556) at vm-engine.c:786
> #16 0x00007f9fd92afb37 in scm_call_n (proc=proc@entry=#<smob catch-closure 795120>, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1257
> #17 0x00007f9fd9231e69 in scm_call_0 (proc=proc@entry=#<smob catch-closure 795120>) at eval.c:481
> #18 0x00007f9fd929e7b2 in catch (tag=tag@entry=#t, thunk=#<smob catch-closure 795120>, handler=<error reading variable: ERROR: Cannot access memory at address 0x400000000>0x7950c0, pre_unwind_handler=<error reading variable: ERROR: Cannot access memory at address 0x400000000>0x7950a0) at throw.c:137
> #19 0x00007f9fd929ea95 in scm_catch_with_pre_unwind_handler (key=key@entry=#t, thunk=<optimized out>, handler=<optimized out>, pre_unwind_handler=<optimized out>) at throw.c:254
> #20 0x00007f9fd929ec5f in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7f9fd922c280 <c_body>, body_data=body_data@entry=0x7ffe4b9a1450, handler=handler@entry=0x7f9fd922c510 <c_handler>, handler_data=handler_data@entry=0x7ffe4b9a1450, pre_unwind_handler=pre_unwind_handler@entry=0x7f9fd922c370 <pre_unwind_handler>, pre_unwind_handler_data=0x7a9bc0) at throw.c:377
> #21 0x00007f9fd922c870 in scm_i_with_continuation_barrier (body=body@entry=0x7f9fd922c280 <c_body>, body_data=body_data@entry=0x7ffe4b9a1450, handler=handler@entry=0x7f9fd922c510 <c_handler>, handler_data=handler_data@entry=0x7ffe4b9a1450, pre_unwind_handler=pre_unwind_handler@entry=0x7f9fd922c370 <pre_unwind_handler>, pre_unwind_handler_data=0x7a9bc0) at continuations.c:360
> #22 0x00007f9fd922c905 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>) at continuations.c:456
> #23 0x00007f9fd929d3ec in with_guile (base=base@entry=0x7ffe4b9a14b8, data=data@entry=0x7ffe4b9a14e0) at threads.c:661
> #24 0x00007f9fd8f8efb8 in GC_call_with_stack_base (fn=fn@entry=0x7f9fd929d3a0 <with_guile>, arg=arg@entry=0x7ffe4b9a14e0) at misc.c:1949
> #25 0x00007f9fd929d708 in scm_i_with_guile (dynamic_state=<optimized out>, data=data@entry=0x7ffe4b9a14e0, func=func@entry=0x7f9fd9249a80 <invoke_main_func>) at threads.c:704
> #26 scm_with_guile (func=func@entry=0x7f9fd9249a80 <invoke_main_func>, data=data@entry=0x7ffe4b9a1510) at threads.c:710
> #27 0x00007f9fd9249c32 in scm_boot_guile (argc=argc@entry=2, argv=argv@entry=0x7ffe4b9a1668, main_func=main_func@entry=0x400cb0 <inner_main>, closure=closure@entry=0x0) at init.c:323
> #28 0x0000000000400b70 in main (argc=2, argv=0x7ffe4b9a1668) at guile.c:101
> (gdb) info threads
>   Id   Target Id         Frame 
> * 1    Thread 0x7f9fd972eb80 (LWP 15573) "guile" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>
> So it seems quite clear that the thing has the alloc lock taken.  I
> suppose this can happen if one of the libgc threads runs right when we
> call fork and takes the alloc lock, right?

Does libgc spawn threads that run concurrently with user threads?  If
so, that would be news to me.  My understanding was that incremental
marking occurs within GC allocation calls, and marking threads are only
spawned after all user threads have been stopped, but I could be wrong.

The first idea that comes to my mind is that perhaps the finalization
thread is holding the GC allocation lock when 'fork' is called.  The
finalization thread grabs the GC allocation lock every time it calls
'GC_invoke_finalizers'.  All ports backed by POSIX file descriptors
(including pipes) register finalizers and therefore spawn the
finalization thread and make work for it to do.

Another possibility: both the finalization thread and the signal
delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
which also temporarily grabs the GC allocation lock before calling the
specified function.  See 'GC_do_blocking_inner' in pthread_support.c in
libgc.  You spawn the signal delivery thread by calling 'sigaction' and
you make work for it to do every second when the SIGALRM is delivered.

Toggle quote (5 lines)

> If that is correct, the fix would be to call fork within

> ‘GC_call_with_alloc_lock’.

> How does that sound?

Sure, sounds good to me.

Toggle quote (3 lines)

> As a workaround on the Guix side, we might achieve the same effect by

> calling ‘gc-disable’ right before ‘primitive-fork’.

I don't think this would help.

Thanks,

Mark

Andy Wingo wrote on 5 Jul 2018 01:00

Recipients:(name . Mark H Weaver)(address . mhw@netris.org)

Message-ID:87lgaqjemj.fsf@igalia.com

Hi!

On Thu 05 Jul 2018 05:33, Mark H Weaver <mhw@netris.org> writes:

Toggle quote (13 lines)>> One problem I’ve noticed is that the child process that
>> ‘call-with-decompressed-port’ spawns would be stuck trying to get the
>> allocation lock:
>>
>> So it seems quite clear that the thing has the alloc lock taken.  I
>> suppose this can happen if one of the libgc threads runs right when we
>> call fork and takes the alloc lock, right?
>
> Does libgc spawn threads that run concurrently with user threads?  If
> so, that would be news to me.  My understanding was that incremental
> marking occurs within GC allocation calls, and marking threads are only
> spawned after all user threads have been stopped, but I could be wrong.

I think Mark is correct.

Toggle quote (3 lines)

> The first idea that comes to my mind is that perhaps the finalization

> thread is holding the GC allocation lock when 'fork' is called.

So of course we agree you're only supposed to "fork" when there are no
other threads running, I think.

As far as the finalizer thread goes, "primitive-fork" calls
"scm_i_finalizer_pre_fork" which should join the finalizer thread,
before the fork.  There could be a bug obviously but the intention is
for Guile to shut down its internal threads.  Here's the body of
primitive-fork fwiw:

    {
      int pid;
      scm_i_finalizer_pre_fork ();
      if (scm_ilength (scm_all_threads ()) != 1)
        /* Other threads may be holding on to resources that Guile needs --
           it is not safe to permit one thread to fork while others are
           running.
    
           In addition, POSIX clearly specifies that if a multi-threaded
           program forks, the child must only call functions that are
           async-signal-safe.  We can't guarantee that in general.  The best
           we can do is to allow forking only very early, before any call to
           sigaction spawns the signal-handling thread.  */
        scm_display
          (scm_from_latin1_string
           ("warning: call to primitive-fork while multiple threads are running;\n"
            "         further behavior unspecified.  See \"Processes\" in the\n"
            "         manual, for more information.\n"),
           scm_current_warning_port ());
      pid = fork ();
      if (pid == -1)
        SCM_SYSERROR;
      return scm_from_int (pid);
    }

Toggle quote (7 lines)

> Another possibility: both the finalization thread and the signal

> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',

> which also temporarily grabs the GC allocation lock before calling the

> specified function. See 'GC_do_blocking_inner' in pthread_support.c in

> libgc. You spawn the signal delivery thread by calling 'sigaction' and

> you make work for it to do every second when the SIGALRM is delivered.

The signal thread is a possibility though in that case you'd get a

warning; the signal-handling thread appears in scm_all_threads. Do you

see a warning? If you do, that is a problem :)

Toggle quote (7 lines)

>> If that is correct, the fix would be to call fork within

>> ‘GC_call_with_alloc_lock’.

>> How does that sound?

> Sure, sounds good to me.

I don't think this is necessary.  I think the problem is that other
threads are running.  If we solve that, then we solve this issue; if we
don't solve that, we don't know what else those threads are doing, so we
don't know what mutexes and other state they might have.

Andy

Ludovic Courtès wrote on 5 Jul 2018 01:34

Recipients:(name . Mark H Weaver)(address . mhw@netris.org)

Message-ID:87lgaqgjxd.fsf@gnu.org

Hello Mark,

Thanks for chiming in!

Mark H Weaver <mhw@netris.org> skribis:

Toggle quote (5 lines)

> Does libgc spawn threads that run concurrently with user threads? If

> so, that would be news to me. My understanding was that incremental

> marking occurs within GC allocation calls, and marking threads are only

> spawned after all user threads have been stopped, but I could be wrong.

libgc launches mark threads as soon as it is initialized, I think.

Toggle quote (7 lines)

> The first idea that comes to my mind is that perhaps the finalization

> thread is holding the GC allocation lock when 'fork' is called. The

> finalization thread grabs the GC allocation lock every time it calls

> 'GC_invoke_finalizers'. All ports backed by POSIX file descriptors

> (including pipes) register finalizers and therefore spawn the

> finalization thread and make work for it to do.

In 2.2 there’s scm_i_finalizer_pre_fork that takes care of shutting down

the finalization thread right before fork. So the finalization thread

cannot be blamed, AIUI.

Toggle quote (7 lines)

> Another possibility: both the finalization thread and the signal

> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',

> which also temporarily grabs the GC allocation lock before calling the

> specified function. See 'GC_do_blocking_inner' in pthread_support.c in

> libgc. You spawn the signal delivery thread by calling 'sigaction' and

> you make work for it to do every second when the SIGALRM is delivered.

That’s definitely a possibility: the signal thread could be allocating

stuff, and thereby taking the alloc lock just at that time.

Toggle quote (7 lines)

>> If that is correct, the fix would be to call fork within

>> ‘GC_call_with_alloc_lock’.

>> How does that sound?

> Sure, sounds good to me.

Here’s a patch:

Toggle diff (33 lines)diff --git a/libguile/posix.c b/libguile/posix.c
index b0fcad5fd..088e75631 100644
--- a/libguile/posix.c
+++ b/libguile/posix.c
@@ -1209,6 +1209,13 @@ SCM_DEFINE (scm_execle, "execle", 2, 0, 1,
 #undef FUNC_NAME
 
 #ifdef HAVE_FORK
+static void *
+do_fork (void *pidp)
+{
+  * (int *) pidp = fork ();
+  return NULL;
+}
+
 SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
             (),
 	    "Creates a new \"child\" process by duplicating the current \"parent\" process.\n"
@@ -1236,7 +1243,13 @@ SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
         "         further behavior unspecified.  See \"Processes\" in the\n"
         "         manual, for more information.\n"),
        scm_current_warning_port ());
-  pid = fork ();
+
+  /* Take the alloc lock to make sure it is released when the child
+     process starts.  Failing to do that the child process could start
+     in a state where the alloc lock is taken and will never be
+     released.  */
+  GC_call_with_alloc_lock (do_fork, &pid);
+
   if (pid == -1)
     SCM_SYSERROR;
   return scm_from_int (pid);

Thoughts?

Unfortunately my ‘call-with-decompressed-port’ reproducer doesn’t seem t

to reproduce much today so I can’t tell if this helps (I let it run more

than 5 minutes with the supposedly-buggy Guile and nothing happened…).

Thanks,

Ludo’.

Mark H Weaver wrote on 5 Jul 2018 03:05

Recipients:(name . Andy Wingo)(address . wingo@igalia.com)

Message-ID:87601ukneg.fsf@netris.org

Hi,

Andy Wingo <wingo@igalia.com> writes:

Toggle quote (17 lines)> On Thu 05 Jul 2018 05:33, Mark H Weaver <mhw@netris.org> writes:
>
>>> One problem I’ve noticed is that the child process that
>>> ‘call-with-decompressed-port’ spawns would be stuck trying to get the
>>> allocation lock:
>>>
>>> So it seems quite clear that the thing has the alloc lock taken.  I
>>> suppose this can happen if one of the libgc threads runs right when we
>>> call fork and takes the alloc lock, right?
>>
>> Does libgc spawn threads that run concurrently with user threads?  If
>> so, that would be news to me.  My understanding was that incremental
>> marking occurs within GC allocation calls, and marking threads are only
>> spawned after all user threads have been stopped, but I could be wrong.
>
> I think Mark is correct.

Actually, looking at the libgc code more closely, it seems that Ludovic

is correct. GC_init calls GC_start_mark_threads_inner if PARALLEL_MARK

is defined at compile time.

However, it's also the case that libgc uses 'pthread_atfork' (where

available) to arrange to grab the GC allocation as well as the "mark

locks" in the case where parallel marking is enabled. See

fork_prepare_proc, fork_parent_proc, and fork_child_proc in

pthread_support.c.

As the libgc developers admit in the comment above 'fork_prepare_proc',

they are not strictly meeting the requirements for safe use of 'fork',

but they _are_ grabbing the allocation lock during 'fork', at least on

systems that support 'pthread_atfork'.

It looks like setting the GC_MARKERS environment variable to 1 should

result in 'available_markers_m1' being set to 0, and thus effectively

disable parallel marking. In that case, if I understand the code

correctly, no marker threads will be spawned.

It would be interesting to see if this problem can be reproduced when

GC_MARKERS is set to 1.

Toggle quote (11 lines)>> The first idea that comes to my mind is that perhaps the finalization
>> thread is holding the GC allocation lock when 'fork' is called.
>
> So of course we agree you're only supposed to "fork" when there are no
> other threads running, I think.
>
> As far as the finalizer thread goes, "primitive-fork" calls
> "scm_i_finalizer_pre_fork" which should join the finalizer thread,
> before the fork.  There could be a bug obviously but the intention is
> for Guile to shut down its internal threads.

Ah, good! I didn't know this. So, I guess the problem is most likely

elsewhere.

We should probably arrange to join the signal delivery thread at the

same time, and then to respawn it in the parent and child if needed.

What do you think?

Toggle quote (7 lines)

> Here's the body of primitive-fork fwiw:

> {

> int pid;

> scm_i_finalizer_pre_fork ();

> if (scm_ilength (scm_all_threads ()) != 1)

I think there's a race here. I think it's possible for the finalizer

thread to be respawned after 'scm_i_finalizer_pre_fork' in two different

ways:

(1) 'scm_all_threads' performs allocation, which could trigger GC.

(2) another thread could perform heap allocation and trigger GC after

'scm_i_finalizer_pre_fork' joins the thread. it might then shut

down before 'scm_all_thread' is called.

However, these are highly unlikely scenarios, and most likely not the

problem we're seeing here.

Still, I think the 'scm_i_finalizer_pre_fork' should be moved after the

'if', to avoid this race.

Toggle quote (11 lines)>> Another possibility: both the finalization thread and the signal
>> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
>> which also temporarily grabs the GC allocation lock before calling the
>> specified function.  See 'GC_do_blocking_inner' in pthread_support.c in
>> libgc.  You spawn the signal delivery thread by calling 'sigaction' and
>> you make work for it to do every second when the SIGALRM is delivered.
>
> The signal thread is a possibility though in that case you'd get a
> warning; the signal-handling thread appears in scm_all_threads.  Do you
> see a warning?  If you do, that is a problem :)

Good point!

Toggle quote (12 lines)>>> If that is correct, the fix would be to call fork within
>>> ‘GC_call_with_alloc_lock’.
>>>
>>> How does that sound?
>>
>> Sure, sounds good to me.
>
> I don't think this is necessary.  I think the problem is that other
> threads are running.  If we solve that, then we solve this issue; if we
> don't solve that, we don't know what else those threads are doing, so we
> don't know what mutexes and other state they might have.

On second thought, I agree with Andy here.  Grabbing the allocation lock
shouldn't be needed, and moreover it's not sufficient.  I think we
really need to ensure that no other threads are running, in which case
grabbing the allocation lock is pointless.  Anyway, it seems that libgc
is already arranging to do this, at least on modern GNU/Linux systems.

     Thanks,
       Mark

Ludovic Courtès wrote on 5 Jul 2018 05:27

Recipients:(name . Andy Wingo)(address . wingo@igalia.com)

Message-ID:87po01eul3.fsf@gnu.org

Hello,

Andy Wingo <wingo@igalia.com> skribis:

Toggle quote (2 lines)

> On Thu 05 Jul 2018 05:33, Mark H Weaver <mhw@netris.org> writes:

[...]

Toggle quote (11 lines)>> Another possibility: both the finalization thread and the signal
>> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
>> which also temporarily grabs the GC allocation lock before calling the
>> specified function.  See 'GC_do_blocking_inner' in pthread_support.c in
>> libgc.  You spawn the signal delivery thread by calling 'sigaction' and
>> you make work for it to do every second when the SIGALRM is delivered.
>
> The signal thread is a possibility though in that case you'd get a
> warning; the signal-handling thread appears in scm_all_threads.  Do you
> see a warning?  If you do, that is a problem :)

I don’t see a warning.

But as a Guile user, I shouldn’t see a warning just because there’s a

signal thread anyway; it’s not a thread I spawned myself.

The weird thing is that the signal thread always exists, and it’s

surprising IMO that it shows up in ‘scm_all_threads’ because it’s not a

“user thread”. The other surprise is that the warning isn’t triggered:

Toggle snippet (15 lines)

$ guile

GNU Guile 2.2.4

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.

This program is free software, and you are welcome to redistribute it

under certain conditions; type `,show c' for details.

Enter `,help' for help.

scheme@(guile-user)> (all-threads)

$1 = (#<thread 140488743438080 (1b06c40)> #<thread 140488815434624 (1b06e00)>)

scheme@(guile-user)> (when (zero? (primitive-fork)) (primitive-_exit 0))

;; no warning

Ludo’, surprised. :-)

Andy Wingo wrote on 5 Jul 2018 07:04

Recipients:(name . Mark H Weaver)(address . mhw@netris.org)

Message-ID:874lhdkcdf.fsf@igalia.com

Hi,

On Thu 05 Jul 2018 12:05, Mark H Weaver <mhw@netris.org> writes:

Toggle quote (6 lines)

> However, it's also the case that libgc uses 'pthread_atfork' (where

> available) to arrange to grab the GC allocation as well as the "mark

> locks" in the case where parallel marking is enabled. See

> fork_prepare_proc, fork_parent_proc, and fork_child_proc in

> pthread_support.c.

I don't think this is enabled by default. You have to configure your

libgc this way. When investigating similar bugs, I proposed enabling it

by default a while ago:

http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2012-February/004958.html

I ended up realizing that pthread_atfork was just a bogus interface.

For one, it turns out that POSIX clearly says that if a multithreaded

program forks, the behavior of the child after the fork is undefined if

it calls any non-async-signal-safe function before calling exec():

https://lists.gnu.org/archive/html/guile-devel/2012-02/msg00157.html

But practically, the only reasonable thing to do with atfork is to grab

all of the locks, then release them after forking, in both child and

parent. However you can't do this without deadlocking from a library,

as the total lock order is a property of the program and not something a

library can decide.

There are thus two solutions: either ensure that there are no other

threads when you fork, or only call async-signal-safe functions before

you exec(). open-process does the latter. fork will warn if the former

is not the case. When last I looked into this, I concluded that

pthread_atfork doesn't buy us anything, though I could be wrong!

Toggle quote (23 lines)>> Here's the body of primitive-fork fwiw:
>>
>>     {
>>       int pid;
>>       scm_i_finalizer_pre_fork ();
>>       if (scm_ilength (scm_all_threads ()) != 1)
>
> I think there's a race here.  I think it's possible for the finalizer
> thread to be respawned after 'scm_i_finalizer_pre_fork' in two different
> ways:
>
> (1) 'scm_all_threads' performs allocation, which could trigger GC.
>
> (2) another thread could perform heap allocation and trigger GC after
>     'scm_i_finalizer_pre_fork' joins the thread.  it might then shut
>     down before 'scm_all_thread' is called.
>
> However, these are highly unlikely scenarios, and most likely not the
> problem we're seeing here.
>
> Still, I think the 'scm_i_finalizer_pre_fork' should be moved after the
> 'if', to avoid this race.

Good point! Probably we should use some non-allocating

scm_i_is_multi_threaded() or something. We can't move the pre-fork

thing though because one of the effects we are looking for is to reduce

the thread count!

Cheers,

Andy

Andy Wingo wrote on 5 Jul 2018 07:08

Recipients:(name . Ludovic Courtès)(address . ludo@gnu.org)

Message-ID:87zhz5ixle.fsf@igalia.com

On Thu 05 Jul 2018 14:27, ludo@gnu.org (Ludovic Courtès) writes:

Toggle quote (13 lines)> Hello,
>
> Andy Wingo <wingo@igalia.com> skribis:
>
>> The signal thread is a possibility though in that case you'd get a
>> warning; the signal-handling thread appears in scm_all_threads.  Do you
>> see a warning?  If you do, that is a problem :)
>
> I don’t see a warning.
>
> But as a Guile user, I shouldn’t see a warning just because there’s a
> signal thread anyway; it’s not a thread I spawned myself.

I understand but it's how it works. If we want to change this, probably

we need a similar interface as we have with finalization.

Toggle quote (18 lines)> The weird thing is that the signal thread always exists, and it’s
> surprising IMO that it shows up in ‘scm_all_threads’ because it’s not a
> “user thread”.  The other surprise is that the warning isn’t triggered:
>
> $ guile
> GNU Guile 2.2.4
> Copyright (C) 1995-2017 Free Software Foundation, Inc.
>
> Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
> This program is free software, and you are welcome to redistribute it
> under certain conditions; type `,show c' for details.
>
> Enter `,help' for help.
> scheme@(guile-user)> (all-threads)
> $1 = (#<thread 140488743438080 (1b06c40)> #<thread 140488815434624 (1b06e00)>)
> scheme@(guile-user)> (when (zero? (primitive-fork)) (primitive-_exit 0))
> ;; no warning

Are you certain that this is the signal-handling thread and not the

finalizer thread? I suspect it is the finalizer thread, and that it

gets properly shut down before the fork.

Regarding seeing the warning: do you do make some other binding for the

default warning port in Guix?

Andy

Ludovic Courtès wrote on 6 Jul 2018 08:35

Recipients:(name . Andy Wingo)(address . wingo@igalia.com)

Message-ID:87fu0wbcn9.fsf@gnu.org

Hello Andy,

Andy Wingo <wingo@igalia.com> skribis:

Toggle quote (18 lines)> On Thu 05 Jul 2018 14:27, ludo@gnu.org (Ludovic Courtès) writes:
>
>> Hello,
>>
>> Andy Wingo <wingo@igalia.com> skribis:
>>
>>> The signal thread is a possibility though in that case you'd get a
>>> warning; the signal-handling thread appears in scm_all_threads.  Do you
>>> see a warning?  If you do, that is a problem :)
>>
>> I don’t see a warning.
>>
>> But as a Guile user, I shouldn’t see a warning just because there’s a
>> signal thread anyway; it’s not a thread I spawned myself.
>
> I understand but it's how it works.  If we want to change this, probably
> we need a similar interface as we have with finalization.

Right, understood.

Toggle quote (9 lines)

>> scheme@(guile-user)> (all-threads)

>> $1 = (#<thread 140488743438080 (1b06c40)> #<thread 140488815434624 (1b06e00)>)

>> scheme@(guile-user)> (when (zero? (primitive-fork)) (primitive-_exit 0))

>> ;; no warning

> Are you certain that this is the signal-handling thread and not the

> finalizer thread? I suspect it is the finalizer thread, and that it

> gets properly shut down before the fork.

Oh, you must be right.

Toggle quote (3 lines)

> Regarding seeing the warning: do you do make some other binding for the

> default warning port in Guix?

No.

Thanks,

Ludo’.

Ludovic Courtès wrote on 19 Oct 2019 13:04

control message for bug #31925

Recipients:(address . control@debbugs.gnu.org)

Message-ID:87sgnov6wu.fsf@gnu.org

tags 31925 unreproducible
close 31925
quit

Your comment

This issue is archived.

To comment on this conversation send an email to 31925@patchwise.org

To respond to this issue using the mumi CLI, first switch to it

mumi current 31925

Then, you may apply the latest patchset in this issue (with sign off)

mumi am -- -s

Or, compose a reply to this issue

mumi compose

Or, send patches to this issue

mumi send-email *.patch

is:open	open issues
is:done	closed issues
submitter:<who>	search issue submitter
author:<who>	search by message author
date:yesterday..now	search by issue date
mdate:3m..2d	search by message date