GNU bug report logs

#63368 Build coordinator "Signals delivery fails constantly" crashes


Report forwarded to bug-guix@gnu.org:
bug#63368; Package guix. (Mon, 08 May 2023 10:55:02 GMT).


Acknowledgement sent to Christopher Baines <mail@cbaines.net>:
New bug report received and forwarded. Copy sent to bug-guix@gnu.org. (Mon, 08 May 2023 10:55:02 GMT).


Message #5 received at submit@debbugs.gnu.org:

From: Christopher Baines <mail@cbaines.net>
To: bug-guix@gnu.org
Subject: Build coordinator "Signals delivery fails constantly" crashes
Date: Mon, 08 May 2023 11:45:21 +0100
Since the recent core-updates merge, I've seen the build coordinator
using less memory, but it's also been crashing in a new way, up to 10
times a day.

In the log, you see something like:

  2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
  2023-05-07 09:15:42 Signals delivery fails constantly

I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
do with this.

Information forwarded to bug-guix@gnu.org:
bug#63368; Package guix. (Wed, 10 May 2023 12:50:03 GMT).


Message #8 received at 63368@debbugs.gnu.org:

From: Christopher Baines <mail@cbaines.net>
To: 63368@debbugs.gnu.org
Subject: Re: bug#63368: Build coordinator "Signals delivery fails constantly" crashes
Date: Wed, 10 May 2023 13:47:11 +0100
Christopher Baines <mail@cbaines.net> writes:

> Since the recent core-updates merge, I've seen the build coordinator
> using less memory, but it's also been crashing in a new way, up to 10
> times a day.
>
> In the log, you see something like:
>
>   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>   2023-05-07 09:15:42 Signals delivery fails constantly
>
> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
> do with this.

I think I've found a workaround. I found a list of environment variables
[1] you can set to affect the GC behaviour, and the first one I tried
(GC_RETRY_SIGNALS=0) seems to have had the desired effect, in that the
crashes/restarts have stopped.

1: https://github.com/ivmai/bdwgc/blob/master/docs/README.environment

I've sent a patch [2] to apply this setting as part of the service.

2: https://issues.guix.gnu.org/63417
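
For illustration, the workaround can also be tried outside the service definition by starting the affected process with the variable set. The sketch below is Python rather than the Guile code the build coordinator is written in; `gc_env` and `run_with_workaround` are hypothetical helper names, and the only detail taken from this thread is the `GC_RETRY_SIGNALS=0` setting itself.

```python
import os
import subprocess


def gc_env(base=None):
    """Return a copy of the environment with GC_RETRY_SIGNALS=0 added.

    The variable name and value come from the bug report; everything
    else here is an illustrative assumption.
    """
    env = dict(os.environ if base is None else base)
    env["GC_RETRY_SIGNALS"] = "0"
    return env


def run_with_workaround(argv):
    """Run argv with the workaround applied and return its exit status."""
    return subprocess.run(argv, env=gc_env()).returncode
```

Calling `run_with_workaround` with the usual command line of the affected program (as a list of strings) starts it with the setting in place, which may be a quick way to check whether the crashes stop before changing a service definition.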

Information forwarded to bug-guix@gnu.org:
bug#63368; Package guix. (Thu, 25 May 2023 15:26:01 GMT).


Message #11 received at 63368@debbugs.gnu.org:

From: Ludovic Courtès <ludo@gnu.org>
To: Christopher Baines <mail@cbaines.net>
Cc: 63368@debbugs.gnu.org
Subject: Re: bug#63368: Build coordinator "Signals delivery fails constantly" crashes
Date: Thu, 25 May 2023 17:24:56 +0200
Hi,

Christopher Baines <mail@cbaines.net> skribis:

> Since the recent core-updates merge, I've seen the build coordinator
> using less memory, but it's also been crashing in a new way, up to 10
> times a day.
>
> In the log, you see something like:
>
>   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>   2023-05-07 09:15:42 Signals delivery fails constantly
>
> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
> do with this.

Normally on GNU/Linux libgc has:

  #define SIG_SUSPEND SIGPWR

The Coordinator fiddles with SIGALRM, SIGUSR1, SIGINT, and SIGPIPE,
which should normally be fine.

Is there anything else that might interfere with libgc?

Ludo’.
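
One way to look for interference with the suspend signal is to read the signal masks the kernel reports in /proc/PID/status (the SigBlk, SigIgn and SigCgt fields, each a hexadecimal bit mask where bit n-1 stands for signal n). The Python sketch below decodes those fields; `decode_mask` and `signal_state` are hypothetical helper names, and the choice of which signal to inspect (SIGPWR, per the `#define` quoted above) is an assumption about what is worth checking, not a diagnosis.

```python
def decode_mask(hex_mask):
    """Turn a hex mask from /proc/PID/status into a set of signal numbers."""
    mask = int(hex_mask, 16)
    # Bit (n - 1) of the mask corresponds to signal number n.
    return {n for n in range(1, 65) if mask & (1 << (n - 1))}


def signal_state(status_text, signum):
    """Report whether signum is blocked, ignored, or caught.

    status_text is the content of /proc/PID/status (or of
    /proc/PID/task/TID/status for one specific thread).
    """
    state = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("SigBlk", "SigIgn", "SigCgt"):
            state[key] = signum in decode_mask(value.strip())
    return state
```

On GNU/Linux, `signal_state(open("/proc/1234/status").read(), signal.SIGPWR)` (with `import signal` and 1234 standing in for a real PID) would show whether a handler is installed (SigCgt) and whether the signal is blocked in that thread. Blocked masks are per thread, so the files under /proc/PID/task/ are likely the more informative ones for a multi-threaded process.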




Information forwarded to bug-guix@gnu.org:
bug#63368; Package guix. (Thu, 25 May 2023 15:42:01 GMT).


Message #14 received at 63368@debbugs.gnu.org:

From: Christopher Baines <mail@cbaines.net>
To: Ludovic Courtès <ludo@gnu.org>
Cc: 63368@debbugs.gnu.org
Subject: Re: bug#63368: Build coordinator "Signals delivery fails constantly" crashes
Date: Thu, 25 May 2023 16:26:34 +0100
Ludovic Courtès <ludo@gnu.org> writes:

> Christopher Baines <mail@cbaines.net> skribis:
>
>> Since the recent core-updates merge, I've seen the build coordinator
>> using less memory, but it's also been crashing in a new way, up to 10
>> times a day.
>>
>> In the log, you see something like:
>>
>>   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>>   2023-05-07 09:15:42 Signals delivery fails constantly
>>
>> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
>> do with this.
>
> Normally on GNU/Linux libgc has:
>
>   #define SIG_SUSPEND SIGPWR
>
> The Coordinator fiddles with SIGALRM, SIGUSR1, SIGINT, and SIGPIPE,
> which should normally be fine.
>
> Is there anything else that might interfere with libgc?

I've seen this issue in both the build coordinator and nar-herder, both
of which use guile-sqlite, so I wonder if that could have something to
do with it.

Information forwarded to bug-guix@gnu.org:
bug#63368; Package guix. (Fri, 02 Jun 2023 17:13:01 GMT).


Message #17 received at 63368@debbugs.gnu.org:

From: Christopher Baines <mail@cbaines.net>
To: 63368@debbugs.gnu.org
Cc: Ludovic Courtès <ludo@gnu.org>
Subject: Re: bug#63368: Build coordinator "Signals delivery fails constantly" crashes
Date: Fri, 02 Jun 2023 18:07:16 +0100
Christopher Baines <mail@cbaines.net> writes:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Christopher Baines <mail@cbaines.net> skribis:
>>
>>> Since the recent core-updates merge, I've seen the build coordinator
>>> using less memory, but it's also been crashing in a new way, up to 10
>>> times a day.
>>>
>>> In the log, you see something like:
>>>
>>>   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>>>   2023-05-07 09:15:42 Signals delivery fails constantly
>>>
>>> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
>>> do with this.
>>
>> Normally on GNU/Linux libgc has:
>>
>>   #define SIG_SUSPEND SIGPWR
>>
>> The Coordinator fiddles with SIGALRM, SIGUSR1, SIGINT, and SIGPIPE,
>> which should normally be fine.
>>
>> Is there anything else that might interfere with libgc?
>
> I've seen this issue in both the build coordinator and nar-herder, both
> of which use guile-sqlite, so I wonder if that could have something to
> do with it.

I've seen this happen with the build coordinator agent now (on
milano-guix-1):

  2023-06-02 18:59:55 2023-06-02 18:59:55 (DEBUG): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: checking the availability of build inputs
  2023-06-02 18:59:55 2023-06-02 18:59:55 (INFO ): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: setup successful, building: /gnu/store/7fbrli2a8nzn676q8gz2b0i0y0lr9nxv-r-quasr-1.40.0.drv
  2023-06-02 19:00:46 Signals delivery fails constantly at GC #55
  2023-06-02 19:01:22 Signals delivery fails constantly
  2023-06-02 19:01:29 locale is en_US.utf8
  2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)

Which is a bit more concerning, since the build coordinator agent is
intentionally quite simple (no SQLite for example).

Information forwarded to bug-guix@gnu.org:
bug#63368; Package guix. (Tue, 06 Jun 2023 15:10:02 GMT).


Message #20 received at 63368@debbugs.gnu.org:

From: Ludovic Courtès <ludo@gnu.org>
To: Christopher Baines <mail@cbaines.net>
Cc: 63368@debbugs.gnu.org
Subject: Re: bug#63368: Build coordinator "Signals delivery fails constantly" crashes
Date: Tue, 06 Jun 2023 17:09:03 +0200
Christopher Baines <mail@cbaines.net> skribis:

> I've seen this happen with the build coordinator agent now (on
> milano-guix-1):
>
>   2023-06-02 18:59:55 2023-06-02 18:59:55 (DEBUG): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: checking the availability of build inputs
>   2023-06-02 18:59:55 2023-06-02 18:59:55 (INFO ): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: setup successful, building: /gnu/store/7fbrli2a8nzn676q8gz2b0i0y0lr9nxv-r-quasr-1.40.0.drv
>   2023-06-02 19:00:46 Signals delivery fails constantly at GC #55
>   2023-06-02 19:01:22 Signals delivery fails constantly
>   2023-06-02 19:01:29 locale is en_US.utf8
>   2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)
>
> Which is a bit more concerning, since the build coordinator agent is
> intentionally quite simple (no SQLite for example).

The closure of (guix-build-coordinator agent) seems to be quite large
still.

Could you check what .so files are loaded by that code, perhaps via
/proc/PID/maps?

Thanks,
Ludo’.
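
The check suggested above can be scripted. The following Python sketch lists the distinct shared objects mapped into a process by reading /proc/PID/maps, whose lines have the form "address perms offset dev inode pathname"; `shared_objects` and `shared_objects_of` are hypothetical helper names, and the file-name pattern used to recognise a shared object is an assumption.

```python
import re

# Matches names such as libgc.so.1.5.1, libc.so.6 or foo.so (assumed pattern).
SO_NAME = re.compile(r"\.so(\.\d+)*$")


def shared_objects(maps_text):
    """Return the sorted, de-duplicated shared-object paths in maps_text."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping: no pathname column
        path = fields[5].strip()
        if SO_NAME.search(path):
            paths.add(path)
    return sorted(paths)


def shared_objects_of(pid):
    """Read /proc/PID/maps for a running process and list its .so files."""
    with open(f"/proc/{pid}/maps") as maps:
        return shared_objects(maps.read())
```

Each library usually appears several times in the maps file (one line per mapped segment), which is why the paths are collected in a set before sorting.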




Information forwarded to bug-guix@gnu.org:
bug#63368; Package guix. (Tue, 06 Jun 2023 15:21:02 GMT).


Message #23 received at 63368@debbugs.gnu.org:

From: Christopher Baines <mail@cbaines.net>
To: Ludovic Courtès <ludo@gnu.org>
Cc: 63368@debbugs.gnu.org
Subject: Re: bug#63368: Build coordinator "Signals delivery fails constantly" crashes
Date: Tue, 06 Jun 2023 16:19:39 +0100
Ludovic Courtès <ludo@gnu.org> writes:

> Christopher Baines <mail@cbaines.net> skribis:
>
>> I've seen this happen with the build coordinator agent now (on
>> milano-guix-1):
>>
>>   2023-06-02 18:59:55 2023-06-02 18:59:55 (DEBUG): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: checking the availability of build inputs
>>   2023-06-02 18:59:55 2023-06-02 18:59:55 (INFO ): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: setup successful, building: /gnu/store/7fbrli2a8nzn676q8gz2b0i0y0lr9nxv-r-quasr-1.40.0.drv
>>   2023-06-02 19:00:46 Signals delivery fails constantly at GC #55
>>   2023-06-02 19:01:22 Signals delivery fails constantly
>>   2023-06-02 19:01:29 locale is en_US.utf8
>>   2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)
>>
>> Which is a bit more concerning, since the build coordinator agent is
>> intentionally quite simple (no SQLite for example).
>
> The closure of (guix-build-coordinator agent) seems to be quite large
> still.
>
> Could you check what .so files are loaded by that code, perhaps via
> /proc/PID/maps?

I think I see these (that's on milano-guix-1 currently):

/gnu/store/0i81lpfnn05pmjc5f43q4nfvd27r08f7-guile-gnutls-3.7.12/lib/guile/3.0/extensions/guile-gnutls-v-2.so.0.0.0
/gnu/store/0jk7sl5xqwwdkzjpp9sxgz9z0d48a3vy-libunistring-1.0/lib/libunistring.so.2.2.0
/gnu/store/1r1azdi4hvfypnx14d01n60p4aa7g2im-libidn2-2.3.4/lib/libidn2.so.0.3.8
/gnu/store/1w1r6r56z9lhg8ghcb7lxss6mkn7d5l1-libgc-8.2.2/lib/libgc.so.1.5.1
/gnu/store/4gvgcfdiz67wv04ihqfa8pqwzsb0qpv5-guile-3.0.9/lib/libguile-3.0.so.1.6.0
/gnu/store/8y0pwifz8a3d7zbdfzsawa1amf4afx1s-libgcrypt-1.10.1/lib/libgcrypt.so.20.4.1
/gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib/lib/libgcc_s.so.1
/gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libhogweed.so.6.6
/gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libnettle.so.8.6
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/ld-linux-x86-64.so.2
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libcrypt.so.1
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libc.so.6
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libm.so.6
/gnu/store/ib2n2vzqpchc3bhh9i712w5sq9zapn8d-gmp-6.2.1/lib/libgmp.so.10.4.1
/gnu/store/j5kzdjan6mnf2ngmkc50fia8vrbpqi9b-libtasn1-4.19.0/lib/libtasn1.so.6.6.3
/gnu/store/k0p01a6b7hsxjfr65ga4f2gh6lh92aiq-lzlib-1.13/lib/liblz.so.1.13
/gnu/store/m9wi9hcrf7f9dm4ri32vw1jrbh1csywi-libgpg-error-1.45/lib/libgpg-error.so.0.33.0
/gnu/store/slzq3zqwj75lbrg4ly51hfhbv2vhryv5-zlib-1.2.13/lib/libz.so.1.2.13
/gnu/store/vq7dxp5la2lnhsvniwv38j0ggvsmzim7-p11-kit-0.24.1/lib/libp11-kit.so.0.3.0
/gnu/store/w8b0l8hk6g0fahj4fvmc4qqm3cvaxnmv-libffi-3.4.4/lib/libffi.so.8.1.2
/gnu/store/yr4lbvdyc4dgs76yij1dw2w2z8s84af8-gnutls-3.7.7/lib/libgnutls.so.30.34.1

Information forwarded to bug-guix@gnu.org:
bug#63368; Package guix. (Fri, 09 Jun 2023 13:15:01 GMT).


Message #26 received at 63368@debbugs.gnu.org:

From: Ludovic Courtès <ludo@gnu.org>
To: Christopher Baines <mail@cbaines.net>
Cc: 63368@debbugs.gnu.org
Subject: Re: bug#63368: Build coordinator "Signals delivery fails constantly" crashes
Date: Fri, 09 Jun 2023 15:14:22 +0200
Christopher Baines <mail@cbaines.net> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Christopher Baines <mail@cbaines.net> skribis:

[...]

>>>   2023-06-02 19:01:22 Signals delivery fails constantly
>>>   2023-06-02 19:01:29 locale is en_US.utf8
>>>   2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)
>>>
>>> Which is a bit more concerning, since the build coordinator agent is
>>> intentionally quite simple (no SQLite for example).
>>
>> The closure of (guix-build-coordinator agent) seems to be quite large
>> still.
>>
>> Could you check what .so files are loaded by that code, perhaps via
>> /proc/PID/maps?
>
> I think I see these (that's on milano-guix-1 currently):
>
> /gnu/store/0i81lpfnn05pmjc5f43q4nfvd27r08f7-guile-gnutls-3.7.12/lib/guile/3.0/extensions/guile-gnutls-v-2.so.0.0.0
> /gnu/store/0jk7sl5xqwwdkzjpp9sxgz9z0d48a3vy-libunistring-1.0/lib/libunistring.so.2.2.0
> /gnu/store/1r1azdi4hvfypnx14d01n60p4aa7g2im-libidn2-2.3.4/lib/libidn2.so.0.3.8
> /gnu/store/1w1r6r56z9lhg8ghcb7lxss6mkn7d5l1-libgc-8.2.2/lib/libgc.so.1.5.1
> /gnu/store/4gvgcfdiz67wv04ihqfa8pqwzsb0qpv5-guile-3.0.9/lib/libguile-3.0.so.1.6.0
> /gnu/store/8y0pwifz8a3d7zbdfzsawa1amf4afx1s-libgcrypt-1.10.1/lib/libgcrypt.so.20.4.1
> /gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib/lib/libgcc_s.so.1
> /gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libhogweed.so.6.6
> /gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libnettle.so.8.6
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/ld-linux-x86-64.so.2
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libcrypt.so.1
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libc.so.6
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libm.so.6
> /gnu/store/ib2n2vzqpchc3bhh9i712w5sq9zapn8d-gmp-6.2.1/lib/libgmp.so.10.4.1
> /gnu/store/j5kzdjan6mnf2ngmkc50fia8vrbpqi9b-libtasn1-4.19.0/lib/libtasn1.so.6.6.3
> /gnu/store/k0p01a6b7hsxjfr65ga4f2gh6lh92aiq-lzlib-1.13/lib/liblz.so.1.13
> /gnu/store/m9wi9hcrf7f9dm4ri32vw1jrbh1csywi-libgpg-error-1.45/lib/libgpg-error.so.0.33.0
> /gnu/store/slzq3zqwj75lbrg4ly51hfhbv2vhryv5-zlib-1.2.13/lib/libz.so.1.2.13
> /gnu/store/vq7dxp5la2lnhsvniwv38j0ggvsmzim7-p11-kit-0.24.1/lib/libp11-kit.so.0.3.0
> /gnu/store/w8b0l8hk6g0fahj4fvmc4qqm3cvaxnmv-libffi-3.4.4/lib/libffi.so.8.1.2
> /gnu/store/yr4lbvdyc4dgs76yij1dw2w2z8s84af8-gnutls-3.7.7/lib/libgnutls.so.30.34.1


Hmm no idea.  I’ve never seen “Signals delivery fails” before so I
really wonder what could be causing this.  Would be great if you could
come up with a reduced test case, but I guess that won’t be easy.

Or perhaps you could run a Coordinator agent under ‘strace -f’ to see if
we get hints?

Ludo’.
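
If a trace is captured with 'strace -f', one thing that could be looked at is whether the signals the collector sends to its own threads are delivered or rejected. The Python sketch below tallies `tgkill` calls by signal name and outcome. It assumes that the signals are sent through the `tgkill` system call and that strace prints them in its usual `tgkill(tgid, tid, SIGNAME) = result` form; neither is confirmed in this thread, and `summarize_tgkill` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed strace line shape, for example:
#   [pid 100] tgkill(100, 101, SIGPWR) = 0
#   [pid 100] tgkill(100, 102, SIGPWR) = -1 EAGAIN (Resource temporarily unavailable)
TGKILL = re.compile(r"tgkill\(\d+, \d+, (\w+)\)\s*= (-?\d+)(?: (\w+))?")


def summarize_tgkill(trace_text):
    """Count tgkill calls per (signal name, outcome) in strace output."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = TGKILL.search(line)
        if match is None:
            continue
        signame, result, errno_name = match.groups()
        outcome = "ok" if result == "0" else (errno_name or "error")
        counts[(signame, outcome)] += 1
    return counts
```

A large number of failed entries for one signal shortly before the "Signals delivery fails constantly" message might narrow down where to look, though a clean tally would not rule anything out.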




Severity set to 'important' from 'normal'. Request was from Ludovic Courtès <ludo@gnu.org> to control@debbugs.gnu.org. (Sun, 01 Dec 2024 14:24:01 GMT).


Information forwarded to bug-guix@gnu.org:
bug#63368; Package guix. (Sun, 01 Dec 2024 14:27:02 GMT).


Message #31 received at 63368@debbugs.gnu.org:

From: Ludovic Courtès <ludo@gnu.org>
To: Christopher Baines <mail@cbaines.net>
Cc: 63368@debbugs.gnu.org
Subject: Re: bug#63368: Build coordinator "Signals delivery fails constantly" crashes
Date: Sun, 01 Dec 2024 15:26:45 +0100
Christopher Baines <mail@cbaines.net> skribis:

> Since the recent core-updates merge, I've seen the build coordinator
> using less memory, but it's also been crashing in a new way, up to 10
> times a day.
>
> In the log, you see something like:
>
>   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>   2023-05-07 09:15:42 Signals delivery fails constantly

Same with ‘guix publish’: https://issues.guix.gnu.org/74632

> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
> do with this.

I’m not sure when these started to happen for ‘guix publish’.

Data point: the ‘guix publish’ instance at guix.bordeaux.inria.fr never
encountered this problem.  The main difference compared to ci.guix is
that it does not produce lzip archives.  (I see the Coordinator uses
Guile-Lzlib; maybe that’s a lead.)

Ludo’.






debbugs.gnu.org maintainers <help-debbugs@gnu.org>. Last modified: Sun Dec 22 05:51:08 2024; Machine Name: wallace-server
