Report forwarded
to bug-guix@gnu.org: bug#63368; Package guix.
(Mon, 08 May 2023 10:55:02 GMT).
Acknowledgement sent
to Christopher Baines <mail@cbaines.net>:
New bug report received and forwarded. Copy sent to bug-guix@gnu.org.
(Mon, 08 May 2023 10:55:02 GMT).
Since the recent core-updates merge, I've seen the build coordinator
using less memory, but it's also been crashing in a new way, up to 10
times a day.
In the log, you see something like:
2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
2023-05-07 09:15:42 Signals delivery fails constantly
I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
do with this.
Christopher Baines <mail@cbaines.net> writes:
> Since the recent core-updates merge, I've seen the build coordinator
> using less memory, but it's also been crashing in a new way, up to 10
> times a day.
>
> In the log, you see something like:
>
> 2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
> 2023-05-07 09:15:42 Signals delivery fails constantly
>
> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
> do with this.
I think I've found a workaround. I found a list of environment variables
[1] you can set to affect the GC behaviour, and the first one I tried
(GC_RETRY_SIGNALS=0) seems to have had the desired effect, in that the
crashes/restarts have stopped.
1: https://github.com/ivmai/bdwgc/blob/master/docs/README.environment
I've sent a patch [2] to apply this setting as part of the service.
2: https://issues.guix.gnu.org/63417
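A minimal sketch of how the workaround above could be applied when launching a process by hand, rather than through the service patch. The helper names are hypothetical and not part of the build coordinator; the only detail taken from this thread is the GC_RETRY_SIGNALS=0 setting.

```python
import os
import subprocess


def env_with_gc_workaround(base_env=None):
    """Return a copy of the environment with GC_RETRY_SIGNALS=0 set.

    The input mapping is left untouched; os.environ is used when no
    mapping is given.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GC_RETRY_SIGNALS"] = "0"
    return env


def run_with_workaround(command):
    """Run COMMAND (a list of strings) with the workaround in its environment.

    COMMAND is whatever the caller would normally run; nothing about the
    coordinator's command line is assumed here.
    """
    return subprocess.run(command, env=env_with_gc_workaround(), check=False)
```

Calling run_with_workaround with the usual command list should start the process with the variable set, leaving the environment of the calling process unchanged.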
Hi,
Christopher Baines <mail@cbaines.net> skribis:
> Since the recent core-updates merge, I've seen the build coordinator
> using less memory, but it's also been crashing in a new way, up to 10
> times a day.
>
> In the log, you see something like:
>
> 2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
> 2023-05-07 09:15:42 Signals delivery fails constantly
>
> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
> do with this.
Normally on GNU/Linux libgc has:
#define SIG_SUSPEND SIGPWR
The Coordinator fiddles with SIGALRM, SIGUSR1, SIGINT, and SIGPIPE,
which should normally be fine.
Is there anything else that might interfere with libgc?
Ludo’.
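One way to look for interference with the suspend signal might be to read the signal masks that Linux exposes in /proc/PID/status. The sketch below assumes the usual SigBlk, SigIgn and SigCgt hexadecimal fields, where bit N-1 stands for signal N; the function names are hypothetical.

```python
import signal


def parse_signal_masks(status_text):
    """Extract the SigBlk, SigIgn and SigCgt masks from /proc/PID/status text."""
    masks = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in ("SigBlk", "SigIgn", "SigCgt"):
            masks[name] = int(value.strip(), 16)
    return masks


def signal_state(masks, signum):
    """Report, per mask, whether the bit for SIGNUM is set (bit signum-1)."""
    bit = 1 << (signum - 1)
    return {name: bool(mask & bit) for name, mask in masks.items()}


def sigpwr_state_of_pid(pid):
    """Linux only: show whether SIGPWR is blocked, ignored or caught by PID."""
    with open(f"/proc/{pid}/status") as status_file:
        masks = parse_signal_masks(status_file.read())
    return signal_state(masks, signal.SIGPWR)
```

If SIGPWR showed up as blocked or ignored in the affected process, that would be a hint that something other than libgc is touching it. Note that these fields describe the thread that shares the process ID, so the per-thread files under /proc/PID/task would need the same check.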
Information forwarded
to bug-guix@gnu.org: bug#63368; Package guix.
(Thu, 25 May 2023 15:42:01 GMT).
Ludovic Courtès <ludo@gnu.org> writes:
> Christopher Baines <mail@cbaines.net> skribis:
>
>> Since the recent core-updates merge, I've seen the build coordinator
>> using less memory, but it's also been crashing in a new way, up to 10
>> times a day.
>>
>> In the log, you see something like:
>>
>> 2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>> 2023-05-07 09:15:42 Signals delivery fails constantly
>>
>> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
>> do with this.
>
> Normally on GNU/Linux libgc has:
>
> #define SIG_SUSPEND SIGPWR
>
> The Coordinator fiddles with SIGALRM, SIGUSR1, SIGINT, and SIGPIPE,
> which should normally be fine.
>
> Is there anything else that might interfere with libgc?
I've seen this issue in both the build coordinator and nar-herder, both
of which use guile-sqlite, so I wonder if that could have something to
do with it.
Christopher Baines <mail@cbaines.net> writes:
> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Christopher Baines <mail@cbaines.net> skribis:
>>
>>> Since the recent core-updates merge, I've seen the build coordinator
>>> using less memory, but it's also been crashing in a new way, up to 10
>>> times a day.
>>>
>>> In the log, you see something like:
>>>
>>> 2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>>> 2023-05-07 09:15:42 Signals delivery fails constantly
>>>
>>> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
>>> do with this.
>>
>> Normally on GNU/Linux libgc has:
>>
>> #define SIG_SUSPEND SIGPWR
>>
>> The Coordinator fiddles with SIGALRM, SIGUSR1, SIGINT, and SIGPIPE,
>> which should normally be fine.
>>
>> Is there anything else that might interfere with libgc?
>
> I've seen this issue in both the build coordinator and nar-herder, both
> of which use guile-sqlite, so I wonder if that could have something to
> do with it.
I've seen this happen with the build coordinator agent now (on
milano-guix-1):
2023-06-02 18:59:55 2023-06-02 18:59:55 (DEBUG): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: checking the availability of build inputs
2023-06-02 18:59:55 2023-06-02 18:59:55 (INFO ): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: setup successful, building: /gnu/store/7fbrli2a8nzn676q8gz2b0i0y0lr9nxv-r-quasr-1.40.0.drv
2023-06-02 19:00:46 Signals delivery fails constantly at GC #55
2023-06-02 19:01:22 Signals delivery fails constantly
2023-06-02 19:01:29 locale is en_US.utf8
2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)
Which is a bit more concerning, since the build coordinator agent is
intentionally quite simple (no SQLite for example).
Christopher Baines <mail@cbaines.net> skribis:
> I've seen this happen with the build coordinator agent now (on
> milano-guix-1):
>
> 2023-06-02 18:59:55 2023-06-02 18:59:55 (DEBUG): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: checking the availability of build inputs
> 2023-06-02 18:59:55 2023-06-02 18:59:55 (INFO ): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: setup successful, building: /gnu/store/7fbrli2a8nzn676q8gz2b0i0y0lr9nxv-r-quasr-1.40.0.drv
> 2023-06-02 19:00:46 Signals delivery fails constantly at GC #55
> 2023-06-02 19:01:22 Signals delivery fails constantly
> 2023-06-02 19:01:29 locale is en_US.utf8
> 2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)
>
> Which is a bit more concerning, since the build coordinator agent is
> intentionally quite simple (no SQLite for example).
The closure of (guix-build-coordinator agent) seems to be quite large
still.
Could you check what .so files are loaded by that code, perhaps via
/proc/PID/maps?
Thanks,
Ludo’.
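A rough sketch of the /proc/PID/maps check suggested above: it collects the distinct shared-object paths mapped into a process. It assumes the standard six-column layout of the maps file (address, permissions, offset, device, inode, path); the function names are hypothetical.

```python
import re

# Matches paths ending in ".so" optionally followed by numeric version parts,
# for example "libgc.so.1.5.1" or "libc.so.6".
SO_PATTERN = re.compile(r"\.so(\.\d+)*$")


def shared_objects(maps_text):
    """Return the sorted, de-duplicated .so paths found in maps_text."""
    paths = set()
    for line in maps_text.splitlines():
        # The path is the sixth field and may itself contain spaces.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no path
        path = fields[5].strip()
        if SO_PATTERN.search(path):
            paths.add(path)
    return sorted(paths)


def shared_objects_of_pid(pid):
    """Linux only: list the shared objects currently mapped by PID."""
    with open(f"/proc/{pid}/maps") as maps_file:
        return shared_objects(maps_file.read())
```

Each library normally appears several times in the maps file, once per mapped segment, which is why the paths are gathered into a set first.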
Information forwarded
to bug-guix@gnu.org: bug#63368; Package guix.
(Tue, 06 Jun 2023 15:21:02 GMT).
Ludovic Courtès <ludo@gnu.org> writes:
> Christopher Baines <mail@cbaines.net> skribis:
>
>> I've seen this happen with the build coordinator agent now (on
>> milano-guix-1):
>>
>> 2023-06-02 18:59:55 2023-06-02 18:59:55 (DEBUG):
>> fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: checking the availability of
>> build inputs
>> 2023-06-02 18:59:55 2023-06-02 18:59:55 (INFO ):
>> fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: setup successful, building:
>> /gnu/store/7fbrli2a8nzn676q8gz2b0i0y0lr9nxv-r-quasr-1.40.0.drv
>> 2023-06-02 19:00:46 Signals delivery fails constantly at GC #55
>> 2023-06-02 19:01:22 Signals delivery fails constantly
>> 2023-06-02 19:01:29 locale is en_US.utf8
>> 2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)
>>
>> Which is a bit more concerning, since the build coordinator agent is
>> intentionally quite simple (no SQLite for example).
>
> The closure of (guix-build-coordinator agent) seems to be quite large
> still.
>
> Could you check what .so files are loaded by that code, perhaps via
> /proc/PID/maps?
I think I see these (that's on milano-guix-1 currently):
/gnu/store/0i81lpfnn05pmjc5f43q4nfvd27r08f7-guile-gnutls-3.7.12/lib/guile/3.0/extensions/guile-gnutls-v-2.so.0.0.0
/gnu/store/0jk7sl5xqwwdkzjpp9sxgz9z0d48a3vy-libunistring-1.0/lib/libunistring.so.2.2.0
/gnu/store/1r1azdi4hvfypnx14d01n60p4aa7g2im-libidn2-2.3.4/lib/libidn2.so.0.3.8
/gnu/store/1w1r6r56z9lhg8ghcb7lxss6mkn7d5l1-libgc-8.2.2/lib/libgc.so.1.5.1
/gnu/store/4gvgcfdiz67wv04ihqfa8pqwzsb0qpv5-guile-3.0.9/lib/libguile-3.0.so.1.6.0
/gnu/store/8y0pwifz8a3d7zbdfzsawa1amf4afx1s-libgcrypt-1.10.1/lib/libgcrypt.so.20.4.1
/gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib/lib/libgcc_s.so.1
/gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libhogweed.so.6.6
/gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libnettle.so.8.6
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/ld-linux-x86-64.so.2
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libcrypt.so.1
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libc.so.6
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libm.so.6
/gnu/store/ib2n2vzqpchc3bhh9i712w5sq9zapn8d-gmp-6.2.1/lib/libgmp.so.10.4.1
/gnu/store/j5kzdjan6mnf2ngmkc50fia8vrbpqi9b-libtasn1-4.19.0/lib/libtasn1.so.6.6.3
/gnu/store/k0p01a6b7hsxjfr65ga4f2gh6lh92aiq-lzlib-1.13/lib/liblz.so.1.13
/gnu/store/m9wi9hcrf7f9dm4ri32vw1jrbh1csywi-libgpg-error-1.45/lib/libgpg-error.so.0.33.0
/gnu/store/slzq3zqwj75lbrg4ly51hfhbv2vhryv5-zlib-1.2.13/lib/libz.so.1.2.13
/gnu/store/vq7dxp5la2lnhsvniwv38j0ggvsmzim7-p11-kit-0.24.1/lib/libp11-kit.so.0.3.0
/gnu/store/w8b0l8hk6g0fahj4fvmc4qqm3cvaxnmv-libffi-3.4.4/lib/libffi.so.8.1.2
/gnu/store/yr4lbvdyc4dgs76yij1dw2w2z8s84af8-gnutls-3.7.7/lib/libgnutls.so.30.34.1
Christopher Baines <mail@cbaines.net> skribis:
> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Christopher Baines <mail@cbaines.net> skribis:
[...]
>>> 2023-06-02 19:01:22 Signals delivery fails constantly
>>> 2023-06-02 19:01:29 locale is en_US.utf8
>>> 2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)
>>>
>>> Which is a bit more concerning, since the build coordinator agent is
>>> intentionally quite simple (no SQLite for example).
>>
>> The closure of (guix-build-coordinator agent) seems to be quite large
>> still.
>>
>> Could you check what .so files are loaded by that code, perhaps via
>> /proc/PID/maps?
>
> I think I see these (that's on milano-guix-1 currently):
>
> /gnu/store/0i81lpfnn05pmjc5f43q4nfvd27r08f7-guile-gnutls-3.7.12/lib/guile/3.0/extensions/guile-gnutls-v-2.so.0.0.0
> /gnu/store/0jk7sl5xqwwdkzjpp9sxgz9z0d48a3vy-libunistring-1.0/lib/libunistring.so.2.2.0
> /gnu/store/1r1azdi4hvfypnx14d01n60p4aa7g2im-libidn2-2.3.4/lib/libidn2.so.0.3.8
> /gnu/store/1w1r6r56z9lhg8ghcb7lxss6mkn7d5l1-libgc-8.2.2/lib/libgc.so.1.5.1
> /gnu/store/4gvgcfdiz67wv04ihqfa8pqwzsb0qpv5-guile-3.0.9/lib/libguile-3.0.so.1.6.0
> /gnu/store/8y0pwifz8a3d7zbdfzsawa1amf4afx1s-libgcrypt-1.10.1/lib/libgcrypt.so.20.4.1
> /gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib/lib/libgcc_s.so.1
> /gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libhogweed.so.6.6
> /gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libnettle.so.8.6
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/ld-linux-x86-64.so.2
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libcrypt.so.1
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libc.so.6
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libm.so.6
> /gnu/store/ib2n2vzqpchc3bhh9i712w5sq9zapn8d-gmp-6.2.1/lib/libgmp.so.10.4.1
> /gnu/store/j5kzdjan6mnf2ngmkc50fia8vrbpqi9b-libtasn1-4.19.0/lib/libtasn1.so.6.6.3
> /gnu/store/k0p01a6b7hsxjfr65ga4f2gh6lh92aiq-lzlib-1.13/lib/liblz.so.1.13
> /gnu/store/m9wi9hcrf7f9dm4ri32vw1jrbh1csywi-libgpg-error-1.45/lib/libgpg-error.so.0.33.0
> /gnu/store/slzq3zqwj75lbrg4ly51hfhbv2vhryv5-zlib-1.2.13/lib/libz.so.1.2.13
> /gnu/store/vq7dxp5la2lnhsvniwv38j0ggvsmzim7-p11-kit-0.24.1/lib/libp11-kit.so.0.3.0
> /gnu/store/w8b0l8hk6g0fahj4fvmc4qqm3cvaxnmv-libffi-3.4.4/lib/libffi.so.8.1.2
> /gnu/store/yr4lbvdyc4dgs76yij1dw2w2z8s84af8-gnutls-3.7.7/lib/libgnutls.so.30.34.1
Hmm no idea. I’ve never seen “Signals delivery fails” before so I
really wonder what could be causing this. Would be great if you could
come up with a reduced test case, but I guess that won’t be easy.
Or perhaps you could run a Coordinator agent under ‘strace -f’ to see if
we get hints?
Ludo’.
Severity set to 'important' from 'normal'
Request was from Ludovic Courtès <ludo@gnu.org>
to control@debbugs.gnu.org.
(Sun, 01 Dec 2024 14:24:01 GMT).
Information forwarded
to bug-guix@gnu.org: bug#63368; Package guix.
(Sun, 01 Dec 2024 14:27:02 GMT).
Christopher Baines <mail@cbaines.net> skribis:
> Since the recent core-updates merge, I've seen the build coordinator
> using less memory, but it's also been crashing in a new way, up to 10
> times a day.
>
> In the log, you see something like:
>
> 2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
> 2023-05-07 09:15:42 Signals delivery fails constantly
Same with ‘guix publish’: https://issues.guix.gnu.org/74632
> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
> do with this.
I’m not sure when these started to happen for ‘guix publish’.
Data point: the ‘guix publish’ instance at guix.bordeaux.inria.fr never
encountered this problem. The main difference compared to ci.guix is
that it does not produce lzip archives. (I see the Coordinator uses
Guile-Lzlib; maybe that’s a lead.)
Ludo’.