GNU bug report logs

#77407 shepherd hangs at "herd restart nginx"

PackageSource(s)Maintainer(s)
guix PTS Buildd Popcon
Reply or subscribe to this bug. View this bug as an mbox, status mbox, or maintainer mbox

Report forwarded to bug-guix@gnu.org:
bug#77407; Package guix. (Mon, 31 Mar 2025 16:04:04 GMT) (full text, mbox, link).


Acknowledgement sent to Andreas Enge <andreas@enge.fr>:
New bug report received and forwarded. Copy sent to bug-guix@gnu.org. (Mon, 31 Mar 2025 16:04:04 GMT) (full text, mbox, link).


Message #5 received at submit@debbugs.gnu.org (full text, mbox, reply):

From: Andreas Enge <andreas@enge.fr>
To: bug-guix@gnu.org
Subject: shepherd hangs at "herd restart nginx"
Date: Mon, 31 Mar 2025 18:02:17 +0200
Open bug to get number.

Andreas





Information forwarded to bug-guix@gnu.org:
bug#77407; Package guix. (Mon, 31 Mar 2025 16:11:02 GMT) (full text, mbox, link).


Message #8 received at 77407@debbugs.gnu.org (full text, mbox, reply):

From: Andreas Enge <andreas@enge.fr>
To: 77407@debbugs.gnu.org
Cc: ludo@gnu.org, mail@cbaines.net
Subject: Re: shepherd hangs at "herd restart nginx"
Date: Mon, 31 Mar 2025 18:09:51 +0200
Today I noticed that the QA webpage was 502 on bayfront.
I restarted qa-frontpage and bffe without problem.
When trying to do "herd restart nginx", the command did not show any
output and did not come back. Moreover, "herd status" also hang without
any output.

I could still ssh into the machine and then, thanks to Ludovic, reboot
with the magic command
  guix shell guile shepherd -- guile -c '(use-modules (shepherd system)) (sync) (reboot-kexec)'

Andreas





Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo@gnu.org> to control@debbugs.gnu.org. (Mon, 31 Mar 2025 16:17:06 GMT) (full text, mbox, link).


Information forwarded to bug-guix@gnu.org:
bug#77407; Package guix. (Tue, 01 Apr 2025 12:21:05 GMT) (full text, mbox, link).


Message #13 received at 77407@debbugs.gnu.org (full text, mbox, reply):

From: Ludovic Courtès <ludo@gnu.org>
To: Andreas Enge <andreas@enge.fr>
Cc: 77407@debbugs.gnu.org, mail@cbaines.net
Subject: Re: shepherd hangs at "herd restart nginx"
Date: Tue, 01 Apr 2025 14:19:32 +0200
Hello,

Andreas Enge <andreas@enge.fr> skribis:

> Today I noticed that the QA webpage was 502 on bayfront.
> I restarted qa-frontpage and bffe without problem.
> When trying to do "herd restart nginx", the command did not show any
> output and did not come back. Moreover, "herd status" also hang without
> any output.

FWIW, I tested ‘bare-bones.tmpl’ with just nginx added; running:

  while herd restart nginx; do : ; done

in that VM didn’t trigger the hang.

So I’m a bit at loss as to why this fails on bayfront.  If someone has
ideas (and time!) on how to get a config closer to that of bayfront that
we could test in a VM, that’d be great.  Otherwise, we’ll have to debug
in production…

> I could still ssh into the machine and then, thanks to Ludovic, reboot
> with the magic command
>   guix shell guile shepherd -- guile -c '(use-modules (shepherd system)) (sync) (reboot-kexec)'

To be clear: this should be a last resort.  :-)

Thanks,
Ludo’.




Information forwarded to bug-guix@gnu.org:
bug#77407; Package guix. (Wed, 09 Apr 2025 16:58:01 GMT) (full text, mbox, link).


Message #16 received at 77407@debbugs.gnu.org (full text, mbox, reply):

From: Ludovic Courtès <ludo@gnu.org>
To: Andreas Enge <andreas@enge.fr>
Cc: 77407@debbugs.gnu.org, mail@cbaines.net
Subject: Re: bug#77407: shepherd hangs at "herd restart nginx"
Date: Wed, 09 Apr 2025 18:57:08 +0200
Hi!

I just realized: this may well be fixed by Shepherd commit
a5d9a41aa0bfadea9617df4411b86c6ea0c525bd (see
<https://issues.guix.gnu.org/77373>.)  Would be great to give it a try
somehow.

Anyway, some more data.

Yesterday I ran ‘herd stop nginx’ on bayfront (which was running 1.0.2),
again causing ‘herd status’ to hang afterwards.  This time I captured an
strace log of PID 1.

The /var/log/messages showing what happens after ‘herd stop nginx’

--8<---------------cut here---------------start------------->8---
2025-04-08 11:28:27 localhost shepherd[1]: Stopping service certbot-certificate-renewal...
2025-04-08 11:28:27 localhost shepherd[1]: Terminating timer 'certbot-certificate-renewal' with 0 processes running.
2025-04-08 11:28:27 localhost shepherd[1]: Service certbot-certificate-renewal stopped.
2025-04-08 11:28:27 localhost shepherd[1]: Service certbot-certificate-renewal is now stopped.
2025-04-08 11:28:27 localhost shepherd[1]: Stopping service nginx...
2025-04-08 11:28:27 localhost shepherd[1]: Registering new logger for nginx.
2025-04-08 11:28:28 localhost shepherd[1]: [nginx] nginx: [alert] could not open error log file: open() "/gnu/store/w2pi23rrk1dy95rzpb9l87gcj3c4jb38-nginx-1.27.3/logs/error.log" failed (2: No such file or directory)
2025-04-08 11:28:29 localhost shepherd[1]: [nginx] 2025/04/08 11:28:28 [warn] 12958#0: duplicate MIME type "text/html" in /gnu/store/plpbgkr7wxl0qm4v0zf24vkvznvlc28y-nginx.conf:1043
2025-04-08 11:28:29 localhost shepherd[1]: [nginx] 2025/04/08 11:28:28 [warn] 12958#0: duplicate MIME type "text/html" in /gnu/store/plpbgkr7wxl0qm4v0zf24vkvznvlc28y-nginx.conf:1107
2025-04-08 11:28:29 localhost shepherd[1]: [nginx] 2025/04/08 11:28:28 [warn] 12958#0: duplicate MIME type "text/html" in /gnu/store/plpbgkr7wxl0qm4v0zf24vkvznvlc28y-nginx.conf:4100
2025-04-08 11:28:32 localhost shepherd[1]: [nginx] 2025/04/08 11:28:28 [warn] 12958#0: conflicting server name "bayfront.guix.gnu.org" on 0.0.0.0:80, ignored
2025-04-08 11:28:32 localhost shepherd[1]: Service nginx might have failed to stop.
2025-04-08 11:28:32 localhost shepherd[1]: [nginx] 2025/04/08 11:28:28 [warn] 12958#0: conflicting server name "logs.guix.gnu.org" on 0.0.0.0:80, ignored
2025-04-08 11:28:32 localhost shepherd[1]: Service nginx is now stopped.
2025-04-08 11:28:32 localhost shepherd[1]: Accepted connection on 0.0.0.0:22 from 101.35.215.239:37996.
2025-04-08 11:29:40 localhost shepherd[1]: 2 connections still in use after sshd-25913 termination.
2025-04-08 11:29:40 localhost shepherd[1]: Service sshd-25913 (PID 12257) exited with 3.
2025-04-08 11:29:40 localhost shepherd[1]: Service sshd-25913 has been disabled.
2025-04-08 11:32:37 localhost shepherd[1]: Accepted connection on [::]:22 from [2a0c:e300::58]:40428.
--8<---------------cut here---------------end--------------->8---

(The “might have failed to stop” bit is harmless and fixed in
3f4438741615aa1a1198f9415be8bd110138d83e.)

On berlin, which does not exhibit the bug, we get something similar (for
‘herd restart nginx’ here):

--8<---------------cut here---------------start------------->8---
2025-04-07 16:11:25 localhost shepherd[1]: Stopping service certbot-renewal...
2025-04-07 16:11:25 localhost shepherd[1]: Terminating timer 'certbot-renewal' with 0 processes running.
2025-04-07 16:11:25 localhost shepherd[1]: Service certbot-renewal stopped.
2025-04-07 16:11:25 localhost shepherd[1]: Service certbot-renewal is now stopped.
2025-04-07 16:11:26 localhost shepherd[1]: Stopping service nginx...
2025-04-07 16:11:26 localhost shepherd[1]: Registering new logger for nginx.
2025-04-07 16:11:26 localhost shepherd[1]: [nginx] nginx: [alert] could not open error log file: open() "/gnu/store/40mb76jc9qbm0d36qyncnpgrgsfab0db-nginx-1.27.3/logs/error.log" failed (2: No such file or directory)
2025-04-07 16:11:26 localhost shepherd[1]: [nginx] 2025/04/07 16:11:26 [warn] 27359#0: duplicate MIME type "text/html" in /gnu/store/s2yrd68wr9cd2iirc900491qgigyfg0z-nginx.conf:3593
2025-04-07 16:11:26 localhost shepherd[1]: [nginx] 2025/04/07 16:11:26 [notice] 27359#0: signal process started
2025-04-07 16:11:26 localhost shepherd[1]: Service nginx stopped.
2025-04-07 16:11:26 localhost shepherd[1]: Service nginx is now stopped.
2025-04-07 16:11:26 localhost shepherd[1]: Respawning anonip-/var/log/anonip/bootstrappable.access.log.
2025-04-07 16:11:26 localhost shepherd[1]: Respawning anonip-/var/log/anonip/issues-guix-gnu-org.https.access.log.
2025-04-07 16:11:26 localhost shepherd[1]: Respawning anonip-/var/log/anonip/qualif.access.log.
2025-04-07 16:11:26 localhost shepherd[1]: Respawning anonip-/var/log/anonip/disarchive.access.log.
2025-04-07 16:11:26 localhost shepherd[1]: Respawning anonip-/var/log/anonip/bootstrappable.https.access.log.
2025-04-07 16:11:26 localhost shepherd[1]: Respawning anonip-/var/log/anonip/dump-guix-gnu-org.https.access.log.
2025-04-07 16:11:26 localhost shepherd[1]: Respawning anonip-/var/log/anonip/workflows-guix-info.access.log.
2025-04-07 16:11:26 localhost shepherd[1]: Respawning anonip-/var/log/anonip/workflows-guix-info.https.access.log.
2025-04-07 16:11:26 localhost shepherd[1]: Starting service user-homes...
2025-04-07 16:11:26 localhost shepherd[1]: Service user-homes has been started.
2025-04-07 16:11:26 localhost shepherd[1]: Service user-homes started.
2025-04-07 16:11:26 localhost shepherd[1]: Service user-homes running with value #t.
2025-04-07 16:11:26 localhost shepherd[1]: Respawning anonip-/var/log/anonip/https.access.log.
--8<---------------cut here---------------end--------------->8---

Unfortunately the strace log isn’t very helpful:

--8<---------------cut here---------------start------------->8---
11:28:27 read(54, "(shepherd-command (version 0) (action stop) (service nginx) (arguments ()) (directory \"/home/ludo/src/maintenance/hydra\"))", 1024) = 122 <0.000162>
11:28:27 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104507, tv_nsec=326773810}) = 0 <0.000043>
11:28:27 epoll_wait(11, [], 16, 0)      = 0 <0.000040>
11:28:27 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104507, tv_nsec=327037318}) = 0 <0.000040>
11:28:27 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104507, tv_nsec=327207449}) = 0 <0.000040>
11:28:27 epoll_wait(11, [], 16, 0)      = 0 <0.000037>
11:28:27 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104507, tv_nsec=327433802}) = 0 <0.000038>
11:28:27 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104507, tv_nsec=327698776}) = 0 <0.000038>
11:28:27 epoll_wait(11, [], 16, 0)      = 0 <0.000036>
11:28:27 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104507, tv_nsec=327923662}) = 0 <0.000038>
11:28:27 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104507, tv_nsec=328114326}) = 0 <0.000038>
11:28:27 epoll_wait(11, [], 16, 0)      = 0 <0.000038>
11:28:27 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104507, tv_nsec=328338724}) = 0 <0.000038>
11:28:27 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104507, tv_nsec=355846206}) = 0 <0.000069>
11:28:27 epoll_wait(11, [{events=EPOLLIN, data={u32=38, u64=38}}], 16, 0) = 1 <0.000045>
11:28:27 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104507, tv_nsec=356163002}) = 0 <0.000041>
11:28:27 read(38, "G", 1)               = 1 <0.000049>
11:28:27 read(38, "E", 1)               = 1 <0.000038>
11:28:27 read(38, "T", 1)               = 1 <0.000038>
[…]
11:28:27 write(7, "Stopping service certbot-certificate-renewal...\n", 48) = 48 <0.000052>
[…]
11:28:27 write(25, "2025-04-08 11:28:27 localhost shepherd[1]: Service certbot-certificate-renewal stopped.\n", 88) = 88 <0.000049>
[…]
11:28:27 write(24, "shepherd[1]: Stopping service nginx...\n", 39) = 39 <0.000050>
11:28:27 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104507, tv_nsec=560464210}) = 0 <0.000045>
11:28:27 epoll_wait(11, [{events=EPOLLIN, data={u32=17, u64=17}}], 16, 0) = 1 <0.000040>
11:28:27 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104507, tv_nsec=560715007}) = 0 <0.000042>
11:28:27 getuid()                       = 0 <0.000052>
11:28:27 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104507, tv_nsec=687926724}) = 0 <0.000040>
11:28:27 epoll_wait(11, [], 16, 0)      = 0 <0.000038>
11:28:27 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104507, tv_nsec=688178499}) = 0 <0.000037>
11:28:27 rt_sigprocmask(SIG_BLOCK, [HUP INT TERM CHLD], [HUP INT TERM CHLD], 8) = 0 <0.000055>
11:28:27 pipe2([56, 59], O_CLOEXEC)     = 0 <0.000060>
11:28:27 fcntl(56, F_GETFL)             = 0 (flags O_RDONLY) <0.000038>
11:28:27 lseek(56, 0, SEEK_CUR)         = -1 ESPIPE (Illegal seek) <0.000037>
11:28:27 fcntl(59, F_GETFL)             = 0x1 (flags O_WRONLY) <0.000036>
11:28:27 lseek(59, 0, SEEK_CUR)         = -1 ESPIPE (Illegal seek) <0.000036>
11:28:27 rt_sigprocmask(SIG_BLOCK, NULL, [HUP INT TERM CHLD], 8) = 0 <0.000035>
11:28:27 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f01e6e99850) = 12958 <0.008960>
11:28:27 fcntl(56, F_GETFL)             = 0 (flags O_RDONLY) <0.000040>
11:28:27 fcntl(56, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 <0.000039>
11:28:27 close(59)                      = 0 <0.000039>
11:28:27 rt_sigprocmask(SIG_SETMASK, [HUP INT TERM CHLD], [HUP INT TERM CHLD], 8) = 0 <0.000038>
11:28:27 recvfrom(17, "shepherd[1]: Stopping service nginx...\n", 1024, 0, 0x7ffc10c01010, [112 => 0]) = 39 <0.000060>
[…]
11:28:27 read(56, 0x7f01d9a634d0, 1)    = -1 EAGAIN (Resource temporarily unavailable) <0.000040>
11:28:27 epoll_ctl(11, EPOLL_CTL_MOD, 56, {events=EPOLLIN|EPOLLRDHUP|EPOLLONESHOT, data={u32=56, u64=56}}) = -1 ENOENT (No such file or directory) <0.000041>
11:28:27 epoll_ctl(11, EPOLL_CTL_ADD, 56, {events=EPOLLIN|EPOLLRDHUP|EPOLLONESHOT, data={u32=56, u64=56}}) = 0 <0.000043>
11:28:27 write(7, "Registering new logger for nginx.\n", 34) = 34 <0.000048>
[…]
11:28:28 read(56, "n", 1)               = 1 <0.000049>
11:28:28 read(56, "g", 1)               = 1 <0.000046>
11:28:28 read(56, "i", 1)               = 1 <0.000042>
11:28:28 read(56, "n", 1)               = 1 <0.000040>
11:28:28 read(56, "x", 1)               = 1 <0.000039>
11:28:28 read(56, ":", 1)               = 1 <0.000039>
11:28:28 read(56, " ", 1)               = 1 <0.000062>
11:28:28 read(56, "[", 1)               = 1 <0.000039>
11:28:28 read(56, "a", 1)               = 1 <0.000042>
11:28:28 read(56, "l", 1)               = 1 <0.000042>
11:28:28 read(56, "e", 1)               = 1 <0.000041>
11:28:28 read(56, "r", 1)               = 1 <0.000044>
11:28:28 read(56, "t", 1)               = 1 <0.000064>
11:28:28 read(56, "]", 1)               = 1 <0.000043>
[…]
11:28:29 write(7, "[", 1)               = 1 <0.000048>
11:28:29 write(7, "nginx", 5)           = 5 <0.000035>
11:28:29 write(7, "] ", 2)              = 2 <0.000034>
11:28:29 write(7, "2025/04/08 11:28:28 [warn] 12958#0: duplicate MIME type \"text/html\" in /gnu/store/plpbgkr7wxl0qm4v0zf24vkvznvlc28y-nginx.conf:1043", 130) = 130 <0.000036>
11:28:29 write(7, "\n", 1)              = 1 <0.000036>
[…]
11:28:32 accept4(57, {sa_family=AF_INET, sin_port=htons(37996), sin_addr=inet_addr("101.35.215.239")}, [112 => 16], SOCK_CLOEXEC) = 54 <0.000058>
11:28:32 write(7, "Accepted connection on 0.0.0.0:22 from 101.35.215.239:37996.\n", 61) = 61 <0.000050>
11:28:32 fcntl(54, F_GETFL)             = 0x2 (flags O_RDWR) <0.000037>
11:28:32 fcntl(54, F_SETFL, O_RDWR)     = 0 <0.000037>
11:28:32 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104512, tv_nsec=512507676}) = 0 <0.000037>
11:28:32 rt_sigprocmask(SIG_BLOCK, NULL, [HUP INT TERM CHLD], 8) = 0 <0.000035>
11:28:32 epoll_wait(11, [{events=EPOLLIN, data={u32=6, u64=6}}], 16, 273) = 1 <0.000039>
11:28:32 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104512, tv_nsec=512933004}) = 0 <0.000037>
[…]
11:28:52 accept4(16, 0x7ffc10c01010, [112], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) <0.000057>
11:28:52 epoll_ctl(11, EPOLL_CTL_MOD, 16, {events=EPOLLIN|EPOLLRDHUP|EPOLLONESHOT, data={u32=16, u64=16}}) = 0 <0.000042>
11:28:52 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104532, tv_nsec=448197354}) = 0 <0.000037>
11:28:52 epoll_wait(11, [], 16, 0)      = 0 <0.000038>
11:28:52 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104532, tv_nsec=448434462}) = 0 <0.000036>
11:28:52 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0 <0.000046>
11:28:52 write(41, "2025-04-08 11:28:52      39:4 17 (call-with-sigint _ _)\n", 56) = 56 <0.000105>
11:28:52 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0 <0.000045>
11:28:52 write(41, "2025-04-08 11:28:52 In fibers.scm:\n", 35) = 35 <0.000058>
11:28:52 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0 <0.000045>
11:28:52 write(19, "2025-04-08 11:28:52 localhost sudo:     ludo : TTY=pts/9 ; PWD=/home/ludo/src/maintenance/hydra ; USER=root ; COMMAND=/run/current-system/profile/bin/herd status nginx\n", 168) = 168 <0.000062>
11:28:52 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104532, tv_nsec=450211551}) = 0 <0.000039>
11:28:52 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104532, tv_nsec=450387060}) = 0 <0.000039>
11:28:52 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0 <0.000045>
11:28:52 write(19, "2025-04-08 11:28:52 localhost sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=30012)\n", 111) = 111 <0.000051>
11:28:52 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104532, tv_nsec=450934120}) = 0 <0.000038>
11:28:52 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104532, tv_nsec=451064163}) = 0 <0.000059>
11:28:52 read(59, "(shepherd-command (version 0) (action status) (service nginx) (arguments ()) (directory \"/home/ludo/src/maintenance/hydra\"))", 1024) = 124 <0.000084>
11:28:52 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104532, tv_nsec=452217926}) = 0 <0.000040>
11:28:52 epoll_wait(11, [], 16, 0)      = 0 <0.000044>
11:28:52 clock_gettime(CLOCK_REALTIME, {tv_sec=1744104532, tv_nsec=452460901}) = 0 <0.000039>
--8<---------------cut here---------------end--------------->8---

At that point it was still accepting connections on
/var/run/shepherd/socket and reading them, reading process output and
logging it, but not much else.

For example, the “Accepted connection on …:22” messages do not have a
corresponding ‘clone’ call and not even “Starting service sshd-1234...”
as we’d normally see right after.

Ludo’.




Information forwarded to bug-guix@gnu.org:
bug#77407; Package guix. (Thu, 10 Apr 2025 09:11:01 GMT) (full text, mbox, link).


Message #19 received at 77407@debbugs.gnu.org (full text, mbox, reply):

From: Christopher Baines <mail@cbaines.net>
To: Ludovic Courtès <ludo@gnu.org>
Cc: Andreas Enge <andreas@enge.fr>, 77407@debbugs.gnu.org
Subject: Re: bug#77407: shepherd hangs at "herd restart nginx"
Date: Thu, 10 Apr 2025 10:10:21 +0100
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo@gnu.org> writes:

> I just realized: this may well be fixed by Shepherd commit
> a5d9a41aa0bfadea9617df4411b86c6ea0c525bd (see
> <https://issues.guix.gnu.org/77373>.)  Would be great to give it a try
> somehow.

I've reconfigured one of the machines I was struggling with this issue
on with shepherd commit 8d31cafbdcb818160852a5d1e6fc24c1a9c53e41 and I
can no longer reproduce the issue.

There was a failing shepherd test, but I just disabled tests.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix@gnu.org:
bug#77407; Package guix. (Thu, 10 Apr 2025 12:39:02 GMT) (full text, mbox, link).


Message #22 received at 77407@debbugs.gnu.org (full text, mbox, reply):

From: Ludovic Courtès <ludo@gnu.org>
To: Christopher Baines <mail@cbaines.net>
Cc: Andreas Enge <andreas@enge.fr>, 77407@debbugs.gnu.org
Subject: Re: bug#77407: shepherd hangs at "herd restart nginx"
Date: Thu, 10 Apr 2025 14:38:27 +0200
Christopher Baines <mail@cbaines.net> skribis:

>> I just realized: this may well be fixed by Shepherd commit
>> a5d9a41aa0bfadea9617df4411b86c6ea0c525bd (see
>> <https://issues.guix.gnu.org/77373>.)  Would be great to give it a try
>> somehow.
>
> I've reconfigured one of the machines I was struggling with this issue
> on with shepherd commit 8d31cafbdcb818160852a5d1e6fc24c1a9c53e41 and I
> can no longer reproduce the issue.

Did you try hard enough? :-)  As in:

  while herd restart nginx; do : ; done

> There was a failing shepherd test, but I just disabled tests.

If you still have the build log, I’d be happy to see it.

Thanks for testing!

Ludo’.




Information forwarded to bug-guix@gnu.org:
bug#77407; Package guix. (Thu, 10 Apr 2025 12:54:03 GMT) (full text, mbox, link).


Message #25 received at 77407@debbugs.gnu.org (full text, mbox, reply):

From: Christopher Baines <mail@cbaines.net>
To: Ludovic Courtès <ludo@gnu.org>
Cc: Andreas Enge <andreas@enge.fr>, 77407@debbugs.gnu.org
Subject: Re: bug#77407: shepherd hangs at "herd restart nginx"
Date: Thu, 10 Apr 2025 13:52:58 +0100
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo@gnu.org> writes:

> Christopher Baines <mail@cbaines.net> skribis:
>
>>> I just realized: this may well be fixed by Shepherd commit
>>> a5d9a41aa0bfadea9617df4411b86c6ea0c525bd (see
>>> <https://issues.guix.gnu.org/77373>.)  Would be great to give it a try
>>> somehow.
>>
>> I've reconfigured one of the machines I was struggling with this issue
>> on with shepherd commit 8d31cafbdcb818160852a5d1e6fc24c1a9c53e41 and I
>> can no longer reproduce the issue.
>
> Did you try hard enough? :-)  As in:
>
>   while herd restart nginx; do : ; done

I did, and previously it happened almost every time I restarted nginx.

>> There was a failing shepherd test, but I just disabled tests.
>
> If you still have the build log, I’d be happy to see it.

Unfortunately I don't seem to have it, I can't reproduce the failure so
it seems to be a flaky test.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix@gnu.org:
bug#77407; Package guix. (Thu, 10 Apr 2025 13:11:01 GMT) (full text, mbox, link).


Message #28 received at 77407@debbugs.gnu.org (full text, mbox, reply):

From: Ludovic Courtès <ludo@gnu.org>
To: Christopher Baines <mail@cbaines.net>
Cc: Andreas Enge <andreas@enge.fr>, 77407@debbugs.gnu.org
Subject: Re: bug#77407: shepherd hangs at "herd restart nginx"
Date: Thu, 10 Apr 2025 15:10:40 +0200
Christopher Baines <mail@cbaines.net> skribis:

>>> I've reconfigured one of the machines I was struggling with this issue
>>> on with shepherd commit 8d31cafbdcb818160852a5d1e6fc24c1a9c53e41 and I
>>> can no longer reproduce the issue.
>>
>> Did you try hard enough? :-)  As in:
>>
>>   while herd restart nginx; do : ; done
>
> I did, and previously it happened almost every time I restarted nginx.

Phew, that’s a relief.

I’ll tag 1.0.4 hopefully next week.

Thanks,
Ludo’.




Send a report that this bug log contains spam.


debbugs.gnu.org maintainers <help-debbugs@gnu.org>. Last modified: Tue Apr 15 20:06:05 2025; Machine Name: wallace-server

GNU bug tracking system

Debbugs is free software and licensed under the terms of the GNU Public License version 2. The current version can be obtained from https://bugs.debian.org/debbugs-source/.

Copyright © 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson, 2005-2017 Don Armstrong, and many other contributors.