GNU bug report logs

#76315 System does not boot after switching to system-log service


Report forwarded to bug-guix@gnu.org:
bug#76315; Package guix. (Sun, 16 Feb 2025 00:43:02 GMT)


Acknowledgement sent to Tomas Volf <~@wolfsden.cz>:
New bug report received and forwarded. Copy sent to bug-guix@gnu.org. (Sun, 16 Feb 2025 00:43:02 GMT)


Message #5 received at submit@debbugs.gnu.org:

From: Tomas Volf <~@wolfsden.cz>
To: bug-guix@gnu.org
Subject: System does not boot after switching to system-log service
Date: Sun, 16 Feb 2025 01:41:50 +0100
[Message part 1 (text/plain, inline)]
Hello,

after pulling a recent Guix, I got this error during guix deploy:

--8<---------------cut here---------------start------------->8---
guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
%exception #<inferior-object #<&service-not-found-error service: system-log>> 
--8<---------------cut here---------------end--------------->8---

After rebooting, the system got stuck during startup.  No error message
was visible; it just hung.

Booting into the previous generation worked.

Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix@gnu.org:
bug#76315; Package guix. (Sun, 16 Feb 2025 14:24:02 GMT)


Message #8 received at 76315@debbugs.gnu.org:

From: Tomas Volf <~@wolfsden.cz>
To: 76315@debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log service
Date: Sun, 16 Feb 2025 15:23:12 +0100
[Message part 1 (text/plain, inline)]
I have put together a reproducer in a VM:

1. Install Guix System using the 1.4.0 installer
  --> Include sshd, openbox

2. Reboot
3. Copy the /run/current-system/configuration.scm out of the VM
4. Adjust the configuration.scm (full file attached)
4.1 Allow NOPASSWD sudo
  (sudoers-file
   (plain-file "sudoers"
               (string-append (plain-file-content %sudoers-specification)
                              (format #f "x ALL = NOPASSWD: ALL~%"))))
4.2 Use %base-services and delete the set-xorg-configuration service.
4.3 Add a dhcp-client-service-type service.
4.4 Authorize your key
  (simple-service
   'extra-authorized-keys guix-service-type
   (guix-extension
    (authorized-keys (list
                      (local-file "/etc/guix/signing-key.pub")))))

5. Manually tweak /etc/sudoers to support NOPASSWD for user x
6. Create machine configuration (full file attached)

7. Run guix deploy on the machine using commit b99df83c591104655a6b387817d8f7bb3c50204c
8. Reboot

9. Run guix deploy on the machine using commit 1afbf48b250f667ce45de40a6c275e3e42ade67c
  --> See the following error:
  
--8<---------------cut here---------------start------------->8---
building path(s) `/gnu/store/zdknxv3knkkxx52nwfbz120p32z4j2aa-upgrade-shepherd-services.scm'
building path(s) `/gnu/store/x7bzglpc0vvr5ak24k3i33ikq5ph8sfx-remote-exp.scm'
guix deploy: warning: an error occurred while upgrading services on 'localhost':
%exception #<inferior-object #<&service-not-found-error service: system-log>> 
--8<---------------cut here---------------end--------------->8---

A. Reboot
  --> The system does not come up (I gave it ~10 minutes).
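The report does not say how Guix was pinned to the two commits in steps 7 and 9.  One way (an assumption, not necessarily what was done here) is a channels file passed to guix time-machine, e.g. guix time-machine -C channels.scm -- deploy machine.scm, where channels.scm is a hypothetical file name.  For step 7 it could read:

```scheme
;; channels.scm (hypothetical name): pin the 'guix' channel to the
;; commit used in step 7.  For step 9, replace the commit with
;; 1afbf48b250f667ce45de40a6c275e3e42ade67c.
(list (channel
       (name 'guix)
       (url "https://git.savannah.gnu.org/git/guix.git")
       (commit "b99df83c591104655a6b387817d8f7bb3c50204c")))
```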

[config.scm (text/x-scheme, inline)]
;; This is an operating system configuration generated
;; by the graphical installer.
;;
;; Once installation is complete, you can learn and modify
;; this file to tweak the system configuration, and pass it
;; to the 'guix system reconfigure' command to effect your
;; changes.


;; Indicate which modules to import to access the variables
;; used in this configuration.
(use-modules (gnu))
(use-service-modules cups desktop networking ssh xorg)

(operating-system
  (locale "en_US.utf8")
  (timezone "Europe/Prague")
  (keyboard-layout (keyboard-layout "us"))
  (host-name "x")

  ;; The list of user accounts ('root' is implicit).
  (users (cons* (user-account
                 (name "x")
                 (comment "X")
                 (group "users")
                 (home-directory "/home/x")
                 (supplementary-groups '("wheel" "netdev" "audio" "video")))
                %base-user-accounts))

  ;; Packages installed system-wide.  Users can also install packages
  ;; under their own account: use 'guix search KEYWORD' to search
  ;; for packages and 'guix install PACKAGE' to install a package.
  (packages (append (list (specification->package "openbox")
                          (specification->package "nss-certs"))
                    %base-packages))

  (sudoers-file
   (plain-file "sudoers"
               (string-append (plain-file-content %sudoers-specification)
                              (format #f "x ALL = NOPASSWD: ALL~%"))))

  ;; Below is the list of system services.  To search for available
  ;; services, run 'guix system search KEYWORD' in a terminal.
  (services
   (append (list
            (service dhcp-client-service-type)
            ;; To configure OpenSSH, pass an 'openssh-configuration'
            ;; record as a second argument to 'service' below.
            (service openssh-service-type)

            (simple-service
             'extra-authorized-keys guix-service-type
             (guix-extension
              (authorized-keys (list
                                (local-file "/etc/guix/signing-key.pub"))))))

           ;; This is the default list of services we
           ;; are appending to.
           %base-services))
  (bootloader (bootloader-configuration
               (bootloader grub-efi-bootloader)
               (targets (list "/boot/efi"))
               (keyboard-layout keyboard-layout)))
  (swap-devices (list (swap-space
                        (target (uuid
                                 "aa8dee07-5bf4-4ad2-8db7-8ee6139d6fc5")))))

  ;; The list of file systems that get "mounted".  The unique
  ;; file system identifiers there ("UUIDs") can be obtained
  ;; by running 'blkid' in a terminal.
  (file-systems (cons* (file-system
                         (mount-point "/boot/efi")
                         (device (uuid "79EB-4D57"
                                       'fat32))
                         (type "vfat"))
                       (file-system
                         (mount-point "/")
                         (device (uuid
                                  "11d0a98d-7200-4a9b-ae0a-0cb4db3e808d"
                                  'ext4))
                         (type "ext4")) %base-file-systems)))
[machine.scm (text/x-scheme, inline)]
(use-modules (gnu))

(use-service-modules networking ssh)
(use-package-modules bootloaders)

(list (machine
       (operating-system (primitive-load "config.scm"))
       (environment managed-host-environment-type)
       (configuration (machine-ssh-configuration
                       (build-locally? #f)
                       (host-name "localhost")
                       (system "x86_64-linux")
                       (user "x")
                       (port 8888)))))
[signature.asc (application/pgp-signature, inline)]

Severity set to 'important' from 'normal'. Request was from Ludovic Courtès <ludo@gnu.org> to control@debbugs.gnu.org. (Sun, 16 Feb 2025 17:47:02 GMT)


Information forwarded to bug-guix@gnu.org:
bug#76315; Package guix. (Sun, 16 Feb 2025 21:31:01 GMT)


Message #13 received at 76315@debbugs.gnu.org:

From: Ludovic Courtès <ludo@gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 76315@debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log service
Date: Sun, 16 Feb 2025 22:30:18 +0100
Hi,

Tomas Volf <~@wolfsden.cz> skribis:

> A. Reboot
>   --> The system does not come up (I gave it ~10 minutes).

I tried the config file you gave with:

  ./pre-inst-env guix system vm /tmp/config.scm

and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
since June, and “make check-system TESTS=basic” & co. pass).

I’ll keep investigating and probably revert the change in the interim.

Ludo’.




Information forwarded to bug-guix@gnu.org:
bug#76315; Package guix. (Sun, 16 Feb 2025 22:22:02 GMT)


Message #16 received at 76315@debbugs.gnu.org:

From: Ludovic Courtès <ludo@gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 76315@debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log service
Date: Sun, 16 Feb 2025 23:20:57 +0100
Ludovic Courtès <ludo@gnu.org> skribis:

> I’ll keep investigating and probably revert the change in the interim.

Reverted in 8c483c12e94bcf43e4c44170f1d5fea5fbba4970.

Ludo'.




Information forwarded to bug-guix@gnu.org:
bug#76315; Package guix. (Wed, 19 Feb 2025 21:05:02 GMT)


Message #19 received at 76315@debbugs.gnu.org:

From: Ludovic Courtès <ludo@gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 76315@debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log service
Date: Wed, 19 Feb 2025 22:04:20 +0100
Hey Tomas,

Ludovic Courtès <ludo@gnu.org> skribis:

> I tried the config file you gave with:
>
>   ./pre-inst-env guix system vm /tmp/config.scm
>
> and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
> since June, and “make check-system TESTS=basic” & co. pass).

After spending hours on this and fixing improbable issues in the
Shepherd (will push shortly), I found that the root of the problem is
exactly what I feared and which led to the patches at
<https://issues.guix.gnu.org/76262>.

Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
it loses the race and waits forever.  (I’m using
‘network-manager-service-type’ on my laptop, which is why I did not
stumble upon this bug.)
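The core of the race is that a child's exit status can be collected by exactly one waitpid call.  The following plain-Guile sketch (not Shepherd code, and not the patch at 76262) shows this deterministically: "reaper 1" stands in for shepherd's SIGCHLD handler but is called directly rather than from a real signal handler, and "reaper 2" stands in for the waitpid in dhcp-client-service-type.  In plain POSIX the loser gets ECHILD immediately; per the report, the losing call inside the Shepherd instead waits forever, which this sketch does not reproduce.

```scheme
;; Plain Guile: only one waitpid call can collect a child's status.
(define (spawn-exiting-child code)
  ;; Fork a child that exits immediately with exit code CODE.
  (let ((pid (primitive-fork)))
    (if (zero? pid)
        (primitive-_exit code)          ; child: exit without flushing ports
        pid)))                          ; parent: return the child's PID

(define child (spawn-exiting-child 7))

;; Reaper 1 (stand-in for shepherd's SIGCHLD handler) reaps whichever
;; child has terminated; returns a pair (PID . STATUS).
(define handler-result (waitpid WAIT_ANY))

;; Reaper 2 (stand-in for the service's own waitpid) asks for the same
;; child, whose status has already been collected.
(define service-result
  (catch 'system-error
    (lambda () (waitpid child))
    (lambda args
      (if (= (system-error-errno args) ECHILD)
          'lost-the-race                ; nothing left to reap
          (apply throw args)))))

(format #t "reaper 1 got ~a (exit ~a); reaper 2: ~a~%"
        (car handler-result)
        (status:exit-val (cdr handler-result))
        service-result)
```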

Could you try your config with the patch at
<https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
the metal?

Thanks in advance,
Ludo’.




Information forwarded to bug-guix@gnu.org:
bug#76315; Package guix. (Wed, 19 Feb 2025 21:09:02 GMT)


Message #22 received at 76315@debbugs.gnu.org:

From: Ludovic Courtès <ludo@gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 76315@debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log service
Date: Wed, 19 Feb 2025 22:07:53 +0100
Ludovic Courtès <ludo@gnu.org> skribis:

> Could you try your config with the patch at
> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
> the metal?

You need to do that on top of a pre-revert commit, such as
eba8c08b1bfc7ac333a0eda658a0be5acac7f151.




Information forwarded to bug-guix@gnu.org:
bug#76315; Package guix. (Thu, 20 Feb 2025 21:33:02 GMT)


Message #25 received at 76315@debbugs.gnu.org:

From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo@gnu.org>
Cc: 76315@debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log service
Date: Thu, 20 Feb 2025 22:32:03 +0100
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo@gnu.org> writes:

> Hey Tomas,
>
> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> I tried the config file you gave with:
>>
>>   ./pre-inst-env guix system vm /tmp/config.scm
>>
>> and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
>> since June, and “make check-system TESTS=basic” & co. pass).
>
> After spending hours on this and fixing improbable issues in the
> Shepherd (will push shortly), I found that the root of the problem is
> exactly what I feared and which led to the patches at
> <https://issues.guix.gnu.org/76262>.
>
> Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
> with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
> it loses the race and waits forever.

Observation here.  While yes, based on the description I agree that it
is (bad) luck based, in practice it seems to be extremely reliable to
reproduce.

At first I struggled to reproduce it again: it did not hang even a single
time (out of 5 tries) on the bad commit.  But once I reverted my
configuration to what it was back then (i.e., removed a few shepherd
timers), the hang happened every single time.

So, while in theory it should be a probabilistic problem, in practice it
does not seem to be.  Not sure where I am going with this; I just think
it is interesting.

>
> Could you try your config with the patch at
> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
> the metal?

I reverted your revert and applied patch 2 on top of it.

Steps I took (both in VM and on a spare laptop):

1. Reconfigure from commit 1 (the revert of your revert).
2. Ensure it still hangs (5x).
3. Reconfigure from commit 2 (patch 2 applied on top of commit 1).
4. Ensure it no longer hangs (5x).

I can confirm that patch 2 fixes the issue for me, both in the VM and on
the physical machine.

The only thing I noticed is that even when deploying the "good" commit,
I see the following error in the log:

--8<---------------cut here---------------start------------->8---
guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
%exception #<inferior-object #<&service-not-found-error service: system-log>>
--8<---------------cut here---------------end--------------->8---

The system comes up fine after reboot though.

>
> Thanks in advance,
> Ludo’.

Thank you for figuring this one out. :)

Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix@gnu.org:
bug#76315; Package guix. (Fri, 21 Feb 2025 11:18:01 GMT)


Message #28 received at 76315@debbugs.gnu.org:

From: Ludovic Courtès <ludo@gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 76315@debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log service
Date: Fri, 21 Feb 2025 12:17:16 +0100
Hi,

Tomas Volf <~@wolfsden.cz> skribis:

>> After spending hours on this and fixing improbable issues in the
>> Shepherd (will push shortly), I found that the root of the problem is
>> exactly what I feared and which led to the patches at
>> <https://issues.guix.gnu.org/76262>.
>>
>> Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
>> with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
>> it loses the race and waits forever.
>
> Observation here.  While yes, based on the description I agree that it
> is (bad) luck based, in practice it seems to be extremely reliable to
> reproduce.

Yes, I could reproduce it 100% of the time with just ‘bare-bones.tmpl’.
The thing is, as soon as you changed something non-trivial, for instance
making shepherd’s ‘message-destination’ procedure write everything to
/dev/console, the problem would go away.  Even just commenting out some
of the parameters passed to ‘system-log’ could make the problem
disappear (!), which is why it took me a lot of time to figure it out.

>> Could you try your config with the patch at
>> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
>> the metal?

[...]

> I can confirm that patch 2 fixes the issue for me, both in the VM and on
> the physical machine.

Yay!

> The only thing I noticed is that even when deploying the "good" commit,
> I see the following error in the log:
>
> guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
> %exception #<inferior-object #<&service-not-found-error service: system-log>>

I think I understand this one now.

The old service has only one name: syslogd.  The new one, which upgrades
it, has two names: system-log and syslogd (system-log is its “canonical
name”).

The service upgrade machinery gets confused because it uses the
canonical name in one place.
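A toy Guile model of the name mismatch described above, assuming the upgrade step looks up the running service by the new service's canonical name only.  All names here (running, new-logger, lookup-by-name) are hypothetical; this is neither the Guix upgrade code nor the fix at 76502.  Each service is represented by its provision list, whose first element is the canonical name.

```scheme
(use-modules (srfi srfi-1))   ; find, any

;; Old generation: the running logger answers only to 'syslogd'.
(define running '((syslogd)))

;; New generation: canonical name first, old name kept as an alias.
(define new-logger '(system-log syslogd))

(define (lookup-by-name name)
  ;; Return the running service whose provision list contains NAME, or #f.
  (find (lambda (provision) (memq name provision)) running))

;; Looking up only the canonical name misses the old service...
(define by-canonical (lookup-by-name (car new-logger)))   ; #f
;; ...whereas trying every name the new service provides finds it.
(define by-any-name (any lookup-by-name new-logger))      ; (syslogd)
```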

I’ll investigate.

Ludo’.




Information forwarded to bug-guix@gnu.org:
bug#76315; Package guix. (Sun, 23 Feb 2025 14:50:02 GMT)


Message #31 received at 76315@debbugs.gnu.org:

From: Ludovic Courtès <ludo@gnu.org>
To: Tomas Volf <~@wolfsden.cz>
Cc: 76315@debbugs.gnu.org
Subject: Re: bug#76315: System does not boot after switching to system-log service
Date: Sun, 23 Feb 2025 15:49:44 +0100
Ludovic Courtès <ludo@gnu.org> skribis:

>> The only thing I noticed is that even when deploying the "good" commit,
>> I see the following error in the log:
>>
>> guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
>> %exception #<inferior-object #<&service-not-found-error service: system-log>>
>
> I think I understand this one now.

Patch 👉 https://issues.guix.gnu.org/76502




bug closed, send any further explanations to 76315@debbugs.gnu.org and Tomas Volf <~@wolfsden.cz>. Request was from Ludovic Courtès <ludo@gnu.org> to control@debbugs.gnu.org. (Mon, 07 Apr 2025 14:25:02 GMT)




debbugs.gnu.org maintainers <help-debbugs@gnu.org>. Last modified: Sat Apr 19 07:26:33 2025; Machine Name: wallace-server

GNU bug tracking system
