Report forwarded
to bug-guix@gnu.org: bug#76315; Package guix.
(Sun, 16 Feb 2025 00:43:02 GMT)
Acknowledgement sent
to Tomas Volf <~@wolfsden.cz>:
New bug report received and forwarded. Copy sent to bug-guix@gnu.org.
(Sun, 16 Feb 2025 00:43:02 GMT)
Hello,
after pulling recent Guix, I got this error during guix deploy:
--8<---------------cut here---------------start------------->8---
guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
%exception #<inferior-object #<&service-not-found-error service: system-log>>
--8<---------------cut here---------------end--------------->8---
After rebooting, the system got stuck during startup. No error message
was visible; it was just hanging.
Booting to previous generation did work.
Tomas
--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
I have put together a reproducer in a VM:
1. Install Guix system using 1.4.0 installer
   --> Include sshd, openbox
2. Reboot
3. Copy the /run/current-system/configuration.scm out of the VM
4. Adjust the configuration.scm (full file attached)
   4.1 Allow NOPASSWD sudo
       (sudoers-file
        (plain-file "sudoers"
                    (string-append (plain-file-content %sudoers-specification)
                                   (format #f "x ALL = NOPASSWD: ALL~%"))))
   4.2 Use %base-services, delete set-xorg-configuration service
   4.3 Add dhcp-client-service-type service.
   4.4 Authorize your key
       (simple-service
        'extra-authorized-keys guix-service-type
        (guix-extension
         (authorized-keys (list
                           (local-file "/etc/guix/signing-key.pub")))))
5. Manually tweak /etc/sudoers to support NOPASSWD for user x
6. Create machine configuration (full file attached)
7. Guix deploy the machine using b99df83c591104655a6b387817d8f7bb3c50204c
8. Reboot
9. Guix deploy the machine using 1afbf48b250f667ce45de40a6c275e3e42ade67c
   --> See the following error:
--8<---------------cut here---------------start------------->8---
building path(s) `/gnu/store/zdknxv3knkkxx52nwfbz120p32z4j2aa-upgrade-shepherd-services.scm'
building path(s) `/gnu/store/x7bzglpc0vvr5ak24k3i33ikq5ph8sfx-remote-exp.scm'
guix deploy: warning: an error occurred while upgrading services on 'localhost':
%exception #<inferior-object #<&service-not-found-error service: system-log>>
--8<---------------cut here---------------end--------------->8---
A. Reboot
--> The system does not come up (I gave it ~10 minutes).
;; This is an operating system configuration generated
;; by the graphical installer.
;;
;; Once installation is complete, you can learn and modify
;; this file to tweak the system configuration, and pass it
;; to the 'guix system reconfigure' command to effect your
;; changes.

;; Indicate which modules to import to access the variables
;; used in this configuration.
(use-modules (gnu))
(use-service-modules cups desktop networking ssh xorg)

(operating-system
  (locale "en_US.utf8")
  (timezone "Europe/Prague")
  (keyboard-layout (keyboard-layout "us"))
  (host-name "x")

  ;; The list of user accounts ('root' is implicit).
  (users (cons* (user-account
                  (name "x")
                  (comment "X")
                  (group "users")
                  (home-directory "/home/x")
                  (supplementary-groups '("wheel" "netdev" "audio" "video")))
                %base-user-accounts))

  ;; Packages installed system-wide.  Users can also install packages
  ;; under their own account: use 'guix search KEYWORD' to search
  ;; for packages and 'guix install PACKAGE' to install a package.
  (packages (append (list (specification->package "openbox")
                          (specification->package "nss-certs"))
                    %base-packages))

  (sudoers-file
   (plain-file "sudoers"
               (string-append (plain-file-content %sudoers-specification)
                              (format #f "x ALL = NOPASSWD: ALL~%"))))

  ;; Below is the list of system services.  To search for available
  ;; services, run 'guix system search KEYWORD' in a terminal.
  (services
   (append (list
            (service dhcp-client-service-type)
            ;; To configure OpenSSH, pass an 'openssh-configuration'
            ;; record as a second argument to 'service' below.
            (service openssh-service-type)
            (simple-service
             'extra-authorized-keys guix-service-type
             (guix-extension
              (authorized-keys (list
                                (local-file "/etc/guix/signing-key.pub"))))))

           ;; This is the default list of services we
           ;; are appending to.
           %base-services))

  (bootloader (bootloader-configuration
                (bootloader grub-efi-bootloader)
                (targets (list "/boot/efi"))
                (keyboard-layout keyboard-layout)))
  (swap-devices (list (swap-space
                        (target (uuid
                                 "aa8dee07-5bf4-4ad2-8db7-8ee6139d6fc5")))))

  ;; The list of file systems that get "mounted".  The unique
  ;; file system identifiers there ("UUIDs") can be obtained
  ;; by running 'blkid' in a terminal.
  (file-systems (cons* (file-system
                         (mount-point "/boot/efi")
                         (device (uuid "79EB-4D57"
                                       'fat32))
                         (type "vfat"))
                       (file-system
                         (mount-point "/")
                         (device (uuid
                                  "11d0a98d-7200-4a9b-ae0a-0cb4db3e808d"
                                  'ext4))
                         (type "ext4")) %base-file-systems)))
Severity set to 'important' from 'normal'
Request was from Ludovic Courtès <ludo@gnu.org>
to control@debbugs.gnu.org.
(Sun, 16 Feb 2025 17:47:02 GMT)
Information forwarded
to bug-guix@gnu.org: bug#76315; Package guix.
(Sun, 16 Feb 2025 21:31:01 GMT)
Subject: Re: bug#76315: System does not boot after switching to system-log
service
Date: Sun, 16 Feb 2025 22:30:18 +0100
Hi,
Tomas Volf <~@wolfsden.cz> skribis:
> A. Reboot
> --> The system does not come up (I gave it ~10 minutes).
I tried the config file you gave with:
./pre-inst-env guix system vm /tmp/config.scm
and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
since June, and “make check-system TESTS=basic” & co. pass).
I’ll keep investigating and probably revert the change in the interim.
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#76315; Package guix.
(Sun, 16 Feb 2025 22:22:02 GMT)
Subject: Re: bug#76315: System does not boot after switching to system-log
service
Date: Sun, 16 Feb 2025 23:20:57 +0100
Ludovic Courtès <ludo@gnu.org> skribis:
> I’ll keep investigating and probably revert the change in the interim.
Reverted in 8c483c12e94bcf43e4c44170f1d5fea5fbba4970.
Ludo'.
Information forwarded
to bug-guix@gnu.org: bug#76315; Package guix.
(Wed, 19 Feb 2025 21:05:02 GMT)
Subject: Re: bug#76315: System does not boot after switching to system-log
service
Date: Wed, 19 Feb 2025 22:04:20 +0100
Hey Tomas,
Ludovic Courtès <ludo@gnu.org> skribis:
> I tried the config file you gave with:
>
> ./pre-inst-env guix system vm /tmp/config.scm
>
> and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
> since June, and “make check-system TESTS=basic” & co. pass).
After spending hours on this and fixing improbable issues in the
Shepherd (will push shortly), I found that the root of the problem is
exactly what I feared and which led to the patches at
<https://issues.guix.gnu.org/76262>.
Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
it loses the race and waits forever. (I’m using
‘network-manager-service-type’ on my laptop, which is why I did not
stumble upon this bug.)
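To illustrate the kind of race described above, here is a minimal Python
sketch. It is not Shepherd or Guix code, and the names `on_sigchld` and
`run_race` are made up for the illustration. A SIGCHLD handler reaps
children with `waitpid(-1, WNOHANG)` while the main code also calls
`waitpid` on the same child, so only one of the two can collect the exit
status. In plain POSIX the loser gets ECHILD; how losing the race turns
into an indefinite wait inside shepherd is not modelled here.

```python
import os
import signal
import time

reaped = {}  # pid -> raw wait status, filled in by the SIGCHLD handler


def on_sigchld(signum, frame):
    """Reap every child that has exited, without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none has exited yet
        reaped[pid] = status


def run_race():
    """Fork a child, let the handler reap it, then wait for it directly."""
    reaped.clear()
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately
        # Give the handler time to run first, so that it wins the race.
        deadline = time.monotonic() + 5
        while pid not in reaped and time.monotonic() < deadline:
            time.sleep(0.01)
        try:
            os.waitpid(pid, 0)
            return pid, "direct waitpid collected the status"
        except ChildProcessError:
            return pid, "handler collected the status first (ECHILD)"
    finally:
        # Restore whatever handler was installed before (default if unknown).
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)
```

Calling `run_race()` from the main thread shows the direct `waitpid`
losing to the handler: the child's status ends up in `reaped`, and the
second waiter is left with nothing to collect.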
Could you try your config with the patch at
<https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
the metal?
Thanks in advance,
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#76315; Package guix.
(Wed, 19 Feb 2025 21:09:02 GMT)
Subject: Re: bug#76315: System does not boot after switching to system-log
service
Date: Wed, 19 Feb 2025 22:07:53 +0100
Ludovic Courtès <ludo@gnu.org> skribis:
> Could you try your config with the patch at
> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
> the metal?
You need to do that on top of a pre-revert commit, such as
eba8c08b1bfc7ac333a0eda658a0be5acac7f151.
Information forwarded
to bug-guix@gnu.org: bug#76315; Package guix.
(Thu, 20 Feb 2025 21:33:02 GMT)
Ludovic Courtès <ludo@gnu.org> writes:
> Hey Tomas,
>
> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> I tried the config file you gave with:
>>
>> ./pre-inst-env guix system vm /tmp/config.scm
>>
>> and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
>> since June, and “make check-system TESTS=basic” & co. pass).
>
> After spending hours on this and fixing improbable issues in the
> Shepherd (will push shortly), I found that the root of the problem is
> exactly what I feared and which led to the patches at
> <https://issues.guix.gnu.org/76262>.
>
> Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
> with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
> it loses the race and waits forever.
Observation here. While yes, based on the description I agree that it
is (bad) luck based, in practice it seems to be extremely reliable to
reproduce.
At first I struggled to reproduce it again: it did not hang a single
time (out of 5 tries) on the bad commit. But once I reverted my
configuration to what it was back then (== removed a few shepherd
timers), the hang started happening every single time.
So, while in theory it should be a probabilistic problem, in practice it
does not seem to be the case. Not sure where I am going with this, I
just think it is interesting.
>
> Could you try your config with the patch at
> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
> the metal?
I have reverted your revert (commit 1) and applied patch 2 on top of
that (commit 2).
Steps I took (both in VM and on a spare laptop):
1. Reconfigure from commit 1.
2. Ensure it still hangs (5x).
3. Reconfigure from commit 2.
4. Ensure it no longer hangs (5x).
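A check of the "hangs 5x / no longer hangs 5x" kind can be scripted. The
following is a minimal Python sketch, assuming some command (hypothetical,
not part of this report) that exits once the system under test has come
up. `count_hangs`, its parameters and `STAND_IN_HANG` are made up for the
illustration; the default timeout mirrors the ~10 minutes mentioned in
the reproducer.

```python
import subprocess
import sys

# Stand-in for a boot that never finishes: a process that just sleeps.
STAND_IN_HANG = [sys.executable, "-c", "import time; time.sleep(30)"]


def count_hangs(command, trials=5, timeout=600.0):
    """Run COMMAND TRIALS times; count the runs that exceed TIMEOUT seconds."""
    hangs = 0
    for _ in range(trials):
        try:
            subprocess.run(command, timeout=timeout,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            hangs += 1  # subprocess.run has already killed the child
    return hangs
```

For instance, `count_hangs(STAND_IN_HANG, trials=2, timeout=0.5)` counts
both runs as hangs, while a command that exits promptly counts none.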
I can confirm the patch 2 fixes the issue for me, both in the VM and on
physical machine.
The only thing I have noticed is that even when deploying the "good"
commit, I still see the following error in the log:
--8<---------------cut here---------------start------------->8---
guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
%exception #<inferior-object #<&service-not-found-error service: system-log>>
--8<---------------cut here---------------end--------------->8---
The system comes up fine after reboot though.
>
> Thanks in advance,
> Ludo’.
Thank you for figuring this one out. :)
Tomas
--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
Subject: Re: bug#76315: System does not boot after switching to system-log
service
Date: Fri, 21 Feb 2025 12:17:16 +0100
Hi,
Tomas Volf <~@wolfsden.cz> skribis:
>> After spending hours on this and fixing improbable issues in the
>> Shepherd (will push shortly), I found that the root of the problem is
>> exactly what I feared and which led to the patches at
>> <https://issues.guix.gnu.org/76262>.
>>
>> Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
>> with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
>> it loses the race and waits forever.
>
> Observation here. While yes, based on the description I agree that it
> is (bad) luck based, in practice it seems to be extremely reliable to
> reproduce.
Yes, I could reproduce it 100% of the time with just ‘bare-bones.tmpl’.
Thing is, as soon as you changed something non-trivial, for instance the
‘message-destination’ procedure of shepherd so that it writes everything
to /dev/console, the problem would go away. Even just commenting out
some of the parameters passed to ‘system-log’ could make the problem
disappear (!), which is why it took me a lot of time to figure it out.
>> Could you try your config with the patch at
>> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
>> the metal?
[...]
> I can confirm the patch 2 fixes the issue for me, both in the VM and on
> physical machine.
Yay!
> Only thing I have noticed that even when deploying the "good" commit, I
> see the following error in the log:
>
> guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
> %exception #<inferior-object #<&service-not-found-error service: system-log>>
I think I understood this one now.
The old service has only one name: syslogd. The new one, which upgrades
it, has two names: system-log and syslogd (system-log is its “canonical
name”).
The service upgrade machinery gets confused because it uses the
canonical name in one place.
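As a rough illustration of this kind of mix-up, here is a toy Python
model. It is not the actual Shepherd or Guix upgrade code; the `Service`
class and the `find_running` and `find_predecessor` helpers are
hypothetical. The running registry only knows the old service under
‘syslogd’, so looking up the replacement by its canonical name alone
fails, whereas checking all of its names finds the service to replace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    names: tuple  # all names the service provides; the first is canonical

    @property
    def canonical_name(self):
        return self.names[0]


class ServiceNotFound(LookupError):
    pass


def find_running(registry, name):
    """Look up a running service by any one of its names."""
    for service in registry:
        if name in service.names:
            return service
    raise ServiceNotFound(name)


def find_predecessor(registry, new_service, canonical_only):
    """Find the running service that NEW_SERVICE is meant to replace."""
    candidates = ([new_service.canonical_name] if canonical_only
                  else list(new_service.names))
    for name in candidates:
        try:
            return find_running(registry, name)
        except ServiceNotFound:
            continue
    raise ServiceNotFound(new_service.canonical_name)


running = [Service(names=("syslogd",))]              # old service: one name
upgraded = Service(names=("system-log", "syslogd"))  # new service: two names
```

With `canonical_only=False`, `find_predecessor(running, upgraded, ...)`
returns the old ‘syslogd’ service; with `canonical_only=True` it raises
`ServiceNotFound` for ‘system-log’, which resembles the warning seen
during deploy.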
I’ll investigate.
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#76315; Package guix.
(Sun, 23 Feb 2025 14:50:02 GMT)
Subject: Re: bug#76315: System does not boot after switching to system-log
service
Date: Sun, 23 Feb 2025 15:49:44 +0100
Ludovic Courtès <ludo@gnu.org> skribis:
>> Only thing I have noticed that even when deploying the "good" commit, I
>> see the following error in the log:
>>
>> guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
>> %exception #<inferior-object #<&service-not-found-error service: system-log>>
>
> I think I understood this one now.
Patch 👉 https://issues.guix.gnu.org/76502
bug closed, send any further explanations to
76315@debbugs.gnu.org and Tomas Volf <~@wolfsden.cz>
Request was from Ludovic Courtès <ludo@gnu.org>
to control@debbugs.gnu.org.
(Mon, 07 Apr 2025 14:25:02 GMT)