Filesystems not unmounted on reboot

Status: Open
Details
6 participants
  • Felix Lechner
  • Ian Eure
  • Leo Famulari
  • Ludovic Courtès
  • Ricardo Wurmus
  • Rutherther
Owner: unassigned
Submitted by: Ian Eure
Severity: grave

Ian Eure wrote on 17 Mar 14:09 -0700
(address . bug-guix@gnu.org)
878qp39xec.fsf@retrospec.tv
Starting recently (in the last 2-3 weeks), my Guix System machines have
failed to unmount filesystems when restarted. For example, inside an Emacs
EXWM X11 session, running `sudo reboot' in a shell causes the system to
perform a lengthy fsck on the next boot, saying "recovering journal."

Others on IRC have noticed this as well. Some think it’s correlated
to running `sudo guix system reconfigure', but I observe it whether
I’ve reconfigured or not.

I’m using LUKS1 whole-disk encryption.

Here’s the `guix system describe' output from a laptop (ThinkPad X280)
which is exhibiting the problem:

Generation 46 Mar 17 2025 07:22:11 (current)
  file name: /var/guix/profiles/system-46-link
  canonical file name: /gnu/store/zjk9w5jwdrlsc6kr8s4iq3317ys17shf-system
  label: GNU with Linux 6.13.7
  bootloader: grub-efi
  root device: /dev/mapper/cryptroot
  kernel: /gnu/store/q8vjcaaykh0p7769xk1rz29di1axry07-linux-6.13.7/bzImage
  channels:
    guix:
      repository URL:
      branch: master
      commit: 98be320183579b3d09cf4059e86a9781485628b4
    nonguix:
      branch: master
      commit: fa416ebdf9e4d5c3b9676ded8829c5875bcf4f0e
    atomized:
      repository URL:
      branch: main
      commit: bd4ef7fad637b7213e1a96885b61a6ca471eeb0b
  configuration file: /gnu/store/dx2hzsqh47iz2pbiwiqsd8w7lpc6f4jy-configuration.scm

-- Ian
Ricardo Wurmus wrote on 17 Mar 22:43 -0700
(address . 77086@debbugs.gnu.org)
878qp26gih.fsf@elephly.net
I have also seen this on a netbook where I have never used "guix
system reconfigure"; I only ever use "guix deploy" from another
machine to upgrade that system.

This is the version of Guix used:

  guix 9212459
    branch: master
    commit: 92124591eedf27e988c84f75acd4b4d99ff43122
  rosenthal 5172fac
    branch: trunk
    commit: 5172fac369025bc98cbfb925b728b2cae20ac2c5

The kernel is linux-libre-lts.

--
Ricardo
Felix Lechner wrote on 20 Mar 13:48 -0700
Can lead to data loss
(name . GNU bug tracker automated control server)(address . control@debbugs.gnu.org)
87sen7sa29.fsf@lease-up.com
severity 77086 grave
thanks
Rutherther wrote on 21 Mar 06:43 -0700
Re: Filesystems not unmounted on reboot
(address . 77086@debbugs.gnu.org)
87frj6xzvt.fsf@ditigal.xyz
Hello guys,

So I started looking into this a bit (no promises of results, though),
and I can say fairly confidently that there is indeed an issue with
unmounting.

I created a VM system and rebooted it a few times, and it was fine.
However, then I tried reconfiguring, and on that run I got an error upon
reboot. Not only that, I can't boot it anymore :) the filesystem got
corrupted in a way that prevents booting. Welp.

I can't really say at the moment what is causing this, but the problem
is that the root device is busy, so it can't be unmounted. Based on the
log I see, my hypothesis is that the root filesystem is being unmounted
first rather than last, as it should be. Could the reconfigure be
throwing shepherd off? I am also CCing Ludovic; I hope he won't mind.

I would also like to point out an e-mail sent to guix-devel recently:
someone got a timer service running after a reconfigure, but not after
a reboot. After a reboot, the timer module is not imported by default
unless it is added to the service's modules, whereas after a reconfigure
it works. This is one more point supporting my impression that shepherd
behaves differently on reboot than on a reconfigure reload.

I will try digging more, but I am not that knowledgeable about
shepherd yet, so it will take a while.

I am attaching both the log (starting after the reboot command)
and the configuration used for the VM.
Attachment: reboot_log
;; -*- mode: scheme; -*-
;; This is an operating system configuration for a VM image.
;; Modify it as you see fit and instantiate the changes by running:
;;
;;   guix system reconfigure /etc/config.scm
;;

(use-modules (gnu) (guix) (srfi srfi-1))
(use-service-modules desktop mcron networking spice ssh xorg sddm)
(use-package-modules bootloaders fonts
                     package-management xdisorg xorg)

(define vm-image-motd (plain-file "motd" "
\x1b[1;37mThis is the GNU system. Welcome!\x1b[0m

This instance of Guix is a template for virtualized environments.
You can reconfigure the whole system by adjusting /etc/config.scm
and running:

  guix system reconfigure /etc/config.scm

Run '\x1b[1;37minfo guix\x1b[0m' to browse documentation.

\x1b[1;33mConsider setting a password for the 'root' and 'guest' \
accounts.\x1b[0m
"))

(operating-system
  (host-name "gnu")
  (timezone "Etc/UTC")
  (locale "en_US.utf8")
  (keyboard-layout (keyboard-layout "us" "altgr-intl"))

  ;; Label for the GRUB boot menu.
  (label (string-append "GNU Guix "
                        (or (getenv "GUIX_DISPLAYED_VERSION")
                            (package-version guix))))

  (firmware '())
  (kernel-arguments (list "console=ttyS0" "console=tty0"))

  ;; Below we assume /dev/vda is the VM's hard disk.
  ;; Adjust as needed.
  (bootloader (bootloader-configuration
               (bootloader grub-bootloader)
               (targets '("/dev/vda"))
               (terminal-outputs '(console))))
  (file-systems (cons (file-system
                        (mount-point "/")
                        (device "/dev/vda1")
                        (type "ext4"))
                      %base-file-systems))

  (users (cons (user-account
                (name "guest")
                (comment "GNU Guix Live")
                (password "")                     ;no password
                (group "users")
                (supplementary-groups '("wheel" "netdev"
                                        "audio" "video")))
               %base-user-accounts))

  ;; Our /etc/sudoers file. Since 'guest' initially has an empty password,
  ;; allow for password-less sudo.
  (sudoers-file (plain-file "sudoers" "\
root ALL=(ALL) ALL
%wheel ALL=NOPASSWD: ALL\n"))

  (packages
   (append
    (list font-bitstream-vera
          ;; Auto-started script providing SPICE dynamic resizing for
          ;; Xfce (see:
          ;; https://gitlab.xfce.org/xfce/xfce4-settings/-/issues/142).
          x-resize)
    %base-packages))

  (services
   (append (list
            ;; Uncomment the line below to add an SSH server.
            (service openssh-service-type)

            ;; Add support for the SPICE protocol, which enables dynamic
            ;; resizing of the guest screen resolution, clipboard
            ;; integration with the host, etc.
            (service spice-vdagent-service-type)

            ;; Use the DHCP client service rather than NetworkManager.
            (service dhcp-client-service-type))

           ;; Remove some services that don't make sense in a VM.
           (remove (lambda (service)
                     (let ((type (service-kind service)))
                       (or (memq type
                                 (list gdm-service-type
                                       sddm-service-type
                                       wpa-supplicant-service-type
                                       cups-pk-helper-service-type
                                       network-manager-service-type
                                       modem-manager-service-type))
                           (eq? 'network-manager-applet
                                (service-type-name type)))))
                   (modify-services %desktop-services
                     (login-service-type config =>
                                         (login-configuration
                                          (inherit config)
                                          (motd vm-image-motd)))

                     ;; Install and run the current Guix rather than an older
                     ;; snapshot.
                     (guix-service-type config =>
                                        (guix-configuration
                                         (inherit config)
                                         (guix (current-guix))))))))

  ;; Allow resolution of '.local' host names with mDNS.
  (name-service-switch %mdns-host-lookup-nss))
Regards,
Rutherther
Rutherther wrote on 21 Mar 09:50 -0700
(address . 77086@debbugs.gnu.org)
87tt7mtjis.fsf@ditigal.xyz
Hello guys,

I am writing with an update, because not everything I wrote in the first
e-mail was exactly true. Please let me know if these updates bother you,
and I won't CC you again.

"Rutherther" <rutherther@ditigal.xyz> writes:

>
> I have created a VM system, tried rebooting a few times and it was fine,
> however then I tried reconfiguring, and for that run I got an error upon
> reboot. Not only that, I can't boot it anymore :) the filesystem got
> corrupted in a way that prevents boots. Welp.

This wasn't the case; I just hadn't realized that the correct partition
to boot from is /dev/vda2, not /dev/vda1. I copied the image template
from the Guix repo and didn't notice it was wrong (I suppose it didn't
use EFI before but switched to it?). Since the image's filesystems get
replaced by the custom partitions from the config, this error was
visible only after a reconfigure. After changing it to /dev/vda2 I am
able to boot, but fsck runs to repair the damage done.

>
> I am attaching both log (starting after reboot command)
> and the configuration used for the vm.
>

Since this was the first reconfigure of a system obtained with `guix
system image`, the log isn't representative of what happens on
subsequent reconfigures.

The previous time, the error was that / is busy. Maybe that was because
of a change in the filesystem services, from the initial image to the
custom config with `file-systems`.
While that is definitely an issue as well, it might not be entirely
related to the issue reported here by others. (IMO, when the user makes
big changes such as changing file systems, there should be a way to tell
reconfigure not to apply them to the running system, but only to install
the bootloader entry and switch after the next boot.)

I am attaching a log of reboot/halt after subsequent reconfigures, where
the issue manifests a bit differently: it is not / that is busy, but
/run/user. I still think it could be caused by services being stopped in
the wrong order: /run/user/0 is unmounted last, but shepherd tries to
unmount the root file system before that.

Note that this doesn't happen without a reconfigure; I get this behavior
only after one. The reconfigure can be done with no changes to either the
config or the Guix revision used. I get this behavior consistently after
every reconfigure!

[ 202.675565] shepherd[1]: Ignoring error while stopping root-file-system: (system-error "umount" "~S: ~A" ("/run/user" "Device or resource busy") (16))

Regards,
and apologies for two messages when it could've been one if I had paid
more attention.
Rutherther
Attachment: reboot_log
Rutherther wrote on 21 Mar 11:57 -0700
(address . 77086@debbugs.gnu.org)
87o6xutdno.fsf@ditigal.xyz
Hello,

I was testing on the commit Ian sent, to reproduce the issue. Now I have
moved to the newest Guix, and it seems to have been solved (at least the
specific error I was getting), possibly by the Shepherd 1.0.3 update.

With the newest Guix, both the first reconfigure of the system image
and subsequent ones are fine.

Let me know if you're still experiencing this issue after updating, and
I might try harder to reproduce it, in case what I hit on the first try
was a different issue. I am afraid this will be hard to debug on real
hardware, as the shutdown log doesn't really end up in
/var/log/messages; I had to get it through a serial line, via QEMU's
stdout.

Also, on the older Guix I was able to reproduce the issue just by running
the Shepherd services upgrade .scm script, without a full reconfigure.
This shows that something went wrong when Shepherd was reloading the
services. Do we have some kind of test for this in Guix or Shepherd, so
that it can't happen again in the future?
Leo Famulari wrote on 21 Mar 21:50 -0700
Re: bug#77086: Filesystems not unmounted on reboot
(name . Rutherther via Bug reports for GNU Guix)(address . bug-guix@gnu.org)(address . 77086@debbugs.gnu.org)
Z95BoaNBncxYUTmI@jasmine.lan
On Fri, Mar 21, 2025 at 05:50:51PM +0100, Rutherther via Bug reports for GNU Guix wrote:
> I am writing with an update, because not everything I wrote in the first
> e-mail was true exactly. Please let me know if updates distrub you and I
> won't be CCing you again.

Please don't hesitate to send these kinds of investigations again! It's
helpful.
Ian Eure wrote on 28 Mar 08:40 -0700
Re: Filesystems not unmounted on reboot
(address . 77086@debbugs.gnu.org)
87v7rt88in.fsf@retrospec.tv
I’m still seeing this problem on at least one machine.

Here’s the system configuration: https://paste.debian.net/1365931/

It depends on my personal channel:

I captured the state of the machine just now, then ran `sync' and
`sudo reboot'. It fsck’d on boot.

Script started on 2025-03-28 08:32:12-07:00 [TERM="dumb" TTY="/dev/pts/1" COLUMNS="191" LINES="39"]
isot0pe!ieure:~$ w
 08:32:13 up 2 days, 19:43, 2 users, load average: 1.23, 0.34, 0.12
USER TTY LOGIN@ IDLE JCPU PCPU WHAT
ieure seat0 Tue13 0.00s 0.00s 0.03s /gnu/store/wpdr1pfczc5a5yn7ii80qjgcgdv42jr2-gdm-46.2/libexec/gdm-x-session --register-session --run-script /gnu/store/mmw3bf5wachcd96i84770yzf
ieure :0 Tue13 ?xdm? 1:59 0.03s /gnu/store/wpdr1pfczc5a5yn7ii80qjgcgdv42jr2-gdm-46.2/libexec/gdm-x-session --register-session --run-script /gnu/store/mmw3bf5wachcd96i84770yzf
isot0pe!ieure:~$ guix describe
Generation 62 Mar 25 2025 11:59:36 (current)
  guix dbef60e
    branch: master
    commit: dbef60edb356246855ad6749936ee511fc1a9b4b
  atomized f6fba0e
    branch: main
    commit: f6fba0e2b207b69ab093f2be47b306242c748ab8
  nonguix a96e245
    branch: master
    commit: a96e2451bda5aaf9b48339edee392c6a3017d730
isot0pe!ieure:~$ guix system describe
Generation 49 Mar 25 2025 12:05:36 (current)
  file name: /var/guix/profiles/system-49-link
  canonical file name: /gnu/store/wkzxc2nbiydwr42v8a0702xl128n65b0-system
  label: GNU with Linux 6.13.7
  bootloader: grub-efi
  root device: /dev/mapper/cryptroot
  kernel: /gnu/store/5mac890q3yhfi6pvxdl8mmpyngmra5s0-linux-6.13.7/bzImage
  channels:
    guix:
      branch: master
      commit: dbef60edb356246855ad6749936ee511fc1a9b4b
    atomized:
      branch: main
      commit: f6fba0e2b207b69ab093f2be47b306242c748ab8
    nonguix:
      branch: master
      commit: a96e2451bda5aaf9b48339edee392c6a3017d730
  configuration file: /gnu/store/dhmn4825hvfvkbfbvb7xg6rkhdcj38ii-configuration.scm
isot0pe!ieure:~$ exit

Script done on 2025-03-28 08:32:30-07:00 [COMMAND_EXIT_CODE="0"]

-- Ian
Ian Eure wrote on 29 Mar 11:10 -0700
(name . Rutherther)(address . rutherther@ditigal.xyz)
87ecyfd7y0.fsf@retrospec.tv
Hi all,

Here’s a less dead link to the config for my system exhibiting the

I set it to never expire this time.

The FS stuff is very pedestrian, it’s more or less what the
graphical installer generates, since that’s how I installed that
system.

Thanks,
-- Ian
Ludovic Courtès wrote on 1 Apr 00:25 -0700
(name . Rutherther)(address . rutherther@ditigal.xyz)
87v7rogx7u.fsf@gnu.org
Hi,

Rutherther <rutherther@ditigal.xyz> skribis:

> I was testing on commmit Ian sent to reproduce, and now moved to
> newest guix, seems to have been solved (at least the specific error I
> was getting),
> possibly by shepherd 1.0.3 update.
>
> With newest guix both the first reconfigure of guix system image
> and subsequent ones are fine.

I believe I’m still experiencing it on my laptop (but it’s harmless, I
only briefly see e2fsck saying “recovering journal” at boot time), but
of course, not the slightest clue in /var/log/messages.

> Let me know if you're still experiencing this issue after updating
> and I might try harder to reproduce if I got a different issue on first
> try. I am afraid this will be hard to debug on real hw as you don't
> really get the log in /var/log/messages for shutting down the system, I
> had to get it through a serial line via stdout of qemu.
>
> Also I was able to reproduce the issue on the older
> guix just by running the shepherd services upgrade scm script,
> no need for full reconfigure, this shows that something has gone
> wrong when shepherd was reloading the services. Do we have some kind of
> a test for this in guix / shepherd so it can't happen anymore in the future?

There’s a system test, “root-unmount” in (gnu tests base), but it
succeeds: <https://ci.guix.gnu.org/build/9790935/details>.

Perhaps the problem only shows up with more complex system configs?
My root partition is on a LUKS device, but I think the problem is more
something like EBUSY upon ‘umount’ due to stale processes.

Thanks,
Ludo’.
Rutherther wrote on 2 Apr 08:31 -0700
(name . Ludovic Courtès)(address . ludo@gnu.org)
87cydua8bk.fsf@ditigal.xyz
Hello Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Rutherther <rutherther@ditigal.xyz> skribis:
>
>> I was testing on commmit Ian sent to reproduce, and now moved to
>> newest guix, seems to have been solved (at least the specific error I
>> was getting),
>> possibly by shepherd 1.0.3 update.
>>
>> With newest guix both the first reconfigure of guix system image
>> and subsequent ones are fine.
>
> I believe I’m still experiencing it on my laptop (but it’s harmless, I
> only briefly see e2fsck saying “recovering journal” at boot time), but
> of course, not the slightest clue in /var/log/messages.

To be clearer: what I got in the VM was a full disk recovery that
cleared bad inodes every time, and I even got corruption in the files
produced during the reconfigure (which was the last thing I did before
rebooting). So I am not sure whether this is the same thing. Do you know
whether this one line is printed only when something is wrong with the
journal, or every time, as a sort of 'welcome' message?

>
>> Let me know if you're still experiencing this issue after updating
>> and I might try harder to reproduce if I got a different issue on first
>> try. I am afraid this will be hard to debug on real hw as you don't
>> really get the log in /var/log/messages for shutting down the system, I
>> had to get it through a serial line via stdout of qemu.
>>
>> Also I was able to reproduce the issue on the older
>> guix just by running the shepherd services upgrade scm script,
>> no need for full reconfigure, this shows that something has gone
>> wrong when shepherd was reloading the services. Do we have some kind of
>> a test for this in guix / shepherd so it can't happen anymore in the future?
>
> There’s a system test, “root-unmount” in (gnu tests base), but it
> succeeds: <https://ci.guix.gnu.org/build/9790935/details>.

Thanks for pointing out this test; good to know about it, although it's
not exactly what I had in mind. Previously, I was able to reproduce the
issue by running the upgrade-shepherd-services.scm script that is run
upon reconfigure; without it, the root fs unmounted cleanly. So the test
I had in mind would check that upgrading Shepherd's services this way
doesn't change how the services are stopped.

>
> Perhaps the problem only shows up with more complex system configs?
> My root partition is on a LUKS device, but I think the problem is more
> something like EBUSY upon ‘umount’ due to stale processes.

I tried reproducing it with bjoli's config last Friday; they were
experiencing the issue and shared their config on IRC. I was unable to
reproduce it (though I have to confess I changed the fs to ext4 for my
convenience).
So I am not sure it is reproducible even given a config (though maybe
booting a VM versus a real machine makes a difference, but I wouldn't
expect it...).

I have yet to try Ian's config, but it will take me some time to get to
it, as their config is more complicated, mainly because of the network
disk, and I would rather not skip any disk-related parts. (I also think
Ian is missing a `file-systems` requirement on their Shepherd autofs
service, but that shouldn't really cause this issue, as the root
filesystem unmount should be recursive.)
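To make the `file-systems` requirement concrete, here is a minimal sketch. It is not Ian's actual service (his config is only linked above); it is a hypothetical `my-automount` Shepherd service declared with `simple-service` on `shepherd-root-service-type`, listing `file-systems` in its `requirement` field. Since Shepherd stops a service's dependents before the service itself, this should make the automounter go away before the file systems it depends on are stopped. The service name, the `autofs` package living in (gnu packages file-systems), its `sbin/automount` path and the `-f` (foreground) flag are all assumptions.

```scheme
;; Sketch only: a hypothetical automounter service for a Guix System
;; configuration.  The name 'my-automount', the 'autofs' package path and
;; the "-f" flag are assumptions; the point is the 'requirement' field.
(use-modules (gnu services)
             (gnu services shepherd)
             (gnu packages file-systems)   ;assumed to provide 'autofs'
             (guix gexp))

(define my-automount-service
  (simple-service
   'my-automount shepherd-root-service-type
   (list (shepherd-service
          (provision '(my-automount))
          ;; Depending on 'file-systems' means Shepherd stops this service
          ;; (and with it the automounter) before it stops the file
          ;; systems.  For a network disk, 'networking' would likely be
          ;; needed here as well.
          (requirement '(file-systems))
          (start #~(make-forkexec-constructor
                    (list #$(file-append autofs "/sbin/automount")
                          "-f")))               ;run in the foreground
          (stop #~(make-kill-destructor))
          (documentation "Run the autofs automounter (sketch).")))))
```

Such a service would then be added to the `services` list of the `operating-system` declaration. As noted above, a missing requirement is probably not what leaves the root file system dirty, since its unmount should be recursive; the requirement mainly makes the shutdown ordering explicit.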

Regards,
Rutherther
Ludovic Courtès wrote on 3 Apr 02:19 -0700
(name . Rutherther)(address . rutherther@ditigal.xyz)
87wmc17gb2.fsf@gnu.org
Hello,

Rutherther <rutherther@ditigal.xyz> skribis:

>> There’s a system test, “root-unmount” in (gnu tests base), but it
>> succeeds: <https://ci.guix.gnu.org/build/9790935/details>.
>
> Thanks for pointing out this test, good to know about it, although it's
> not exactly what I had in mind. I was able to reproduce previously by
> running the upgrade-shepherd-services.scm that is ran upon reconfigure,
> without it, root fs unmounted cleanly. So the test I had in mind would
> be to test that when shepherd is upgraded like this, there aren't
> changes to how the services are stopped.

I’m not sure I understand: do you have a reproducer in a VM, either
manual or automatic? If you do, please share; if it’s manual, you can
share the steps you followed.

Thanks in advance! :-)

Ludo’.
Rutherther wrote on 3 Apr 02:37 -0700
(name . Ludovic Courtès)(address . ludo@gnu.org)
CD27AB3E-5B37-4C82-A0E2-D1A2E8E6C16D@ditigal.xyz
Hello, yes, I shared a (manual) reproducer earlier in this thread. But as I was saying, it only worked with the commits from Ian's initial email. I was unable to reproduce the issue with the same steps after updating to the newest Guix (which, among other things, has Shepherd 1.0.3, but I haven't bisected to see which commit fixed the case I ran into). Unfortunately, people still seem to experience it, so I must have hit a different issue, or just one of several possible causes of this one. I haven't sent the final config; the one change I made to it was only mentioned in the text. I can send the final config later. I was able to get the unmount error every time I ran reconfigure, or just the upgrade-shepherd-services script.


On April 3, 2025 11:19:45 AM GMT+02:00, "Ludovic Courtès" <ludo@gnu.org> wrote:
>Hello,
>
>Rutherther <rutherther@ditigal.xyz> skribis:
>
>>> There’s a system test, “root-unmount” in (gnu tests base), but it
>>> succeeds: <https://ci.guix.gnu.org/build/9790935/details>.
>>
>> Thanks for pointing out this test, good to know about it, although it's
>> not exactly what I had in mind. I was able to reproduce previously by
>> running the upgrade-shepherd-services.scm that is ran upon reconfigure,
>> without it, root fs unmounted cleanly. So the test I had in mind would
>> be to test that when shepherd is upgraded like this, there aren't
>> changes to how the services are stopped.
>
>I’m not sure I understand: do you have a reproducer in a VM, either
>manual or automatic? If you do, please share; if it’s manual, you can
>share the steps you followed.
>
>Thanks in advance! :-)
>
>Ludo’.
Felix Lechner wrote on 3 Apr 06:25 -0700
(name . Ludovic Courtès)(address . ludo@gnu.org)
87ecy9e5qw.fsf@lease-up.com
Hi,

On Tue, Apr 01 2025, Ludovic Courtès wrote:

> I believe I’m still experiencing it on my laptop

Unmounting properly is great, but would it also make sense to add a call
to sync(2) after a reconfigure since some people reboot quickly?

Kind regards
Felix
Ludovic Courtès wrote 5 days ago
(name . Felix Lechner)(address . felix.lechner@lease-up.com)
871pu37zs2.fsf@gnu.org
Felix Lechner <felix.lechner@lease-up.com> skribis:

> On Tue, Apr 01 2025, Ludovic Courtès wrote:
>
>> I believe I’m still experiencing it on my laptop
>
> Unmounting properly is great, but would it also make sense to add a call
> to sync(2) after a reconfigure since some people reboot quickly?

The ‘root-file-system’ service calls ‘sync’ when shutting down.

Ludo’.

To comment on this conversation send an email to 77086@patchwise.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 77086
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch