Hello Ludo,
Ludovic Courtès <ludo@gnu.org> writes:
> Hi,
>
> Rutherther <rutherther@ditigal.xyz> skribis:
>
>> I was testing on the commit Ian sent to reproduce, and now moved to
>> newest guix, seems to have been solved (at least the specific error I
>> was getting),
>> possibly by shepherd 1.0.3 update.
>>
>> With newest guix both the first reconfigure of guix system image
>> and subsequent ones are fine.
>
> I believe I’m still experiencing it on my laptop (but it’s harmless, I
> only briefly see e2fsck saying “recovering journal” at boot time), but
> of course, not the slightest clue in /var/log/messages.
To make it clearer: what I was able to get in the VM was a full disk
recovery that cleared bad inodes, every time, and I even got corruption
of the files produced during the reconfigure (which was the last thing I
did before the reboot). So I am not sure whether this is the same thing.
Do you know whether that one line is printed only when something is
wrong with the journal, or is it printed every time, like a sort of
'welcome' message?
>
>> Let me know if you're still experiencing this issue after updating
>> and I might try harder to reproduce if I got a different issue on first
>> try. I am afraid this will be hard to debug on real hw as you don't
>> really get the log in /var/log/messages for shutting down the system, I
>> had to get it through a serial line via stdout of qemu.
>>
>> Also I was able to reproduce the issue on the older
>> guix just by running the shepherd services upgrade scm script,
>> no need for full reconfigure, this shows that something has gone
>> wrong when shepherd was reloading the services. Do we have some kind of
>> a test for this in guix / shepherd so it can't happen anymore in the future?
>
> There’s a system test, “root-unmount” in (gnu tests base), but it
> succeeds: <https://ci.guix.gnu.org/build/9790935/details>.
Thanks for pointing out this test; it's good to know about, although
it's not exactly what I had in mind. I was previously able to reproduce
the issue just by running the upgrade-shepherd-services.scm script that
is run upon reconfigure; without it, the root fs unmounted cleanly. So
the test I had in mind would check that when shepherd is upgraded like
this, nothing changes in how the services are stopped (see the sketch
below).
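To make that concrete, here is a rough sketch, modelled on the existing
system tests, of where such a test could live. The module name, the OS
used, and everything in the TODO comment are assumptions on my part; the
actual upgrade-and-reboot logic is only outlined, not implemented:

;; Hypothetical module, e.g. gnu/tests/shepherd-upgrade.scm.
(define-module (gnu tests shepherd-upgrade)
  #:use-module (gnu tests)
  #:use-module (gnu system vm)
  #:use-module (guix gexp)
  #:export (%test-shepherd-upgrade))

(define (run-shepherd-upgrade-test)
  ;; Boot a simple OS in a VM with a marionette for remote evaluation.
  (define vm
    (virtual-machine
     (marionette-operating-system %simple-os)))

  (define test
    (with-imported-modules '((gnu build marionette))
      #~(begin
          (use-modules (gnu build marionette)
                       (srfi srfi-64))

          (define marionette
            (make-marionette (list #$vm)))

          (test-runner-current (system-test-runner #$output))
          (test-begin "shepherd-upgrade")

          ;; Sanity check: shepherd is up and listening.
          (test-assert "shepherd socket exists"
            (marionette-eval
             '(file-exists? "/var/run/shepherd/socket")
             marionette))

          ;; TODO: replay what upgrade-shepherd-services.scm does on
          ;; 'guix system reconfigure', reboot, and assert that the
          ;; root file system was unmounted cleanly (no journal
          ;; recovery on the second boot).

          (test-end "shepherd-upgrade"))))

  (gexp->derivation "shepherd-upgrade-test" test))

(define %test-shepherd-upgrade
  (system-test
   (name "shepherd-upgrade")
   (description
    "Check that the root file system is unmounted cleanly after the
Shepherd services have been upgraded the way 'guix system reconfigure'
does it.")
   (value (run-shepherd-upgrade-test))))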
>
> Perhaps the problem only shows up with more complex system configs?
> My root partition is on a LUKS device, but I think the problem is more
> something like EBUSY upon ‘umount’ due to stale processes.
I tried reproducing with bjoli's config last Friday; they were
experiencing the issue and sent their config to the IRC channel. I was
unable to reproduce it (though I have to confess I changed the file
system to ext4 for my convenience).
So I am not sure it is reproducible even given a config (maybe something
about VM vs. real-machine boot is relevant, but I wouldn't expect
it...).
I have yet to try Ian's config, but it's going to take me some time to
get to it, as their config is more complicated, mainly because of the
network disk, and I would rather not skip anything disk-related. (I also
think Ian's shepherd autofs service is missing a file-systems
requirement, but that shouldn't really cause this issue, since the root
file system unmount should be recursive; see the example below.)
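For illustration, something like the following is what I mean by the
missing requirement; the service name and the start command are made up,
only the 'requirement' field is the point:

(use-modules (gnu services shepherd)
             (guix gexp))

;; Hypothetical autofs-like service.
(define my-autofs-service
  (shepherd-service
   (provision '(my-autofs))
   ;; Depending on 'file-systems makes shepherd stop this service
   ;; before the file systems are unmounted at halt, so it cannot keep
   ;; a mount point busy.
   (requirement '(file-systems networking))
   (start #~(make-forkexec-constructor
             ;; Illustrative command only, not Ian's actual one.
             '("/run/current-system/profile/sbin/automount" "-f")))
   (stop #~(make-kill-destructor))))

It would then be registered with something like (simple-service
'my-autofs shepherd-root-service-type (list my-autofs-service)) in the
services field of the operating-system.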
Regards,
Rutherther