[Shepherd] Timer spawns many commands on DST change (CET -> CEST)

  • Done
  • quality assurance status badge
Details
2 participants
  • Timo Wilken
  • Ludovic Courtès
Owner
unassigned
Submitted by
Timo Wilken
Severity
important
Merged with

Debbugs page

T
T
Timo Wilken wrote on 30 Mar 13:04 -0700
Shepherd: bug in timer scheduling on daylight savings time change
(address . bug-guix@gnu.org)
D8TVBYDNEVB3.2M6JZ0D96HFML@twilken.net
Hi,

Last night, my server's time changed from CET to CEST automatically, as it is
set to Europe/Berlin time.

I have a Shepherd timer set to run every night at 02:00 local time:

Toggle snippet (15 lines)
(simple-service 'disk-maintenance shepherd-root-service-type
(list (shepherd-service
(provision '(guix-gc))
(requirement '(user-processes file-systems))
(documentation "Delete unused files from Guix' store.")
(modules '((shepherd service timer)))
(start #~(make-timer-constructor
(calendar-event #:hours '(2) #:minutes '(0))
(command (list "guix" "gc" "-d" "2w"))))
(stop #~(make-timer-destructor))
(actions (list (shepherd-action
(name 'trigger) (procedure #~trigger-timer)
(documentation "Collect garbage now.")))))))

Last night, this hit the exact moment of the CET/CEST change and seems to have
confused the Shepherd a lot, to the point where it kept spamming the same
message to /var/log/messages, and used up 80% of my system's total memory,
according to htop. (This led to various things getting oom-killed, and the
system grinding to a halt until I had a look at it in the afternoon.)

First, at 01:00, the Shepherd spawned about 30 processes for this one timer:

Toggle snippet (76 lines)
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1691.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'nextcloud-cron' spawned process 1692.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for nextcloud-cron.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'gpsplot' spawned process 1693.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for gpsplot.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'dynamic-dns-ipv4.api.mythic-beasts.com' spawned process 1694.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for dynamic-dns-ipv4.api.mythic-beasts.com.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1698.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1703.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1713.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1715.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1723.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1726.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1732.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1736.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1740.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1750.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1754.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1761.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1766.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1768.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1778.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1785.
2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1791.
2025-03-30 01:00:01 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:01 localhost shepherd[1]: Timer 'guix-gc' spawned process 1806.
2025-03-30 01:00:01 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:01 localhost shepherd[1]: Timer 'guix-gc' spawned process 1811.
2025-03-30 01:00:01 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:01 localhost shepherd[1]: Timer 'guix-gc' spawned process 1818.
2025-03-30 01:00:01 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:01 localhost shepherd[1]: Timer 'guix-gc' spawned process 1822.
2025-03-30 01:00:01 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:01 localhost shepherd[1]: Timer 'guix-gc' spawned process 1829.
2025-03-30 01:00:01 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:01 localhost shepherd[1]: Timer 'guix-gc' spawned process 1838.
2025-03-30 01:00:01 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:01 localhost shepherd[1]: Timer 'guix-gc' spawned process 1848.
2025-03-30 01:00:01 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:01 localhost shepherd[1]: Timer 'guix-gc' spawned process 1851.
2025-03-30 01:00:01 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:01 localhost shepherd[1]: Timer 'guix-gc' spawned process 1862.
2025-03-30 01:00:01 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:01 localhost shepherd[1]: Timer 'guix-gc' spawned process 1872.
2025-03-30 01:00:01 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:01 localhost shepherd[1]: Timer 'guix-gc' spawned process 1876.
2025-03-30 01:00:01 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 01:00:02 localhost shepherd[1]: Timer 'guix-gc' spawned process 1883.
2025-03-30 01:00:02 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 03:00:00 localhost linux: [46479.948177] perf: interrupt took too long (2504 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
2025-03-30 03:00:01 localhost linux: [47490.137060] perf: interrupt took too long (3138 > 3130), lowering kernel.perf_event_max_sample_rate to 63500
2025-03-30 03:00:01 localhost shepherd[1]: Timer 'guix-gc' spawned process 1887.
2025-03-30 03:00:01 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 03:00:02 localhost shepherd[1]: Timer 'guix-gc' spawned process 1896.
2025-03-30 03:00:02 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 03:00:03 localhost shepherd[1]: Timer 'guix-gc' spawned process 1910.
2025-03-30 03:00:03 localhost shepherd[1]: Registering new logger for guix-gc.
2025-03-30 03:00:03 localhost shepherd[1]: Timer 'guix-gc' spawned process 1915.

I then get 731(!!) lines like the following over the course of 3 seconds:

Toggle snippet (5 lines)
2025-03-30 03:00:03 localhost shepherd[1]: Waiting anew for timer 'guix-gc' (resuming from sleep state?).
[...]
2025-03-30 03:00:06 localhost shepherd[1]: Waiting anew for timer 'guix-gc' (resuming from sleep state?).

The Shepherd then gets more confused and starts spamming the same message over
and over again, but now without newlines, leading to lines like:

Toggle snippet (4 lines)
2025-03-30 03:00:06 localhost shepherd[1]: Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for
2025-03-30 03:00:06 localhost shepherd[1]: timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc' (resuming from sleep state?).Waiting anew for timer 'guix-gc'

Then there's some output from the various "guix gc" processes spawned by the
timer, which of course all step on each other's toes. The Shepherd keeps
happily writing the above lines out in large blocks.

It goes on like this until the afternoon, when the OOM kills start (probably
when I start SSH'ing in to see what's going on):

Toggle snippet (3 lines)
2025-03-30 15:21:44 localhost linux: [93869.155828] sshd-session invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0

After a bit of that, I decide to reboot the machine.

I've saved the system logs from that night - let me know if you need more details.

I'm running Shepherd 1.0.3, as packaged in Guix:

Toggle snippet (19 lines)
Generation 235 Mar 29 2025 12:07:15 (current)
file name: /var/guix/profiles/system-235-link
canonical file name: /gnu/store/p7infvmm678nysvdg3xvxlr83xxkj66j-system
label: GNU with Linux-Libre 6.13.8
bootloader: grub-efi
root device: UUID: c63af3e6-3c2b-43d2-b1e6-944f09a10e0f
kernel: /gnu/store/d7z9027yrs1vfmpvds10iqfl69cyvixn-linux-libre-6.13.8/bzImage
channels:
tw:
repository URL: https://git.twilken.net/dotfiles
branch: master
commit: 39abf96d68ce500abbd55f9bdbd68e1c63282d36
guix:
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: 67f4a2feadbb6cc04a61b5a3bc7b8f4bace2a0bb
[non-free channels omitted]

Cheers,
Timo
L
L
Ludovic Courtès wrote on 31 Mar 06:09 -0700
control message for bug #77390
(address . control@debbugs.gnu.org)
8734etmjn1.fsf@gnu.org
merge 77390 77401
quit
L
L
Ludovic Courtès wrote on 31 Mar 06:17 -0700
Re: bug#77390: Shepherd: bug in timer scheduling on daylight savings time change
(name . Timo Wilken)(address . guix@twilken.net)(address . 77390@debbugs.gnu.org)
87semtl4qb.fsf@gnu.org
Hi Timo,

"Timo Wilken" <guix@twilken.net> skribis:

Toggle quote (19 lines)
> Last night, this hit the exact moment of the CET/CEST change and seems to have
> confused the Shepherd a lot, to the point where it kept spamming the same
> message to /var/log/messages, and used up 80% of my system's total memory,
> according to htop. (This led to various things getting oom-killed, and the
> system grinding to a halt until I had a look at it in the afternoon.)
>
> First, at 01:00, the Shepherd spawned about 30 processes for this one timer:
>
> --8<---------------cut here
> 2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1691.
> 2025-03-30 01:00:00 localhost shepherd[1]: Timer 'nextcloud-cron' spawned process 1692.
> 2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for nextcloud-cron.
> 2025-03-30 01:00:00 localhost shepherd[1]: Timer 'gpsplot' spawned process 1693.
> 2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for gpsplot.
> 2025-03-30 01:00:00 localhost shepherd[1]: Timer 'dynamic-dns-ipv4.api.mythic-beasts.com' spawned process 1694.
> 2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for dynamic-dns-ipv4.api.mythic-beasts.com.
> 2025-03-30 01:00:00 localhost shepherd[1]: Timer 'guix-gc' spawned process 1698.
> 2025-03-30 01:00:00 localhost shepherd[1]: Registering new logger for guix-gc.

I saw the exact same problem on berlin (aka. ci.guix.gnu.org):


but not on the other servers I co-administer; I suspect the difference
is that these other servers did not have a timer running at 01:00.

So I guess that the fix for https://issues.guix.gnu.org/75622 was
bogus somehow?

To be continued…

Ludo’.
L
L
Ludovic Courtès wrote on 31 Mar 06:17 -0700
control message for bug #77401
(address . control@debbugs.gnu.org)
87r02dl4pi.fsf@gnu.org
severity 77401 important
quit
L
L
Ludovic Courtès wrote on 31 Mar 13:28 -0700
(address . control@debbugs.gnu.org)
87jz85j664.fsf@gnu.org
retitle 77401 [Shepherd] Timer spawns many commands on DST change (CET -> CEST)
quit
L
L
Ludovic Courtès wrote on 31 Mar 14:21 -0700
Re: bug#77401: [Shepherd] Timer spawns too many commands
(name . Timo Wilken)(address . guix@twilken.net)
87o6xgj3pn.fsf@gnu.org
Hi Timo,

"Timo Wilken" <guix@twilken.net> skribis:

Toggle quote (8 lines)
> Last night, this hit the exact moment of the CET/CEST change and seems to have
> confused the Shepherd a lot, to the point where it kept spamming the same
> message to /var/log/messages, and used up 80% of my system's total memory,
> according to htop. (This led to various things getting oom-killed, and the
> system grinding to a halt until I had a look at it in the afternoon.)
>
> First, at 01:00, the Shepherd spawned about 30 processes for this one timer:

This is now fixed:


Let me know if you find something fishy.

The irony is that the work I did for https://issues.guix.gnu.org/75622
helped, but the one test that would have caught this issue was wrong: it
was looking at the wrong date when testing the CET->CEST change.

Setting #:wait-for-termination? #true would have mitigated this bug (it
wouldn’t have triggered).

Thanks,
Ludo’.
Closed
?
Your comment

This issue is archived.

To comment on this conversation send an email to 77390@patchwise.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 77390
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch