Systemd Service Hang

Why and when do systemd services hang indefinitely

Finally, TIL, what can all be the reason for systemd services to hang indefinitely. The internet is flooded with numerous reports on this topic but no clear answers. So no more uselessly marked workarounds like: systemctl daemon-reload and systemctl-daemon-reexec for this scenario.

The scene would be something along the lines of:

rrs         6467  0.0  0.0  23088 15852 pts/1    Ss   12:53   0:00  |   |   \_ /bin/bash
rrs        11512  0.0  0.0  14876  4608 pts/1    S+   13:18   0:00  |   |   |   \_ systemctl restart snapper-timeline.timer
rrs        11513  0.0  0.0  14984  3076 pts/1    S+   13:18   0:00  |   |   |       \_ /bin/systemd-tty-ask-password-agent --watch
rrs        11514  0.0  0.0 234756  6752 pts/1    Sl+  13:18   0:00  |   |   |       \_ /usr/bin/pkttyagent --notify-fd 5 --fallback

The snapper-timeline service is important to me and it not running for months is a complete failure. Disappointingly, commands like systemctl --failed do not report of this oddity. The overall system status is reported to be fine, which is completely incorrect.

Thankfully, a kind soul’s comment gave the hint. The problem is that you could be having certain services in Activating status, which thus blocks all other services; quietly. So much for the unnecessary fun.

Looking further, in my case, it was:

rrs@priyasi:~$ systemctl list-jobs 
JOB  UNIT                           TYPE  STATE  
81   timers.target                  start waiting
85   man-db.timer                   start waiting
88   fstrim.timer                   start waiting
3832 snapper-timeline.service       start waiting
83   snapper-timeline.timer         start waiting
39   systemd-time-wait-sync.service start running
87   logrotate.timer                start waiting
84   debspawn-clear-caches.timer    start waiting
89   plocate-updatedb.timer         start waiting
91   dpkg-db-backup.timer           start waiting
93   e2scrub_all.timer              start waiting
40   time-sync.target               start waiting
86   apt-listbugs.timer             start waiting

13 jobs listed.
13:12 ♒ ॐ ♅ ♄ ⛢     ☺ 😄    

That was it. I knew the systemd-timesyncd service, in the past, had given me enough headaches. And so was it this time, just quietly doing it all again.

rrs@priyasi:~$ systemctl status systemd-time-wait-sync.service
● systemd-time-wait-sync.service - Wait Until Kernel Time Synchronized
     Loaded: loaded (/lib/systemd/system/systemd-time-wait-sync.service; enabled; vendor preset>
     Active: activating (start) since Fri 2022-04-22 13:14:25 IST; 1min 38s ago
       Docs: man:systemd-time-wait-sync.service(8)
   Main PID: 11090 (systemd-time-wa)
      Tasks: 1 (limit: 37051)
     Memory: 836.0K
        CPU: 7ms
     CGroup: /system.slice/systemd-time-wait-sync.service
             └─11090 /lib/systemd/systemd-time-wait-sync

Apr 22 13:14:25 priyasi systemd[1]: Starting Wait Until Kernel Time Synchronized...
Apr 22 13:14:25 priyasi systemd-time-wait-sync[11090]: adjtime state 5 status 40 time Fri 2022->
13:16 ♒ ॐ ♅ ♄ ⛢      ☹ 😟=> 3  

Dear LazyWeb, anybody knows of why the systemd-time-wait-sync service would hang indefinitely? I’ve had identical setups on many machines, in the same network, where others don’t exhibit this problem.

rrs@priyasi:~$ systemctl cat systemd-time-wait-sync.service

...snipped...

[Service]
Type=oneshot
ExecStart=/lib/systemd/systemd-time-wait-sync
TimeoutStartSec=infinity
RemainAfterExit=yes

[Install]
WantedBy=sysinit.target

The TimeoutStartSec=infinity is definitely an attribute that shouldn’t be shipped in any system services. There are use cases for it but that should be left for local admins to explicitly decide. Hanging for infinity is not a desired behavior for a system service.

In figuring all this out, today I learnt the handy systemctl list-jobs command, which will give the list of active running/blocked/waiting jobs.


See also