GNU bug report logs

#68595 VLANs in static-networking-service-type hangs shepherd


Report forwarded to bug-guix@gnu.org:
bug#68595; Package guix. (Fri, 19 Jan 2024 21:50:02 GMT)


Acknowledgement sent to Lars Rustand <rustand.lars@gmail.com>:
New bug report received and forwarded. Copy sent to bug-guix@gnu.org. (Fri, 19 Jan 2024 21:50:02 GMT)


Message #5 received at submit@debbugs.gnu.org:

From: Lars Rustand <rustand.lars@gmail.com>
To: bug-guix@gnu.org
Subject: VLANs in static-networking-service-type hangs shepherd
Date: Fri, 19 Jan 2024 20:12:24 +0100
Like the title says, if you add any VLAN in a
static-networking-service-type it seems like the whole shepherd daemon
freezes up and anything that depends on it stops responding.
Additionally the networking does not get fully configured either.

After configuring a VLAN, `herd status`, `herd restart networking`, and
any other herd command hang forever with no output. Even reboot does not
work. The only remedy is to restart the system using the power
button, but even after the restart the networking service still fails to
start.

The VLAN interfaces are apparently created, but no addresses are
assigned to them.

Steps to reproduce:

1. Add a static network with a VLAN to your system config (see below for
minimal example)
2. Reconfigure your system
3. Restart the networking service with `sudo herd restart networking`
4. Observe that herd does not finish
5. Try to run `herd status`, `guix system reconfigure`, or `sudo reboot`.
6. Observe that none of the commands seem to have any effect, and that
they hang indefinitely with no output

--8<---------------cut here---------------start------------->8---
(service static-networking-service-type
  (list (static-networking
         (links
          (list (network-link
                 (name "myvlan")
                 (type 'vlan)
                 (arguments '((id . 3)
                              (link . "eth0"))))))
         (addresses
          (list (network-address
                 (device "myvlan@eth0")
                 (value "192.168.0.2/24")))))))
--8<---------------cut here---------------end--------------->8---

Alternatively here are the reproduction steps using VM:

1. Build a qcow2 image, make sure there is enough space to reconfigure
   the system. Use --save-provenance so you have the config inside the
   vm so you can reconfigure later.
   `guix system image --image-type=qcow2 --image-size=30G --save-provenance minimal.scm`
2. Copy the qcow image to a writable directory.
3. Start up the vm.
```
sudo qemu-system-x86_64 \
   -nic user,model=virtio-net-pci \
   -enable-kvm -m 2048 \
   -device virtio-blk,drive=myhd \
   -drive if=none,file=1a7wi5mgcy3wrsx6pcnag6qjbb87djwl-image.qcow2,id=myhd
```
4. Edit /run/current-system/configuration.scm and uncomment the static
   networking.
5. Reconfigure the system.
6. Try to restart the networking service: `herd restart networking`
7. The command will hang indefinitely. Cancel it.
8. Check the network interfaces. The VLAN interface will have been
   created, but it will not have any address.
9. The aforementioned commands will all be unresponsive now.
10. If you reboot your VM you will see that the networking service has
   failed at startup, and if you try to restart the service you will get
   an error: #<&netlink-response-error errno: 17>

--8<---------------cut here---------------start------------->8---
(use-modules
  (gnu)
  (gnu services)
  (gnu services base)
  (gnu services networking)
  (gnu bootloader)
  (gnu bootloader grub)
  (gnu system)
  (gnu system file-systems)
  (gnu system accounts))

(operating-system
  (host-name "minimal")

  (users
    (cons*
      (user-account
        (name "lars")
        (group "users"))
      %base-user-accounts))

  (services
   (cons*
          (service dhcp-client-service-type)
          ;; Commented out so you can uncomment it after booting the VM
          ;;(service static-networking-service-type
          ;;      (list (static-networking
          ;;             (links
          ;;              (list (network-link
          ;;                     (name "myvlan")
          ;;                     (type 'vlan)
          ;;                     (arguments '((id . 3)
          ;;                                  (link . "eth0"))))))
          ;;             (addresses
          ;;              (list (network-address
          ;;                     (device "myvlan@eth0")
          ;;                     (value "192.168.0.2/24")))))))
    %base-services))

   (bootloader
     (bootloader-configuration
       (bootloader grub-bootloader)
       (targets '("/dev/vda"))))

   (file-systems
    (cons*
     %base-file-systems)))
--8<---------------cut here---------------end--------------->8---




Information forwarded to bug-guix@gnu.org:
bug#68595; Package guix. (Fri, 19 Jan 2024 23:46:02 GMT)


Message #8 received at submit@debbugs.gnu.org:

From: Lars Rustand <rustand.lars@gmail.com>
To: bug-guix@gnu.org
Subject: Re: VLANs in static-networking-service-type hangs shepherd
Date: Sat, 20 Jan 2024 00:32:58 +0100
For fun I tried to use the exact configuration that is mentioned in the
manual and was amazed that it worked, and the networking service is able
to start successfully. Here is the working configuration:

--8<---------------cut here---------------start------------->8---
(static-networking
 (links (list (network-link
               (name "bond0")
               (type 'bond)
               (arguments '((mode . "802.3ad")
                            (miimon . 100)
                            (lacp-active . "on")
                            (lacp-rate . "fast"))))

              (network-link
               (mac-address "98:11:22:33:44:55")
               (arguments '((master . "bond0"))))

              (network-link
               (mac-address "98:11:22:33:44:56")
               (arguments '((master . "bond0"))))

              (network-link
               (name "bond0.1055")
               (type 'vlan)
               (arguments '((id . 1055)
                            (link . "bond0"))))))
 (addresses (list (network-address
                   (value "192.168.1.4/24")
                   (device "bond0.1055")))))
--8<---------------cut here---------------end--------------->8---


However, if I simply substitute the bond interface with a real interface
I get back the error described in my previous message. This
configuration fails:

--8<---------------cut here---------------start------------->8---
(static-networking
 (links (list (network-link
               (name "bond0.1055")
               (type 'vlan)
               (arguments '((id . 1055)
                            (link . "ens3"))))))
 (addresses (list (network-address
                   (value "192.168.1.4/24")
                   (device "bond0.1055")))))
--8<---------------cut here---------------end--------------->8---


So it seems that VLANs do work for bonds, but not for physical network
interfaces. I've done a lot of digging on the internet and cannot find a
single example of anyone using VLANs at all in Guix, so maybe that is
why this problem hasn't been discovered yet.




Information forwarded to bug-guix@gnu.org:
bug#68595; Package guix. (Mon, 12 Feb 2024 09:57:01 GMT)


Message #11 received at 68595@debbugs.gnu.org:

From: Ludovic Courtès <ludo@gnu.org>
To: Lars Rustand <rustand.lars@gmail.com>
Cc: 68595@debbugs.gnu.org, Julien Lepiller <julien@lepiller.eu>, Alexey Abramov <levenson@mmer.org>
Subject: Re: bug#68595: VLANs in static-networking-service-type hangs shepherd
Date: Mon, 12 Feb 2024 10:55:32 +0100
Hi,

Lars Rustand <rustand.lars@gmail.com> skribis:

> Like the title says, if you add any VLAN in a
> static-networking-service-type it seems like the whole shepherd daemon
> freezes up and anything that depends on it stops responding.
> Additionally the networking does not get fully configured either.
>
> After configuring a VLAN `herd status`, `herd restart networking` and
> any other herd command hangs forever with no output. Even reboot is not
> working. The only remedy is to restart the system using the power
> button, but even after the restart the networking service still fails to
> start.

Ouch.  Could you check what /var/log/messages reports?

Once you’ve reproduced the hang, could you attach GDB to shepherd and
get a backtrace?

  gdb -p 1
  bt

(I recommend doing that in a VM rather than on your main machine!)
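
If an interactive GDB session is awkward on a half-hung system, the same backtrace can be requested non-interactively. The following is only a sketch: the helper name `shepherd_bt_cmd` is made up for illustration, and it merely prints the command so that it can be reviewed before being run as root. It uses GDB's standard `-batch` and `-ex` options and asks for all threads rather than just the current one.

```shell
# Print the GDB command line for a given PID (default: 1, which is
# shepherd on Guix System). Nothing is executed here; the command is
# only printed so it can be checked first.
shepherd_bt_cmd() {
  pid="${1:-1}"
  printf "gdb -p %s -batch -ex 'thread apply all bt'\n" "$pid"
}

# Show the command for PID 1.
shepherd_bt_cmd
```

The printed command can then be run as root inside the VM, with its output redirected to a file to attach to the report.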

> 1. Add a static network with a VLAN to your system config (see below for
> minimal example)
> 2. Reconfigure your system
> 3. Restart the networking service with `sudo herd restart networking`
> 4. Observe that herd does not finish
> 5. Try to run `herd status`, `guix system reconfigure`, or `sudo reboot`.
> 6. Observe that none of the commands seem to have any effect, and that
> they hang indefinitely with no output
>
> (service static-networking-service-type
>   (list (static-networking
>          (links
>           (list (network-link
>                  (name "myvlan")
>                  (type 'vlan)
>                  (arguments '((id . 3)
>                               (link . "eth0"))))))
>          (addresses
>           (list (network-address
>                  (device "myvlan@eth0")
>                  (value "192.168.0.2/24")))))))

You mentioned in your other message that the example from the manual
works fine.  Could you try and reduce your config until you find which
bit makes it fail?

Cc’ing Alexey and Julien who may know more.

Thanks,
Ludo’.




Severity set to 'important' from 'normal'. Request was from Ludovic Courtès <ludo@gnu.org> to control@debbugs.gnu.org. (Mon, 12 Feb 2024 09:57:02 GMT)


Information forwarded to bug-guix@gnu.org:
bug#68595; Package guix. (Mon, 12 Feb 2024 12:00:02 GMT)


Message #16 received at 68595@debbugs.gnu.org:

From: Alexey Abramov <levenson@mmer.org>
To: Lars Rustand <rustand.lars@gmail.com>
Cc: 68595@debbugs.gnu.org
Subject: Re: bug#68595: VLANs in static-networking-service-type hangs shepherd
Date: Mon, 12 Feb 2024 12:59:26 +0100
Hi Lars,

Lars Rustand <rustand.lars@gmail.com> writes:

> Like the title says, if you add any VLAN in a
> static-networking-service-type it seems like the whole shepherd daemon
> freezes up and anything that depends on it stops responding.
> Additionally the networking does not get fully configured either.
>
> After configuring a VLAN `herd status`, `herd restart networking` and
> any other herd command hangs forever with no output. Even reboot is not
> working. The only remedy is to restart the system using the power
> button, but even after the restart the networking service still fails to
> start.
>
> VLANs are seemingly created, but no addresses are created.
>
> Steps to reproduce:
>
> 1. Add a static network with a VLAN to your system config (see below for
> minimal example)
> 2. Reconfigure your system
> 3. Restart the networking service with `sudo herd restart networking`
> 4. Observe that herd does not finish
> 5. Try to run `herd status`, `guix system reconfigure`, or `sudo reboot`.
> 6. Observe that none of the commands seem to have any effect, and that
> they hang indefinitely with no output
>
> --8<---------------cut here---------------start------------->8---
> (service static-networking-service-type
>   (list (static-networking
>          (links
>           (list (network-link
>                  (name "myvlan")
>                  (type 'vlan)
>                  (arguments '((id . 3)
>                               (link . "eth0"))))))
>          (addresses
>           (list (network-address
>                  (device "myvlan@eth0")
>                  (value "192.168.0.2/24")))))))
> --8<---------------cut here---------------end--------------->8---

I see. Could you please change the device name in the network-address
to "myvlan" rather than "myvlan@eth0"?

Even though ip link (iproute2) shows 'myvlan@eth0', this is not the
actual name of the interface.
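
As a small illustration of the naming point: the part after "@" in the `ip link` listing is the parent device, shown only for display, so the name to put in the configuration is whatever precedes it. A minimal sketch follows; the helper name `kernel_ifname` is made up for this example.

```shell
# Strip the "@parent" decoration that `ip link` prints for stacked
# devices such as VLANs, leaving the interface name itself.
kernel_ifname() {
  printf '%s\n' "${1%%@*}"
}

kernel_ifname "myvlan@eth0"   # prints: myvlan
kernel_ifname "eth0"          # prints: eth0
```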

> Alternatively here are the reproduction steps using VM:
>
> 1. Build a qcow2 image, make sure there is enough space to reconfigure
>    the system. Use --save-provenance so you have the config inside the
>    vm so you can reconfigure later.
>    `guix system image --image-type=qcow2 --image-size=30G --save-provenance minimal.scm`
> 2. Copy the qcow image to a writable directory.
> 3. Start up the vm.
> ```
> sudo qemu-system-x86_64 \
>    -nic user,model=virtio-net-pci \
>    -enable-kvm -m 2048 \
>    -device virtio-blk,drive=myhd \
>    -drive if=none,file=1a7wi5mgcy3wrsx6pcnag6qjbb87djwl-image.qcow2,id=myhd
> ```
> 4. Edit /run/current-system/configuration.scm and uncomment the static
>    networking.
> 5. Reconfigure the system.
> 6. Try to restart the networking service. `herd restart networking`
> 7. The command will hang infinitely. Cancel it.
> 8. Check the network interfaces. The VLAN interface will have been
>    created, but it will not have any address.
> 9. The aforementioned commands will all be unresponsive now.
> 10. If you reboot your VM you will see that the networking service is
>    failed at startup, and if you try to restart the service you will get
>    an error: #<&netlink-response-error errno: 17>
>

We need to improve our error messages. Errno 17 is EEXIST, which here
means that the interface already exists.
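
One way to check whether a link with that name is already present before restarting the service is to look it up under /sys/class/net, where Linux lists its network interfaces. This is only a sketch: the helper name `link_exists` and the `SYSFS_NET` override are made up for illustration, and "myvlan" is the name from the example configuration. It does not establish why the link is there.

```shell
# Directory where the kernel lists network interfaces. The override is
# only there so the check can be tried against a scratch directory.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Succeed if an interface with the given name currently exists.
link_exists() {
  [ -e "$SYSFS_NET/$1" ]
}

if link_exists "myvlan"; then
  echo "myvlan already exists; creating it again would fail with EEXIST (errno 17)"
else
  echo "myvlan does not exist yet"
fi
```

If the link turns out to be left over from an earlier attempt, removing it with `ip link delete myvlan` (as root) before starting the service again would show whether the error is caused by that stale link.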

-- 
Alexey




Information forwarded to bug-guix@gnu.org:
bug#68595; Package guix. (Thu, 15 Feb 2024 09:25:02 GMT)


Message #19 received at 68595@debbugs.gnu.org:

From: Lars Rustand <rustand.lars@gmail.com>
To: Ludovic Courtès <ludo@gnu.org>
Cc: 68595@debbugs.gnu.org, Julien Lepiller <julien@lepiller.eu>, Alexey Abramov <levenson@mmer.org>
Subject: Re: bug#68595: VLANs in static-networking-service-type hangs shepherd
Date: Thu, 15 Feb 2024 10:07:51 +0100
Ludovic Courtès <ludo@gnu.org> writes:

> Ouch.  Could you check what /var/log/messages reports?
>
> Once you’ve reproduced the hang, could you attach GDB to shepherd and
> get a backtrace?
>
>   gdb -p 1
>   bt
>
> (I recommend doing that in a VM rather than on your main machine!)
>

I have unfortunately been unable to reproduce the full shepherd hang,
even though I followed the exact same procedure as before. The command
`herd restart networking` still hangs indefinitely the first time after
adding a VLAN, but this no longer causes the whole of shepherd to hang
afterwards.

The basic error 17 still occurs any time I try to start the networking
service while a VLAN is configured.


>
> You mentioned in your other message that the example from the manual
> works fine.  Could you try and reduce your config until you find which
> bit makes it fail?

The configuration I have already attached is as minimal as possible. It
only includes the mandatory OS fields and a minimal static-networking
configuration.

I have already found which bit makes it fail. It is the use of a VLAN on
any normal network link. VLANs seem to work only on bond devices, as in
the example.

The reproduction steps are maybe a little over-complicated however, and
are only necessary in order to reproduce the full "shepherd hangs" bug,
which I now am unable to reproduce anyway. But what I believe is the
root of the problem is the error 17 on starting the networking
service. This can be reproduced much more simply and reliably by just
starting a VM the normal way with the static-networking snippet already
enabled when building it.

So here are the new, simplified steps for reproducing only the error 17
and the non-functional VLAN networking:

Use the OS config from my first post, but uncomment the static
networking block. Build and run the VM with `$(guix system vm minimal.scm)`.

That's it.

> Cc’ing Alexey and Julien who may know more.
>
> Thanks,
> Ludo’.



Alexey Abramov <levenson@mmer.org> writes:

> Hi Lars,
>
> I see, Could you please, replace the device name to "myvlan" and not
> "myvlan@eth0" in the network-address.
>
> Even though ip link (iproute2) shows you 'myvlan@eth0' this is not an
> actual name of the interfaces.
>

I have tried with your suggestion, but everything behaves exactly the same.





debbugs.gnu.org maintainers <help-debbugs@gnu.org>. Last modified: Sun Sep 8 03:13:31 2024; Machine Name: wallace-server
