Cross Architecture Linux Containers

Linux and ARM

With more ARM-based devices on the market, and with them getting more powerful every day, it is increasingly common to see ARM images for your favorite Linux distribution. Among them, Debian has become a default choice for individuals and companies to base their work on. That probably has to do with Debian’s long history of supporting many more architectures than most other distributions. Debian also tends to have a much wider user and developer mindshare, even though it does not have direct backing from any of the big Linux distribution companies.

Some of my work involves packaging and integration that affects all architectures and image types, ARM included. Having the respective environments readily available is important for getting work done quickly.

I still recall that back in 2004, when I was much newer to enterprise Linux and working at a big computer hardware company, I had heard about the Itanium (IA-64) architecture. Back then, trying out anything other than x86 meant you needed access to physical hardware, or you had to be a DD (Debian Developer) with shell access to the Debian machines.

With Linux virtualization, a lot has improved since then.

Linux Virtualization

With a decently powered processor with virtualization support, you can emulate many of the architectures that Linux supports.

Linux has many virtualization options, but the dominant ones are KVM/QEMU, Xen and VirtualBox. QEMU is the most feature-rich of them, with a wide range of architectures that it can emulate.

In the case of ARM, things are still a bit tricky because the hardware definition is tightly coupled to the board. Emulating a specific device type under QEMU is not as straightforward as on x86. Thankfully, libvirt offers a generic machine type called virt (it is provided by QEMU itself). For basic cross-architecture tasks, this should be enough.

virt board profile under libvirt
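
As a rough illustration of the generic virt machine outside libvirt, the sketch below assembles a qemu-system-aarch64 command line for it. The kernel and initrd file names, the CPU model and the memory size are placeholders of mine, not values from my setup; the function only prints the command so it can be inspected before running.

```shell
#!/bin/sh
# Print a qemu-system-aarch64 invocation for the generic "virt" board.
# The default kernel/initrd names are hypothetical; pass your own as $1 and $2.
virt_cmdline() {
    kernel=${1:-vmlinuz}
    initrd=${2:-initrd.img}
    echo qemu-system-aarch64 \
        -machine virt \
        -cpu cortex-a57 \
        -m 2048 \
        -nographic \
        -kernel "$kernel" \
        -initrd "$initrd" \
        -append "console=ttyAMA0"
}

virt_cmdline
```

Printing instead of executing keeps the sketch harmless on a machine without the ARM system emulator installed.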

But while virtualization is a nice progression, it is not always an optimal one. For one, it needs good device virtualization support, which can be tricky in the ARM context. Second, it can be (very) slow at times.

And unless you are doing low-level, hardware-specific work, you can look for an alternative in Linux containers.

Linux Containers

None of this is new. There is already a lot of buzz around containers, and there are many implementations across platforms supporting a similar concept: from good old chroot (with limited functionality) and jails (on BSD) to well-marketed products like Docker, LXC and systemd-nspawn. There are also some, like firejail, that target specific use cases.

As long as you do not have a tight dependency on the hardware or on specific parts of the Linux kernel (I once explored running open-iscsi in a containerized environment, for example), containers are a quick way to get an equivalent environment. In particular, process, namespace and network separation are great; they let me concentrate on the work rather than on Host <=> Guest issues.

Given how fast work can be accomplished with containers, I have been wanting to explore the possibility of building container images for ARM and other architectures that I care about.

The good thing is that full-system virtualization is offered through QEMU, and the same project also provides user-mode emulation of foreign architectures. So features and fixes have a higher chance of parity, as they come from the same tool.

systemd-nspawn

I haven’t explored all the container implementations that Linux has to offer. Initially, I used LXC for a while. These days, for work it is Docker, while my personal preference lies with systemd-nspawn.

Part of the reason is simply that I have grown more familiar with systemd, given that it is the housekeeper of my operating system now. Also, so far, I like most of what systemd offers.

Getting Cross Architecture Containers under Debian GNU/Linux

  • Use qemu-user-static for emulation
  • Generate cross architecture chroot images with qemu-debootstrap
  • Import those as sub-volumes under systemd-nspawn
  • Set a template for your containers and other misc stuff
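
The steps above can be sketched roughly as follows. This is a dry-run outline under my own assumptions, not a record of the exact commands I ran: the package list, the mirror URL and the choice to bootstrap straight into a fresh btrfs subvolume (instead of importing afterwards) are illustrative, and it presumes /var/lib/machines lives on btrfs with btrfs-progs installed. On newer Debian releases qemu-debootstrap may no longer be shipped, in which case plain debootstrap is the thing to try.

```shell
#!/bin/sh
# Dry-run sketch for an arm64 Debian sid container image.
# Commands are printed by default; set RUN=1 (as root) to execute them.
# NAME, SUITE and MIRROR are illustrative choices, not fixed values.
NAME=${NAME:-DebSidArm64}
ARCH=${ARCH:-arm64}
SUITE=${SUITE:-sid}
MIRROR=${MIRROR:-http://deb.debian.org/debian}
TARGET=/var/lib/machines/$NAME

run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi
}

bootstrap_container() {
    # 1. User-mode emulators plus the bootstrap and container tooling.
    run apt-get install -y qemu-user-static binfmt-support debootstrap systemd-container
    # 2. A btrfs subvolume to hold the image.
    run btrfs subvolume create "$TARGET"
    # 3. Foreign-architecture root filesystem.
    run qemu-debootstrap --arch="$ARCH" "$SUITE" "$TARGET" "$MIRROR"
    # 4. Confirm that machinectl sees the new image.
    run machinectl list-images
}

bootstrap_container
```

Without RUN=1 the script only lists what it would do, which makes it easy to adapt the architecture or suite (for example ARCH=armhf) before touching the system.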

Glitches

Not everything works perfectly, but most of it does.

Here’s my container list. Subvolume-backed containers help de-duplicate and save some space. They are also very quick to clone.

rrs@priyasi:~$ machinectl list-images
NAME                 TYPE      RO USAGE CREATED                     MODIFIED
2019                 subvolume no   n/a Mon 2019-06-10 09:18:26 IST n/a     
SDK1812              subvolume no   n/a Mon 2018-10-15 12:45:39 IST n/a     
BusterArm64          subvolume no   n/a Mon 2019-06-03 14:49:40 IST n/a     
DebSidArm64          subvolume no   n/a Mon 2019-06-03 14:56:42 IST n/a     
DebSidArmhf          subvolume no   n/a Mon 2018-07-23 21:18:42 IST n/a     
DebSidMips           subvolume no   n/a Sat 2019-06-01 08:31:34 IST n/a     
DebianJessieTemplate subvolume no   n/a Mon 2018-07-23 21:18:54 IST n/a     
DebianSidTemplate    subvolume no   n/a Mon 2018-07-23 21:18:05 IST n/a     
aptVerifySigsDebSid  subvolume no   n/a Mon 2018-07-23 21:19:04 IST n/a     
jenkins-builder      subvolume no   n/a Tue 2018-11-27 20:11:42 IST n/a     
jenkins-builder-new  subvolume no   n/a Tue 2019-04-16 10:13:43 IST n/a     
opensuse             subvolume no   n/a Mon 2018-07-23 21:18:34 IST n/a     

12 images listed.
21:08 ♒♒♒   ☺ 😄    
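
Cloning one of the template images from the list could look like the sketch below. machinectl clone is the real subcommand; the new image name ScratchSid is made up for illustration, and the command is only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) a clone of a template image.
# "ScratchSid" below is a hypothetical name for the new container.
clone_image() {
    template=$1
    new=$2
    if [ "${RUN:-0}" = 1 ]; then
        machinectl clone "$template" "$new"
    else
        echo "machinectl clone $template $new"
    fi
}

clone_image DebianSidTemplate ScratchSid
```

On a btrfs-backed /var/lib/machines the clone is a subvolume snapshot, which is why it completes almost instantly.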

Problems with networking on foreign architectures

This most likely has to do with QEMU’s emulation. While getting networking to work, I saw many upstream reports and fixes about networking and other subsystems having issues under emulation. Luckily, for the architectures listed above, I have been able to use them with some workarounds.

Here’s a running Debian Sid ARM64 container image under systemd-nspawn with QEMU emulation:

DebSidArm64(fba63648a4c4451ebf56eb758463b37d)
           Since: Fri 2019-07-19 21:13:51 IST; 1min 16s ago
          Leader: 12061 (systemd)
         Service: systemd-nspawn; class container
            Root: /var/lib/machines/DebSidArm64
           Iface: sysbr0
              OS: Debian GNU/Linux 10 (buster)
       UID Shift: 445841408
            Unit: systemd-nspawn@DebSidArm64.service
                  ├─payload
                  │ ├─init.scope
                  │ │ └─12061 /usr/bin/qemu-aarch64-static /usr/lib/systemd/systemd
                  │ └─system.slice
                  │   ├─console-getty.service
                  │   │ └─12245 /usr/bin/qemu-aarch64-static /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 vt220
                  │   ├─cron.service
                  │   │ └─12200 /usr/bin/qemu-aarch64-static /usr/sbin/cron -f
                  │   ├─dbus.service
                  │   │ └─12197 /usr/bin/qemu-aarch64-static /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
                  │   ├─rsyslog.service
                  │   │ └─12202 /usr/bin/qemu-aarch64-static /usr/sbin/rsyslogd -n -iNONE
                  │   ├─systemd-journald.service
                  │   │ └─12123 /usr/bin/qemu-aarch64-static /lib/systemd/systemd-journald
                  │   └─systemd-logind.service
                  │     └─12203 /usr/bin/qemu-aarch64-static /lib/systemd/systemd-logind
                  └─supervisor
                    └─12059 /usr/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=try-guest --network-bridge=sysbr0 --bind /var/tmp/Debian-Build/containers/ --bind-ro /var/tmp/:/var/tmp/vartmp -U --settings=override --machine=DebSidArm64

Jul 19 21:13:53 priyasi systemd-nspawn[12059]: [  OK  ] Reached target Multi-User System.
Jul 19 21:13:53 priyasi systemd-nspawn[12059]: [  OK  ] Reached target Graphical Interface.
Jul 19 21:13:53 priyasi systemd-nspawn[12059]:          Starting Update UTMP about System Runlevel Changes...
Jul 19 21:13:53 priyasi systemd-nspawn[12059]: [  OK  ] Started Update UTMP about System Runlevel Changes.
Jul 19 21:13:53 priyasi systemd-nspawn[12059]: [  OK  ] Started Rotate log files.
Jul 19 21:13:57 priyasi systemd-nspawn[12059]: [  OK  ] Started Daily apt download activities.
Jul 19 21:13:57 priyasi systemd-nspawn[12059]:          Starting Daily apt upgrade and clean activities...
Jul 19 21:13:59 priyasi systemd-nspawn[12059]: [2B blob data]
Jul 19 21:13:59 priyasi systemd-nspawn[12059]: Debian GNU/Linux 10 SidArm64 console
Jul 19 21:13:59 priyasi systemd-nspawn[12059]: [1B blob data]
21:15 ♒♒♒   ☺ 😄    
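
The supervisor command line in the status output above can, as far as I understand it, be captured in a per-container .nspawn settings file. The sketch below writes such a file; the mapping of flags to keys is my reading of the systemd.nspawn documentation rather than a copy of my actual file, and -U is approximated with PrivateUsers=pick only. The function takes the target directory as an argument so the result can be reviewed before it is placed in /etc/systemd/nspawn/.

```shell
#!/bin/sh
# Write a DebSidArm64.nspawn settings file mirroring the flags seen above:
# --boot, -U, --bind, --bind-ro and --network-bridge=sysbr0.
write_nspawn_settings() {
    dir=${1:-/etc/systemd/nspawn}
    mkdir -p "$dir"
    cat > "$dir/DebSidArm64.nspawn" <<'EOF'
[Exec]
Boot=yes
PrivateUsers=pick

[Files]
Bind=/var/tmp/Debian-Build/containers/
BindReadOnly=/var/tmp/:/var/tmp/vartmp

[Network]
Bridge=sysbr0
EOF
}

# Write into a scratch directory and show the result for inspection.
write_nspawn_settings "${TMPDIR:-/tmp}/nspawn-preview"
cat "${TMPDIR:-/tmp}/nspawn-preview/DebSidArm64.nspawn"
```

Keeping the settings in a file means the stock systemd-nspawn@.service unit can start the container without a hand-edited command line.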

And the login

rrs@priyasi:~$ machinectl login DebSidArm64 
Connected to machine DebSidArm64. Press ^] three times within 1s to exit session.

Debian GNU/Linux 10 SidArm64 pts/0

SidArm64 login: root
Password: 
Last login: Thu Jun 20 10:57:30 IST 2019 on pts/0
Linux SidArm64 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5 (2019-06-19) aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
root@SidArm64:~# 
root@SidArm64:~# 
root@SidArm64:~# uname -a
Linux SidArm64 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5 (2019-06-19) aarch64 GNU/Linux
root@SidArm64:~# 

Now the problematic networking part. Notice the QEMU emulation error messages. At this stage, networking is effectively dead.

root@SidArm64:~# ip a
Unsupported setsockopt level=270 optname=11
Unknown host QEMU_IFLA type: 50
Unknown host QEMU_IFLA type: 51
Unknown host QEMU_IFLA type: 50
Unknown host QEMU_IFLA type: 51
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: host0@if10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
request send failed: Operation not supported
    link/ether f6:9d:cc:d6:ad:32 brd ff:ff:ff:ff:ff:ffroot@SidArm64:~# 
root@SidArm64:~# 


root@SidArm64:~# ping www.google.com
ping: www.google.com: Temporary failure in name resolution
root@SidArm64:~# 

root@SidArm64:~# cat /etc/resolv.conf 
nameserver 172.16.20.1

Because I prefer systemd for containers, I chose to use systemd’s network management tools too. Maybe that is causing the problem, but in my opinion that is highly unlikely.

Anyway, simply invoking dhclient at this stage does assign the container an IP address. More importantly, the core networking stack gets back into action, even though the reporting tools still print errors.

root@SidArm64:~# dhclient host0
Unknown host QEMU_IFLA type: 50
Unknown host QEMU_IFLA type: 51
Unknown host QEMU_IFLA type: 50
Unknown host QEMU_IFLA type: 51
Unsupported setsockopt level=270 optname=11
Unknown host QEMU_IFLA type: 50
Unknown host QEMU_IFLA type: 51
Unknown host QEMU_IFLA type: 50
Unknown host QEMU_IFLA type: 51
Unsupported setsockopt level=263 optname=8
Unsupported setsockopt level=270 optname=11
Unknown target IFA type: 4
Unknown target IFA type: 3
Unknown target IFA type: 6
root@SidArm64:~# 


root@SidArm64:~# ip a
Unsupported setsockopt level=270 optname=11
Unknown host QEMU_IFLA type: 50
Unknown host QEMU_IFLA type: 51
Unknown host QEMU_IFLA type: 50
Unknown host QEMU_IFLA type: 51
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: host0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
request send failed: Operation not supported
    link/ether f6:9d:cc:d6:ad:32 brd ff:ff:ff:ff:ff:ffroot@SidArm64:~# 
root@SidArm64:~# 


root@SidArm64:~# ping 172.16.20.1
PING 172.16.20.1 (172.16.20.1) 56(84) bytes of data.
64 bytes from 172.16.20.1: icmp_seq=1 ttl=64 time=0.091 ms
64 bytes from 172.16.20.1: icmp_seq=2 ttl=64 time=0.061 ms
64 bytes from 172.16.20.1: icmp_seq=3 ttl=64 time=0.138 ms
64 bytes from 172.16.20.1: icmp_seq=4 ttl=64 time=0.136 ms
64 bytes from 172.16.20.1: icmp_seq=5 ttl=64 time=0.152 ms
64 bytes from 172.16.20.1: icmp_seq=6 ttl=64 time=0.143 ms
64 bytes from 172.16.20.1: icmp_seq=7 ttl=64 time=0.137 ms
64 bytes from 172.16.20.1: icmp_seq=8 ttl=64 time=0.144 ms
^C
--- 172.16.20.1 ping statistics ---
8 packets transmitted, 8 received, 0% packet loss, time 158ms
rtt min/avg/max/mdev = 0.061/0.125/0.152/0.030 ms


root@SidArm64:~# ping www.google.com
PING www.google.com (216.58.197.68) 56(84) bytes of data.
64 bytes from maa03s21-in-f4.1e100.net (216.58.197.68): icmp_seq=1 ttl=50 time=45.2 ms
64 bytes from maa03s21-in-f4.1e100.net (216.58.197.68): icmp_seq=2 ttl=50 time=63.7 ms
64 bytes from maa03s21-in-f4.1e100.net (216.58.197.68): icmp_seq=3 ttl=50 time=63.2 ms
64 bytes from maa03s21-in-f4.1e100.net (216.58.197.68): icmp_seq=4 ttl=50 time=80.7 ms
64 bytes from maa03s21-in-f4.1e100.net (216.58.197.68): icmp_seq=5 ttl=50 time=65.4 ms
64 bytes from maa03s21-in-f4.1e100.net (216.58.197.68): icmp_seq=6 ttl=50 time=70.4 ms
^C
--- www.google.com ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 12ms
rtt min/avg/max/mdev = 45.190/64.758/80.717/10.597 ms
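
The dhclient workaround could be wrapped in a small helper like the one below. It is only a sketch of the idea, not something I have wired into the containers: it pings a known address (the default is the 172.16.20.1 host seen above) and calls dhclient only if that fails. The PING and DHCLIENT overrides are my own addition so the logic can be tried outside a container.

```shell
#!/bin/sh
# Run dhclient on the container interface only when a known peer is unreachable.
# PING and DHCLIENT may be overridden, e.g. with stubs for a trial run.
ensure_network() {
    iface=${1:-host0}
    peer=${2:-172.16.20.1}
    if ${PING:-ping} -c 1 -W 2 "$peer" >/dev/null 2>&1; then
        echo "network already up on $iface"
    else
        ${DHCLIENT:-dhclient} "$iface"
    fi
}

# Trial run with stubs: "false" simulates a failed ping, "echo" stands in for dhclient.
( PING=false; DHCLIENT="echo would run: dhclient"; ensure_network host0 )
```

Checking reachability rather than parsing ip output is deliberate, since ip itself prints emulation errors here.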

Conclusion

Despite some minor annoyances, this has been my fastest way to get work done, especially cross-architecture work. When I compare it with how I would have approached such a use case in the early days, I could not have imagined the speed and simplicity I have at hand today. Free and open source software has been a great choice, even though it started just as a young boy’s curiosity.

