Aether-in-a-Box FAQs and Troubleshooting

FAQs

RKE2 vs. Kubespray Install

The AiaB installer will bring up Kubernetes on the server where it is run. By default it uses RKE2 as the Kubernetes platform. However, older versions of AiaB used Kubespray and that is still an option. To switch to Kubespray as the Kubernetes platform, edit the Makefile and replace rke2 with kubespray on this line:

node0:~/aether-in-a-box$ git diff Makefile
diff --git a/Makefile b/Makefile
index 5f2c186..608c221 100644
--- a/Makefile
+++ b/Makefile
@@ -35,7 +35,7 @@ ENABLE_GNBSIM ?= true
 ENABLE_SUBSCRIBER_PROXY ?= false
 GNBSIM_COLORS ?= true

-K8S_INSTALL ?= rke2
+K8S_INSTALL ?= kubespray
 CTR_CMD     := sudo /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io

 PROXY_ENABLED   ?= false
node0:~/aether-in-a-box$

You may wish to use Kubespray instead of RKE2 if you want to use locally-built images with AiaB (e.g., if you are developing SD-CORE services). The reason is that RKE2 uses containerd instead of Docker and so cannot access images in the local Docker registry.

How to use Local Image

Note that RKE2 (the default Kubernetes installer) is based on containerd rather than Docker. Containerd has its own local image registry that is separate from the local Docker Registry. With RKE2, if you have used docker build to build a local image, it is only in the Docker registry and so is not available to run in AiaB without some additional steps. An easy workaround is to use docker push to push the image to a remote repository (e.g., Docker Hub) and then modify your Helm values file to pull in that remote image. Another option is to save the local Docker image into a file and push the file to the containerd registry like this:

docker save -o /tmp/lte-uesoftmodem.tar omecproject/lte-uesoftmodem:1.1.0
sudo /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io \
    images import /tmp/lte-uesoftmodem.tar

The above commands save the local Docker image omecproject/lte-uesoftmodem:1.1.0 in a tarball, and then upload the tarball into the containerd registry where it is available for use by RKE2. Of course you should replace omecproject/lte-uesoftmodem:1.1.0 with the name of your image.

If you know that you are going to be using AiaB to test locally-built images, probably the easiest thing to do is to use the Kubespray installer. If you have already installed using RKE2 and you want to switch to Kubespray, first run make clean before following the steps in the Getting Help section above.

Restarting the AiaB Server

AiaB should come up in a mostly working state if the AiaB server is rebooted. If any pods are stuck in an Error or CrashLoopBackoff state they can be restarted using kubectl delete pod. It might also be necessary to power cycle the Sercomm eNodeB in order to get it to reconnect to the SD-CORE.

Enabling externalIP at MME

You can enable externalIP service in the MME by providing following config in the override file:

node0:~/aether-in-a-box$ git diff sd-core-4g-values.yaml
diff --git a/sd-core-4g-values.yaml b/sd-core-4g-values.yaml
index 0939739..f240f89 100644
--- a/sd-core-4g-values.yaml
+++ b/sd-core-4g-values.yaml
@@ -24,6 +24,11 @@ omec-control-plane:
       bootstrap:
         users: []
         staticusers: []
+    mme:
+      s1ap:
+        serviceType: ClusterIP
+        externalIP: 10.1.1.1
+
     spgwc:
       pfcp: true
       cfgFiles:
node0:~/aether-in-a-box$

Enabling externalIP at AMF

You can enable externalIP service in the AMF by providing following config in the override file:

node0:~/aether-in-a-box$ git diff sd-core-5g-values.yaml
diff --git a/sd-core-5g-values.yaml b/sd-core-5g-values.yaml
index e513e1f..fc1c684 100644
--- a/sd-core-5g-values.yaml
+++ b/sd-core-5g-values.yaml
@@ -34,6 +34,9 @@

     amf:
       cfgFiles:
+        ngapp:
+          serviceType: ClusterIP
+          externalIp: "10.1.1.2"
+          port: 38412
         amfcfg.conf:
           configuration:
             enableDBStore: false
@@ -176,6 +179,7 @@ omec-user-plane:
           cpiface:
             dnn: "internet"
             hostname: "upf"

 5g-ran-sim:
   enable: ${ENABLE_GNBSIM}

node0:~/aether-in-a-box$

Troubleshooting

NOTE: Running both 4G and 5G SD-CORE simultaneously in AiaB is currently not supported.

Proxy Issues

When working with AiaB behind a proxy, it may be possible to experience certain issues due to security policies. That is, the proxy may block a domain (e.g., opencord.org) and you may see messages like these ones when trying to clone or get a copy of aether-in-a-box:

ubuntu18:~$ git clone https://gerrit.opencord.org/aether-in-a-box
Cloning into 'aether-in-a-box'...
fatal: unable to access 'https://gerrit.opencord.org/aether-in-a-box/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none

or:

ubuntu18:~$ wget https://gerrit.opencord.org/plugins/gitiles/aether-in-a-box/+archive/refs/heads/master.tar.gz
--2022-06-01 13:13:42--  https://gerrit.opencord.org/plugins/gitiles/aether-in-a-box/+archive/refs/heads/master.tar.gz
Resolving proxy.company-xyz.com (proxy.company-xyz.com)... w.x.y.z
Connecting to proxy.company-xyz.com (proxy.company-xyz.com)|w.x.y.z|:#... connected.
ERROR: cannot verify gerrit.opencord.org's certificate, issued by 'emailAddress=proxy-team@company-xyz.com,... ,C=US':
 Self-signed certificate encountered.

To address this issue, you need to talk to your company’s proxy admins and request to unblock (re-classify) the opencord.org domain

“make” fails immediately

AiaB connects macvlan networks to DATA_IFACE so that the UPF can communicate on the network. To do this it assumes that the systemd-networkd service is installed and running, DATA_IFACE is under its control, and the systemd-networkd configuration file for DATA_IFACE ends with <DATA_IFACE>.network, where <DATA_IFACE> stands for the actual interface name. It tries to find this configuration file by looking in the standard paths. If it fails you’ll see a message like:

FATAL: Could not find systemd-networkd config for interface foobar, exiting now!
make: *** [Makefile:112: /users/acb/aether-in-a-box//build/milestones/interface-check] Error 1

In this case, you can specify a DATA_IFACE_PATH=<path to the config file> argument to make so that AiaB can find the systemd-networkd configuration file for DATA_IFACE. It’s also possible that your system does not use systemd-networkd to configure network interfaces (more likely if you are running in a VM), in which case AiaB is currently not able to install in your setup. You can check that systemd-networkd is installed and running as follows:

$ systemctl status systemd-networkd.service
● systemd-networkd.service - Network Service
    Loaded: loaded (/lib/systemd/system/systemd-networkd.service; disabled; vendor preset: enabled)
    Active: active (running) since Tue 2022-07-12 13:42:18 CDT; 2h 26min ago
TriggeredBy: ● systemd-networkd.socket
    Docs: man:systemd-networkd.service(8)
Main PID: 13777 (systemd-network)
    Status: "Processing requests..."
    Tasks: 1 (limit: 193212)
    Memory: 6.4M
    CGroup: /system.slice/systemd-networkd.service
            └─13777 /lib/systemd/systemd-networkd

AiaB fails during deployment of SD-Core network

When running AiaB in Ubuntu 22.04, AiaB installation fails during the deployment of the SD-Core with an error message as shown below:

...
...
Update Complete. ⎈Happy Helming!⎈
NODE_IP=10.80.51.4 DATA_IFACE=data RAN_SUBNET=192.168.251.0/24 ENABLE_GNBSIM=true envsubst < /home/ubuntu/aether-in-a-box//sd-core-5g-values.yaml | \
helm upgrade --create-namespace --install --wait  \
        --namespace omec \
        --values - \
        sd-core \
        aether/sd-core
Release "sd-core" does not exist. Installing it now.
coalesce.go:175: warning: skipped value for kafka.config: Not a table.
Error: timed out waiting for the condition
make: *** [Makefile:336: /home/ubuntu/aether-in-a-box//build/milestones/5g-core] Error 1

To get more details about the issue, you can execute the following command to see what pod(s) have issues:

$ kubectl -n omec get pods
NAME                          READY   STATUS             RESTARTS         AGE
amf-6dd746b9cd-2mk2j          0/1     CrashLoopBackOff   13 (24s ago)     42m
ausf-6dbb7655c7-4pkmp         1/1     Running            0                42m
gnbsim-0                      1/1     Running            0                42m
metricfunc-7864fb8b7c-srf2l   1/1     Running            3 (41m ago)      42m
mongodb-0                     1/1     Running            0                42m
mongodb-1                     1/1     Running            0                41m
mongodb-arbiter-0             1/1     Running            0                42m
nrf-57c79d9f65-fs9qj          1/1     Running            0                42m
nssf-5b85b8978d-q8dz5         1/1     Running            0                42m
pcf-758d7cfb48-wjfxf          1/1     Running            0                42m
sd-core-kafka-0               1/1     Running            0                42m
sd-core-zookeeper-0           1/1     Running            0                42m
simapp-6cccd6f787-sd52q       0/1     Error              13 (5m14s ago)   42m
smf-ff667d5b8-sw5vf           1/1     Running            0                42m
udm-768b9987b4-cqvbg          1/1     Running            0                42m
udr-8566897d45-n8cbz          1/1     Running            0                42m
upf-0                         5/5     Running            0                42m
webui-5894ffd49d-bdwf4        1/1     Running            0                42m

As shown above, there are problems with the AMF and SIMAPP pods and to see the specifics of the problem, the user can see the logs as shown below:

$ kubectl -n omec logs amf-6dd746b9cd-2mk2j
...
...
} (resolver returned new addresses)
2023/01/24 17:24:56 INFO: [core] [Channel #1] Channel switches to new LB policy "pick_first"
2023/01/24 17:24:56 INFO: [core] [Channel #1 SubChannel #2] Subchannel created
2023/01/24 17:24:56 too many open files

As the message shows, the problem is due to “too many open files”. To resolve this issue, the user can increase the maximum number of available watches and the maximum number of inotify instances (e.g., 10x). To do so, first, see the current maximum numbers:

$ sysctl fs.inotify.max_user_instances
fs.inotify.max_user_instances = 128
$ sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 1048576

Then, increase these values by executing:

sudo sysctl fs.inotify.max_user_instances=1280
sudo sysctl fs.inotify.max_user_watches=10485760

The above setting gets reset to their original values when the machine is rebooted. You can make this change permanent by creating an override file:

sudo nano /etc/sysctl.d/90-override.conf
fs.inotify.max_user_instances=1280
fs.inotify.max_user_watches=10485760

Data plane is not working

The first step is to read Understanding AiaB networking`_understanding_aiab_networking, which gives a high level picture of the AiaB data plane and how the pieces fit together. In order to debug the problem you will need to figure out where data plane packets from the eNodeB are dropped. One way to do this is to run ``tcpdump` on (1) DATA_IFACE to ensure that the data plane packets are arriving, (2) the access interface to see that they make it to the UPF, and (3) the core to check that they are forwarded upstream.

If the upstream packets don’t make it to DATA_IFACE, you probably need to add the static route on the eNodeB so packets to the UPF have a next hop of DATA_IFACE. You can see these upstream packets by running:

tcpdump -i <data-iface> -n udp port 2152

If they don’t make it to access you should check that the kernel routing table is forwarding a packet with destination 192.158.252.3 to the access interface. You can see them by running:

tcpdump -i access -n udp port 2152

In case packets are not forwarded from DATA_IFACE to acccess interface, the following command can be used to forward the traffic which is destined to 192.168.252.3:

iptables -A FORWARD -d 192.168.252.3 -i <data-iface> -o access -j ACCEPT

If they don’t make it to core then they are being dropped by the UPF for some reason. This may be a configuration issue with the state loaded in the ROC / SD-CORE – the UPF is being told to discard these packets. You should check that the device’s IMSI is part of a slice and that the slice’s policy settings allow traffic to that destination. You can view them via the following:

tcpdump -i core -n net 172.250.0.0/16

That command will capture all packets to/from the UE subnet.

If you cannot figure out the issue, see Getting Help.

Getting Help

Please introduce yourself and post your questions to the #aether-dev channel on the ONF Community Slack. Details about how to join this channel can be found on the ONF Wiki. In your introduction please state your institution and position, and describe why you are interested in Aether and what is your end goal.

If you need help debugging your setup, please give as much detail as possible about your environment: the OS version you have installed, are you running on bare metal or in a VM, how much CPU and memory does your server have, are you installing behind a proxy, and so on. Also list the steps you have performed so far, and post any error messages you have received. These details will aid the community to understand where you are and how to help you make progress.