Aether-in-a-Box FAQs and Troubleshooting
FAQs
RKE2 vs. Kubespray Install
The AiaB installer will bring up Kubernetes on the server where it is run. By default it uses RKE2 as the Kubernetes platform. However, older versions of AiaB used Kubespray and that is still an option. To switch to Kubespray as the Kubernetes platform, edit the Makefile and replace rke2 with kubespray on this line:
node0:~/aether-in-a-box$ git diff Makefile
diff --git a/Makefile b/Makefile
index 5f2c186..608c221 100644
--- a/Makefile
+++ b/Makefile
@@ -35,7 +35,7 @@ ENABLE_GNBSIM ?= true
ENABLE_SUBSCRIBER_PROXY ?= false
GNBSIM_COLORS ?= true
-K8S_INSTALL ?= rke2
+K8S_INSTALL ?= kubespray
CTR_CMD := sudo /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io
PROXY_ENABLED ?= false
node0:~/aether-in-a-box$
You may wish to use Kubespray instead of RKE2 if you want to use locally-built images with AiaB (e.g., if you are developing SD-CORE services). The reason is that RKE2 uses containerd instead of Docker and so cannot access images in the local Docker registry.
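The Makefile edit shown in the diff above can also be scripted. The following is a minimal sketch, not part of AiaB: the helper name set_k8s_install is hypothetical, and it assumes the Makefile contains a single line of the form "K8S_INSTALL ?= <platform>" as in the diff.

```shell
# Rewrite the K8S_INSTALL default in an AiaB Makefile (hypothetical helper).
# Usage: set_k8s_install <path-to-Makefile> <rke2|kubespray>
set_k8s_install() {
    makefile=$1
    platform=$2
    case "$platform" in
        rke2|kubespray) ;;
        *) echo "unsupported platform: $platform" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    # '?' is a literal character in sed's basic regular expressions,
    # so this matches the line "K8S_INSTALL ?= <anything>".
    if sed "s/^K8S_INSTALL ?= .*/K8S_INSTALL ?= $platform/" "$makefile" > "$tmp"; then
        # Copy the contents back so the file keeps its owner and mode.
        cat "$tmp" > "$makefile"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

For example, set_k8s_install ~/aether-in-a-box/Makefile kubespray makes the same change as the diff, and passing rke2 reverts it.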
How to use Local Image
Note that RKE2 (the default Kubernetes installer) is based on containerd rather than Docker. Containerd has its own local image registry that is separate from the local Docker Registry. With RKE2, if you have used docker build to build a local image, it is only in the Docker registry and so is not available to run in AiaB without some additional steps. An easy workaround is to use docker push to push the image to a remote repository (e.g., Docker Hub) and then modify your Helm values file to pull in that remote image. Another option is to save the local Docker image into a file and push the file to the containerd registry like this:
docker save -o /tmp/lte-uesoftmodem.tar omecproject/lte-uesoftmodem:1.1.0
sudo /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io \
images import /tmp/lte-uesoftmodem.tar
The above commands save the local Docker image omecproject/lte-uesoftmodem:1.1.0 in a tarball, and then upload the tarball into the containerd registry where it is available for use by RKE2. Of course you should replace omecproject/lte-uesoftmodem:1.1.0 with the name of your image.
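If you import images often, the two commands above can be wrapped in a small helper. This is a sketch under stated assumptions: the function names image_tarball_path and import_local_image are hypothetical, the tarball is written to /tmp as in the example above, and image references that use a digest (name@sha256:...) are not handled.

```shell
# Derive a tarball path such as /tmp/lte-uesoftmodem.tar from an image
# reference such as omecproject/lte-uesoftmodem:1.1.0.
image_tarball_path() {
    name=${1##*/}     # drop everything up to the last '/'
    name=${name%%:*}  # drop the tag, if any
    printf '/tmp/%s.tar\n' "$name"
}

# Save a local Docker image and import it into the containerd used by RKE2.
# Usage: import_local_image <image>
import_local_image() {
    image=$1
    tarball=$(image_tarball_path "$image")
    docker save -o "$tarball" "$image" &&
        sudo /var/lib/rancher/rke2/bin/ctr \
            --address /run/k3s/containerd/containerd.sock \
            --namespace k8s.io \
            images import "$tarball"
}
```

For example, import_local_image omecproject/lte-uesoftmodem:1.1.0 runs the same two commands shown above.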
If you know that you are going to be using AiaB to test locally-built images, probably the easiest thing to do is to use the Kubespray installer. If you have already installed using RKE2 and you want to switch to Kubespray, first run make clean before following the steps in the RKE2 vs. Kubespray Install section above.
Restarting the AiaB Server
AiaB should come up in a mostly working state if the AiaB server is rebooted. If any pods are stuck in an Error or CrashLoopBackOff state, they can be restarted using kubectl delete pod <pod-name>.
It might also be necessary to power cycle the Sercomm eNodeB in order to get it to reconnect to the SD-CORE.
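Finding and deleting the stuck pods by hand can be tedious after a reboot. The sketch below filters kubectl get pods output for the two states mentioned above and deletes the matching pods. The function names are hypothetical, and the filter assumes the default column layout in which STATUS is the third column.

```shell
# Read `kubectl get pods` output on stdin and print the names of pods
# whose STATUS column is Error or CrashLoopBackOff.
failing_pods() {
    awk '$1 != "NAME" && ($3 == "Error" || $3 == "CrashLoopBackOff") { print $1 }'
}

# Delete the failing pods in a namespace so their controllers recreate them.
# Usage: restart_failing_pods <namespace>
restart_failing_pods() {
    namespace=$1
    kubectl -n "$namespace" get pods | failing_pods |
        while read -r pod; do
            # Redirect stdin so kubectl cannot consume the list of pod names.
            kubectl -n "$namespace" delete pod "$pod" < /dev/null
        done
}
```

For example, restart_failing_pods omec restarts the failing SD-CORE pods. Repeat it for any other namespace that has stuck pods.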
Enabling externalIP at MME
You can enable the externalIP service in the MME by providing the following config in the override file:
node0:~/aether-in-a-box$ git diff sd-core-4g-values.yaml
diff --git a/sd-core-4g-values.yaml b/sd-core-4g-values.yaml
index 0939739..f240f89 100644
--- a/sd-core-4g-values.yaml
+++ b/sd-core-4g-values.yaml
@@ -24,6 +24,11 @@ omec-control-plane:
bootstrap:
users: []
staticusers: []
+ mme:
+ s1ap:
+ serviceType: ClusterIP
+ externalIP: 10.1.1.1
+
spgwc:
pfcp: true
cfgFiles:
node0:~/aether-in-a-box$
Enabling externalIP at AMF
You can enable the externalIP service in the AMF by providing the following config in the override file:
node0:~/aether-in-a-box$ git diff sd-core-5g-values.yaml
diff --git a/sd-core-5g-values.yaml b/sd-core-5g-values.yaml
index e513e1f..fc1c684 100644
--- a/sd-core-5g-values.yaml
+++ b/sd-core-5g-values.yaml
@@ -34,6 +34,9 @@
amf:
cfgFiles:
+ ngapp:
+ serviceType: ClusterIP
+ externalIp: "10.1.1.2"
+ port: 38412
amfcfg.conf:
configuration:
enableDBStore: false
@@ -176,6 +179,7 @@ omec-user-plane:
cpiface:
dnn: "internet"
hostname: "upf"
5g-ran-sim:
enable: ${ENABLE_GNBSIM}
node0:~/aether-in-a-box$
Troubleshooting
NOTE: Running both 4G and 5G SD-CORE simultaneously in AiaB is currently not supported.
Proxy Issues
When working with AiaB behind a proxy, you may run into issues caused by security policies. For example, the proxy may block a domain (e.g., opencord.org), and you may see messages like these when trying to clone or download a copy of aether-in-a-box:
ubuntu18:~$ git clone https://gerrit.opencord.org/aether-in-a-box
Cloning into 'aether-in-a-box'...
fatal: unable to access 'https://gerrit.opencord.org/aether-in-a-box/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
or:
ubuntu18:~$ wget https://gerrit.opencord.org/plugins/gitiles/aether-in-a-box/+archive/refs/heads/master.tar.gz
--2022-06-01 13:13:42-- https://gerrit.opencord.org/plugins/gitiles/aether-in-a-box/+archive/refs/heads/master.tar.gz
Resolving proxy.company-xyz.com (proxy.company-xyz.com)... w.x.y.z
Connecting to proxy.company-xyz.com (proxy.company-xyz.com)|w.x.y.z|:#... connected.
ERROR: cannot verify gerrit.opencord.org's certificate, issued by 'emailAddress=proxy-team@company-xyz.com,... ,C=US':
Self-signed certificate encountered.
To address this issue, ask your company’s proxy admins to unblock (re-classify) the opencord.org domain.
“make” fails immediately
AiaB connects macvlan networks to DATA_IFACE so that the UPF can communicate on the network. To do this it assumes that the systemd-networkd service is installed and running, DATA_IFACE is under its control, and the systemd-networkd configuration file for DATA_IFACE ends with <DATA_IFACE>.network, where <DATA_IFACE> stands for the actual interface name. It tries to find this configuration file by looking in the standard paths. If it fails you’ll see a message like:
FATAL: Could not find systemd-networkd config for interface foobar, exiting now!
make: *** [Makefile:112: /users/acb/aether-in-a-box//build/milestones/interface-check] Error 1
In this case, you can specify a DATA_IFACE_PATH=<path to the config file> argument to make so that AiaB can find the systemd-networkd configuration file for DATA_IFACE. It’s also possible that your system does not use systemd-networkd to configure network interfaces (more likely if you are running in a VM), in which case AiaB is currently not able to install in your setup. You can check that systemd-networkd is installed and running as follows:
$ systemctl status systemd-networkd.service
● systemd-networkd.service - Network Service
Loaded: loaded (/lib/systemd/system/systemd-networkd.service; disabled; vendor preset: enabled)
Active: active (running) since Tue 2022-07-12 13:42:18 CDT; 2h 26min ago
TriggeredBy: ● systemd-networkd.socket
Docs: man:systemd-networkd.service(8)
Main PID: 13777 (systemd-network)
Status: "Processing requests..."
Tasks: 1 (limit: 193212)
Memory: 6.4M
CGroup: /system.slice/systemd-networkd.service
└─13777 /lib/systemd/systemd-networkd
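To locate a value for DATA_IFACE_PATH, you can search for a file whose name ends with <DATA_IFACE>.network yourself. The sketch below is not AiaB's own lookup logic: the function name find_networkd_config is hypothetical, and the default directories are the ones systemd-networkd normally reads .network files from.

```shell
# Print the first file whose name ends with "<iface>.network".
# Usage: find_networkd_config <iface> [dir ...]
find_networkd_config() {
    iface=$1
    shift
    # With no directories given, fall back to the usual systemd-networkd ones.
    [ $# -gt 0 ] || set -- /etc/systemd/network /run/systemd/network \
        /usr/lib/systemd/network /lib/systemd/network
    for dir in "$@"; do
        for f in "$dir"/*"$iface".network; do
            # An unmatched glob stays a literal pattern and fails this test.
            if [ -f "$f" ]; then
                printf '%s\n' "$f"
                return 0
            fi
        done
    done
    return 1
}
```

If find_networkd_config foobar prints a path, pass that path to make as the DATA_IFACE_PATH argument. If it prints nothing, the interface is probably not managed by systemd-networkd.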
AiaB fails during deployment of SD-Core network
When running AiaB on Ubuntu 22.04, the installation can fail during the deployment of the SD-Core with an error message like the one shown below:
...
...
Update Complete. ⎈Happy Helming!⎈
NODE_IP=10.80.51.4 DATA_IFACE=data RAN_SUBNET=192.168.251.0/24 ENABLE_GNBSIM=true envsubst < /home/ubuntu/aether-in-a-box//sd-core-5g-values.yaml | \
helm upgrade --create-namespace --install --wait \
--namespace omec \
--values - \
sd-core \
aether/sd-core
Release "sd-core" does not exist. Installing it now.
coalesce.go:175: warning: skipped value for kafka.config: Not a table.
Error: timed out waiting for the condition
make: *** [Makefile:336: /home/ubuntu/aether-in-a-box//build/milestones/5g-core] Error 1
To get more details about the issue, you can execute the following command to see which pods have issues:
$ kubectl -n omec get pods
NAME READY STATUS RESTARTS AGE
amf-6dd746b9cd-2mk2j 0/1 CrashLoopBackOff 13 (24s ago) 42m
ausf-6dbb7655c7-4pkmp 1/1 Running 0 42m
gnbsim-0 1/1 Running 0 42m
metricfunc-7864fb8b7c-srf2l 1/1 Running 3 (41m ago) 42m
mongodb-0 1/1 Running 0 42m
mongodb-1 1/1 Running 0 41m
mongodb-arbiter-0 1/1 Running 0 42m
nrf-57c79d9f65-fs9qj 1/1 Running 0 42m
nssf-5b85b8978d-q8dz5 1/1 Running 0 42m
pcf-758d7cfb48-wjfxf 1/1 Running 0 42m
sd-core-kafka-0 1/1 Running 0 42m
sd-core-zookeeper-0 1/1 Running 0 42m
simapp-6cccd6f787-sd52q 0/1 Error 13 (5m14s ago) 42m
smf-ff667d5b8-sw5vf 1/1 Running 0 42m
udm-768b9987b4-cqvbg 1/1 Running 0 42m
udr-8566897d45-n8cbz 1/1 Running 0 42m
upf-0 5/5 Running 0 42m
webui-5894ffd49d-bdwf4 1/1 Running 0 42m
As shown above, the AMF and SIMAPP pods are failing. To see the specifics of the problem, check the pod logs as shown below:
$ kubectl -n omec logs amf-6dd746b9cd-2mk2j
...
...
} (resolver returned new addresses)
2023/01/24 17:24:56 INFO: [core] [Channel #1] Channel switches to new LB policy "pick_first"
2023/01/24 17:24:56 INFO: [core] [Channel #1 SubChannel #2] Subchannel created
2023/01/24 17:24:56 too many open files
As the message shows, the problem is due to “too many open files”. To resolve this issue, increase the maximum number of inotify watches and the maximum number of inotify instances (e.g., by 10x). To do so, first check the current limits:
$ sysctl fs.inotify.max_user_instances
fs.inotify.max_user_instances = 128
$ sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 1048576
Then, increase these values by executing:
sudo sysctl fs.inotify.max_user_instances=1280
sudo sysctl fs.inotify.max_user_watches=10485760
These settings are reset to their original values when the machine is rebooted. You can make the change permanent by creating an override file:
sudo nano /etc/sysctl.d/90-override.conf
fs.inotify.max_user_instances=1280
fs.inotify.max_user_watches=10485760
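The steps above can also be combined into a short script that reads the current limits, multiplies them, and writes the override file. This is a sketch only: the function names are hypothetical, the factor of 10 follows the example above, and running raise_inotify_limits a second time multiplies the already-raised values again.

```shell
# Print sysctl.d lines with both inotify limits multiplied by a factor.
# Usage: scaled_inotify_conf <instances> <watches> <factor>
scaled_inotify_conf() {
    printf 'fs.inotify.max_user_instances=%d\n' "$(( $1 * $3 ))"
    printf 'fs.inotify.max_user_watches=%d\n' "$(( $2 * $3 ))"
}

# Read the current limits, write a 10x override file, and load it.
raise_inotify_limits() {
    instances=$(sysctl -n fs.inotify.max_user_instances) || return 1
    watches=$(sysctl -n fs.inotify.max_user_watches) || return 1
    scaled_inotify_conf "$instances" "$watches" 10 |
        sudo tee /etc/sysctl.d/90-override.conf > /dev/null &&
        sudo sysctl -p /etc/sysctl.d/90-override.conf
}
```

With the limits shown earlier (128 and 1048576), scaled_inotify_conf 128 1048576 10 prints the same two lines as the override file above.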
Data plane is not working
The first step is to read Understanding AiaB networking, which gives a high-level picture of the AiaB data plane and how the pieces fit together. In order to debug the problem you will need to figure out where data plane packets from the eNodeB are dropped. One way to do this is to run tcpdump on (1) DATA_IFACE to ensure that the data plane packets are arriving, (2) the access interface to see that they make it to the UPF, and (3) the core interface to check that they are forwarded upstream.
If the upstream packets don’t make it to DATA_IFACE, you probably need to add a static route on the eNodeB so that packets to the UPF have a next hop of DATA_IFACE. You can see these upstream packets by running:
tcpdump -i <data-iface> -n udp port 2152
If they don’t make it to access, you should check that the kernel routing table is forwarding packets with destination 192.168.252.3 to the access interface. You can see them by running:
tcpdump -i access -n udp port 2152
In case packets are not forwarded from DATA_IFACE to the access interface, the following command can be used to forward the traffic that is destined to 192.168.252.3:
iptables -A FORWARD -d 192.168.252.3 -i <data-iface> -o access -j ACCEPT
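Before adding the iptables rule, it can help to confirm which interface the kernel routing table selects for the UPF address. The sketch below parses the output of ip route get for that purpose. The function names are hypothetical, and the lookup is for locally generated traffic, so treat it as an approximation of how forwarded packets are routed.

```shell
# Read `ip route get <addr>` output on stdin and print the output
# interface, i.e. the word that follows "dev".
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Succeed only if the kernel would send packets for the UPF address
# out of the access interface.
# Usage: check_upf_route [upf-address]
check_upf_route() {
    upf_addr=${1:-192.168.252.3}
    dev=$(ip route get "$upf_addr" | route_dev)
    if [ "$dev" = "access" ]; then
        echo "ok: $upf_addr is routed via access"
    else
        echo "problem: $upf_addr is routed via '$dev', expected access" >&2
        return 1
    fi
}
```

If check_upf_route reports a different interface, fix the route first. The FORWARD rule above only accepts packets that the routing table already directs to access.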
If they don’t make it to core then they are being dropped by the UPF for some reason. This may be a configuration issue with the state loaded in the ROC / SD-CORE: the UPF is being told to discard these packets. You should check that the device’s IMSI is part of a slice and that the slice’s policy settings allow traffic to that destination. You can view the packets on core via the following:
tcpdump -i core -n net 172.250.0.0/16
That command will capture all packets to/from the UE subnet.
If you cannot figure out the issue, see Getting Help.
Getting Help
Please introduce yourself and post your questions to the #aether-dev channel on the ONF Community Slack. Details about how to join this channel can be found on the ONF Wiki. In your introduction please state your institution and position, and describe why you are interested in Aether and what your end goal is.
If you need help debugging your setup, please give as much detail as possible about your environment: the OS version you have installed, whether you are running on bare metal or in a VM, how much CPU and memory your server has, whether you are installing behind a proxy, and so on. Also list the steps you have performed so far, and post any error messages you have received. These details will help the community understand where you are and how to help you make progress.