Fabric Switch Bootstrap (Beta)

Note

Running the P4 UPF on fabric switches is a beta feature of the Aether 1.5 release. The hardware and software setup described here is not required if you use the BESS UPF.

The installation of the ONL OS image on the fabric switches uses the DHCP and HTTP server set up on the management server.

The default image is downloaded during that installation process by the onieboot role. To download a newer switch image, change that role and rerun the management playbook.
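Before touching a switch, it can be useful to confirm that the management server is actually serving the installer. The following is a minimal sketch, assuming curl is available on a host in the management network and that the image is published at the same URL passed to onie-nos-install later in this section (http://10.0.0.129/onie-installer). The function name check_installer is introduced here for illustration only.

```shell
#!/bin/sh
# check_installer URL: succeed only if an HTTP HEAD request for URL succeeds.
# -f makes curl exit non-zero on HTTP errors (for example 404), -s hides the
# progress meter, and -I sends a HEAD request so the image is not downloaded.
check_installer() {
    curl -fsI --max-time 10 "$1" >/dev/null
}

# Example (not run here; adjust the address to your management server):
# check_installer http://10.0.0.129/onie-installer && echo "installer is served"
```

A HEAD request is used because the image is several hundred megabytes; only the response status matters for this check.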

Preparation

The switches have a single Ethernet port that is shared between OpenBMC and ONL, and each of the two presents its own MAC address on that port. Find the MAC addresses of both interfaces and enter them into NetBox.
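MAC addresses are often printed on labels or reported by tools in different notations. The sketch below normalizes an address to upper-case, colon-separated octets before it is recorded. The helper name normalize_mac and the choice of output notation are assumptions made for consistency, not a documented NetBox requirement.

```shell
#!/bin/sh
# normalize_mac MAC: print MAC as upper-case, colon-separated octets.
# Accepts colon-, dash-, or dot-separated input; returns non-zero on bad input.
normalize_mac() {
    # Strip separators and upper-case the hex digits.
    hex=$(printf '%s' "$1" | tr -d ':.-' | tr 'a-f' 'A-F')
    # Reject anything that is not a hex digit.
    case "$hex" in
        *[!0-9A-F]*) return 1 ;;
    esac
    # A MAC address is exactly 12 hex digits.
    [ "${#hex}" -eq 12 ] || return 1
    # Insert a colon after every pair of digits, then drop the trailing one.
    printf '%s\n' "$hex" | sed 's/\(..\)/\1:/g; s/:$//'
}

normalize_mac "00-90-fb-5c-e1-97"   # prints 00:90:FB:5C:E1:97
```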

Change boot mode to ONIE Rescue mode

To reinstall an ONL image, you must first switch the ONIE bootloader to “Rescue Mode”.

Once the switch is powered on, it should retrieve an IP address on the OpenBMC interface with DHCP. OpenBMC uses these default credentials:

username: root
password: 0penBmc

Log in to OpenBMC with SSH:

$ ssh root@10.0.0.131
The authenticity of host '10.0.0.131 (10.0.0.131)' can't be established.
ECDSA key fingerprint is SHA256:...
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.0.131' (ECDSA) to the list of known hosts.
root@10.0.0.131's password:
root@bmc:~#

Using the Serial-over-LAN Console, enter ONL:

root@bmc:~# /usr/local/bin/sol.sh
You are in SOL session.
Use ctrl-x to quit.
-----------------------

root@onl:~#

Note

If sol.sh is unresponsive, try restarting the mainboard from the OpenBMC shell:

root@bmc:~# wedge_power.sh reset

Change the boot mode to rescue mode with the command onl-onie-boot-mode rescue, and reboot:

root@onl:~# onl-onie-boot-mode rescue
[1053033.768512] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
[1053033.936893] EXT4-fs (sda3): re-mounted. Opts: (null)
[1053033.996727] EXT4-fs (sda3): re-mounted. Opts: (null)
The system will boot into ONIE rescue mode at the next restart.
root@onl:~# reboot

At this point, ONL will go through its shutdown sequence and ONIE will start. If ONIE does not start right away, press the Enter/Return key a few times; a boot selection screen may appear. Pick ONIE and then Rescue if given a choice.

Installing an ONL image over HTTP

Now that the switch is in Rescue mode, activate the console by pressing Enter:

discover: Rescue mode detected.  Installer disabled.

Please press Enter to activate this console.
To check the install status inspect /var/log/onie.log.
Try this:  tail -f /var/log/onie.log

** Rescue Mode Enabled **
ONIE:/ #

Then run the onie-nos-install command, with the URL of the management server on the management network segment:

ONIE:/ # onie-nos-install http://10.0.0.129/onie-installer
discover: Rescue mode detected. No discover stopped.
ONIE: Unable to find 'Serial Number' TLV in EEPROM data.
Info: Fetching http://10.0.0.129/onie-installer ...
Connecting to 10.0.0.129 (10.0.0.129:80)
installer            100% |*******************************|   322M  0:00:00 ETA
ONIE: Executing installer: http://10.0.0.129/onie-installer
installer: computing checksum of original archive
installer: checksum is OK
...

The installation will now start. When it finishes, ONL boots and presents a login prompt:

Open Network Linux OS ONL-wedge100bf-32qs, 2020-11-04.19:44-64100e9

localhost login:

The default ONL login is:

username: root
password: onl

After you log in, you can verify that the switch is getting its IP address via DHCP:

root@localhost:~# ip addr
...
3: ma1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:90:fb:5c:e1:97 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.130/25 brd 10.0.0.255 scope global ma1
...
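When checking several switches, picking the address out of the full ip addr output by eye gets tedious. The following sketch filters that output down to the IPv4 address of one interface. The function name iface_ipv4 is introduced here for illustration; ma1 is the management interface shown in the output above.

```shell
#!/bin/sh
# iface_ipv4 IFACE: read `ip addr` output on stdin and print the first IPv4
# address (with prefix length) configured on IFACE.
iface_ipv4() {
    awk -v ifname="$1" '
        # Interface header lines look like "3: ma1: <...>".
        /^[0-9]+: / { current = $2; sub(/:$/, "", current) }
        # Print the address from the first "inet" line of the wanted interface.
        current == ifname && $1 == "inet" { print $2; exit }
    '
}

# On the switch this would be run as:
#   ip addr | iface_ipv4 ma1
```

The printed address can then be compared with the address assigned to the switch in NetBox.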

Post-ONL Configuration

A terraform user must be created on the switches to allow them to be configured.

This is done using Ansible. Verify that your inventory (created earlier from the inventory/example-aether.ini file) includes an [aetherfabric] section that lists the names and IP addresses of all the fabric switches.
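One way to check the section without opening the file is to print the hosts it contains. This is a minimal sketch that assumes the inventory uses the INI layout with one host per line under the section header; the function name section_hosts is hypothetical.

```shell
#!/bin/sh
# section_hosts FILE SECTION: print the host names listed under [SECTION]
# in an INI-style Ansible inventory, skipping blank lines and comments.
section_hosts() {
    awk -v want="[$2]" '
        # Any section header switches the flag on or off.
        /^\[/ { in_section = ($1 == want); next }
        # Inside the wanted section, print the first field of each host line.
        in_section && NF && $1 !~ /^[#;]/ { print $1 }
    ' "$1"
}

# Example (not run here; sitename.ini stands for your site inventory):
#   section_hosts inventory/sitename.ini aetherfabric
```

Every spine and leaf switch should appear in the output; a related header such as [aetherfabric:vars] is treated as a separate section and is not listed.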

Then run a ping test:

ansible -i inventory/sitename.ini -m ping aetherfabric

This may fail with the error:

"msg": "Using a SSH password instead of a key is not possible because Host Key checking is enabled and sshpass does not support this.  Please add this host's fingerprint to your known_hosts file to manage this host."

If it does, comment out the ansible_ssh_pass="onl" line in the inventory, then rerun the ping test. SSH may ask you to confirm each host's key; answer yes for each host to trust its key:

The authenticity of host '10.0.0.138 (<no hostip for proxy command>)' can't be established.
ECDSA key fingerprint is SHA256:...
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
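As an alternative to answering the prompt host by host, the keys can be collected up front with ssh-keyscan. This is only a sketch: it assumes the switches are directly reachable from the machine where it runs (it does not go through an SSH proxy command), and the function name trust_hosts and the KNOWN_HOSTS variable are introduced here for illustration. Keys gathered this way are accepted without verification, so compare the fingerprints out of band if that matters for your site.

```shell
#!/bin/sh
# KNOWN_HOSTS is a variable introduced for this sketch; it defaults to the
# per-user known_hosts file that ssh reads.
KNOWN_HOSTS="${KNOWN_HOSTS:-$HOME/.ssh/known_hosts}"

# trust_hosts HOST...: append each host's public SSH keys to $KNOWN_HOSTS.
trust_hosts() {
    for host in "$@"; do
        # -H hashes the host name in the output; -T sets the timeout in seconds.
        ssh-keyscan -H -T 5 "$host" >> "$KNOWN_HOSTS" 2>/dev/null
    done
}

# Example (not run here; use the addresses from your inventory):
#   trust_hosts 10.0.0.138
```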

Once you’ve trusted the host keys, the ping test should succeed:

spine1.role1.site | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
leaf1.role1.site | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
...

Then run the playbook to create the terraform user:

ansible-playbook -i inventory/sitename.ini playbooks/aetherfabric-playbook.yml

Once the playbook completes, the switches should be ready for the SD-Fabric runtime installation.