Troubleshooting
Firewalls and other host network issues
Unable to access a system
If it’s a system behind another system (e.g., the compute nodes behind a
management server) and you’re trying to log in to it interactively, make sure
that you’ve enabled SSH Agent Forwarding in your ~/.ssh/config file:
Host mgmtserver1.prod.site.aetherproject.net
  ForwardAgent yes
If you still have problems after verifying that this is set up, run ssh with
the -v option, which will print out all the connection details and
whether an agent is used on the second ssh:
onfadmin@mgmtserver1:~$ ssh onfadmin@node2.mgmt.prod.site.aetherproject.net
debug1: client_input_channel_open: ctype auth-agent@openssh.com rchan 2 win 65536 max 16384
debug1: channel 1: new [authentication agent connection]
debug1: confirm auth-agent@openssh.com
Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-56-generic x86_64)
...
onfadmin@node2:~$
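Before retrying with -v, it can help to confirm what the SSH client will actually do for that host. The following is a minimal sketch, assuming an OpenSSH client recent enough to support ssh -G (which prints the effective configuration for a host) and a running agent that ssh-add -l can query. The function name forward_agent_setting is illustrative and not part of any tool.

```shell
# Extract the effective ForwardAgent value from `ssh -G <host>` output on stdin.
# `ssh -G` prints one lowercase "keyword value" pair per line.
forward_agent_setting() {
  awk '$1 == "forwardagent" { print $2 }'
}

# Usage (run on your workstation, not on the management server):
#   ssh -G mgmtserver1.prod.site.aetherproject.net | forward_agent_setting
#   ssh-add -l    # forwarding cannot help if the agent holds no keys
```

If the first command does not print yes, the Host pattern in ~/.ssh/config probably does not match the name you typed on the command line.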
Root/Public DNS port is blocked
In some cases access to the public DNS root and other servers is blocked, which prevents DNS queries from working within the pod.
To resolve this, forwarding addresses on the local network can be provided in
the Ansible YAML host_vars file, using the unbound_forward_zones list
to configure the Unbound recursive nameserver. An example:
unbound_forward_zones:
  - name: "."
    servers:
      - "8.8.8.8"
      - "8.8.4.4"
The items in the servers list should be locally accessible nameservers.
Problems with OS installation
OS installs, but doesn’t boot
If you’ve completed the installation but the system won’t start the OS, check these BIOS settings:
If the startup disk is NVMe, under Advanced -> PCIe/PCI/PnP Configuration
the option NVMe Firmware Source should be set to AMI Native Support,
per Supermicro FAQ entry 28248.
Unknown MAC addresses
Sometimes it’s hard to find out all the MAC addresses assigned to network cards. These can be found in a variety of ways:
On servers, the BMC webpage will list the built-in network card MAC addresses.
If you log in to a server, ip link or ip addr will show the MAC address of each interface, including those on add-in cards.
If you can log in to a server but don’t know the BMC IP or MAC address for that server, you can find it with sudo ipmitool lan print.
If you don’t have a login to the server but can get to the management server, ip neighbor will show the ARP table of MAC addresses known to that system. Its output is unsorted; ip neigh | sort is easier to read. This can be useful for determining whether there’s a cabling problem: a device plugged into the wrong port of the management switch could show up in the DHCP pool range for a different segment.
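A plain sort orders addresses as text, so 10.0.0.10 lands before 10.0.0.5. The sketch below is one way to get a numerically ordered view with only the columns that matter here; it assumes the usual iproute2 line layout ("ADDRESS dev IFACE lladdr MAC STATE") and IPv4 neighbors. The function name neigh_table is hypothetical.

```shell
# Read `ip neigh` output on stdin and print "IP MAC DEVICE", ordered by address.
# Entries without a resolved MAC (e.g. FAILED or INCOMPLETE) are skipped.
neigh_table() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "lladdr") print $1, $(i + 1), $3
  }' | sort -t . -k1,1n -k2,2n -k3,3n -k4,4n   # numeric sort on each octet
}

# Usage on the management server:
#   ip neigh | neigh_table
```

A device whose address falls in the wrong segment's DHCP range stands out more easily once the list is in numeric order.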
Cabling issues
The system may not come up correctly if cabling isn’t connected properly. If you don’t have hands-on access to the cabling, here are some ways to check it remotely:
On servers you can check which ports are connected with ip link show:
$ ip link show
...
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 3c:ec:ef:4d:55:a8 brd ff:ff:ff:ff:ff:ff
...
5: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 3c:ec:ef:4d:55:a9 brd ff:ff:ff:ff:ff:ff
Ports that are up will show state UP.
You can determine which remote ports are connected with LLDP, assuming that the remote switch supports LLDP and has it enabled. This can be done with networkctl lldp, which shows both the name and the MAC address of the connected switch on a per-link basis:
$ networkctl lldp
LINK  CHASSIS ID         SYSTEM NAME       CAPS         PORT ID  PORT DESCRIPTION
eno1  10:4f:58:e7:d5:60  Aruba-2540-24…PP  ..b........  10       10
eno2  10:4f:58:e7:d5:60  Aruba-2540-24…PP  ..b........  1        1
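On a server with many interfaces, the full ip link show output is noisy. As a rough sketch, the filter below reduces it to one "INTERFACE STATE" pair per link. It assumes the standard iproute2 output format, in which each interface line starts with an index and contains the word state; the function name link_states is hypothetical.

```shell
# Read `ip link show` output on stdin and print "INTERFACE STATE" per link.
link_states() {
  awk '/^[0-9]+:/ {
    name = $2
    sub(/:$/, "", name)            # strip the trailing colon from "eno1:"
    for (i = 3; i < NF; i++)
      if ($i == "state") print name, $(i + 1)
  }'
}

# Usage:
#   ip link show | link_states
```

A port reported as DOWN when a cable is supposed to be attached is a good candidate for a cabling or switch-port check.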
Problems with ONIE Installation
Can’t reboot into ONL, loops on ONIE installer mode
Sometimes an ONL installation is incomplete or problematic, and reinstalling it doesn’t result in a working system.
If this is the case, reboot into ONIE Rescue mode and use parted to delete
all the ONL-prefixed partitions, then reinstall with an onie-installer image.
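To avoid deleting the wrong partition, it may help to list the candidates first. The sketch below assumes that the parted in the rescue environment supports GNU parted's machine-readable output (-m), where each partition line has the form number:start:end:size:filesystem:name:flags; and that the disk uses GPT so partitions carry names. The device /dev/sda and the function name onl_partition_numbers are assumptions; check the actual device on your switch.

```shell
# Read `parted -m -s <disk> print` output on stdin and print the number of
# every partition whose name starts with "ONL-".
onl_partition_numbers() {
  awk -F: '$1 ~ /^[0-9]+$/ && $6 ~ /^ONL-/ { print $1 }'
}

# Usage from the ONIE rescue shell (/dev/sda is an assumed device name):
#   parted -m -s /dev/sda print | onl_partition_numbers
# Review the list, then remove each partition by hand, for example:
#   parted -s /dev/sda rm <number>
```

The function only prints partition numbers; the removal is deliberately left as a manual step because it is destructive.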
Management Network Issues
Cycling PoE port power on an HP/Aruba Management switch
You may need to cycle the power on a port if an eNB or monitoring device that is powered by the PoE switch is not responding or is misbehaving.
To do this, log in to the switch and check which ports are receiving power:
Aruba-2540-24G-PoEP-4SFPP# show power-over-ethernet brief
Status and Configuration Information
Available: 370 W Used: 11 W Remaining: 359 W
PoE    Pwr  Pwr      Pre-std Alloc Alloc  PSE Pwr PD Pwr  PoE Port     PLC PLC
Port   Enab Priority Detect  Cfg   Actual Rsrvd   Draw    Status       Cls Type
------ ---- -------- ------- ----- ------ ------- ------- ------------ --- ----
1      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
2      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
3      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
4      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
5      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
6      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
7      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
8      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
9      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
10     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
11     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
12     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
13     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
14     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
15     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
16     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
17     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
18     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
19     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
20     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
21     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
22     Yes  low      off     usage usage  4.9 W   4.7 W   Delivering   3   1
23     Yes  low      off     usage usage  6.0 W   5.7 W   Delivering   3   1
24     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
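On larger switches it can be tedious to pick the powered ports out of this listing by eye. As a rough sketch, if the output of show power-over-ethernet brief has been saved to a file on your workstation, the filter below prints only the ports in the Delivering state along with their PD power draw in watts. It relies on the column positions shown above, so it is tied to this output format; the function name poe_delivering and the file name poe-brief.txt are hypothetical.

```shell
# Read saved `show power-over-ethernet brief` output on stdin and print
# "PORT PD_DRAW_WATTS" for every port whose status is Delivering.
# Whitespace-split fields: $1 = port, $9 = PD power draw value, $11 = status.
poe_delivering() {
  awk '$1 ~ /^[0-9]+$/ && $11 == "Delivering" { print $1, $9 }'
}

# Usage (poe-brief.txt is a hypothetical capture of the switch output):
#   poe_delivering < poe-brief.txt
```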
For this example, if we want to reset port 23, run these commands to disable the PoE power on the port:
Aruba-2540-24G-PoEP-4SFPP# config
Aruba-2540-24G-PoEP-4SFPP(config)# interface 23
Aruba-2540-24G-PoEP-4SFPP(eth-23)# no power-over-ethernet
Aruba-2540-24G-PoEP-4SFPP(eth-23)# show power-over-ethernet ethernet 23
Status and Configuration Information for port 23
Power Enable : No PoE Port Status : Disabled
PLC Class/Type : 0/- Priority Config : low
DLC Class/Type : 0/- Pre-std Detect : off
Alloc By Config : usage Configured Type :
Alloc By Actual : usage PoE Value Config : n/a
PoE Counter Information
Over Current Cnt : 0 MPS Absent Cnt : 0
Power Denied Cnt : 0 Short Cnt : 0
LLDP Information
PSE Allocated Power Value : 0.0 W PSE TLV Configured : dot3, MED
PD Requested Power Value : 0.0 W PSE TLV Sent Type : dot3
MED LLDP Detect : Disabled PD TLV Sent Type : n/a
Power Information
PSE Voltage : 0.0 V PSE Reserved Power : 0.0 W
PD Amperage Draw : 0 mA PD Power Draw : 0.0 W
At this point, the power has been removed from the device. To reenable it:
Aruba-2540-24G-PoEP-4SFPP(eth-23)# power-over-ethernet
Aruba-2540-24G-PoEP-4SFPP(eth-23)# show power-over-ethernet ethernet 23
Status and Configuration Information for port 23
Power Enable : Yes PoE Port Status : Delivering
PLC Class/Type : 3/1 Priority Config : low
DLC Class/Type : 0/- Pre-std Detect : off
Alloc By Config : usage Configured Type :
Alloc By Actual : usage PoE Value Config : n/a
PoE Counter Information
Over Current Cnt : 0 MPS Absent Cnt : 0
Power Denied Cnt : 0 Short Cnt : 0
LLDP Information
PSE Allocated Power Value : 0.0 W PSE TLV Configured : dot3, MED
PD Requested Power Value : 0.0 W PSE TLV Sent Type : dot3
MED LLDP Detect : Disabled PD TLV Sent Type : n/a
Power Information
PSE Voltage : 0.0 V PSE Reserved Power : 0.1 W
PD Amperage Draw : 18 mA PD Power Draw : 0.0 W
Refer to command's help option for field definitions
Aruba-2540-24G-PoEP-4SFPP(eth-23)# exit
Aruba-2540-24G-PoEP-4SFPP(config)# exit