..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation
   SPDX-License-Identifier: Apache-2.0

SD-Fabric Deployment
====================

Update aether-app-configs
-------------------------

``aether-app-configs`` is a Git project hosted on **gerrit.opencord.org**. It contains the following:

- Rancher Fleet's configuration to install the SD-Fabric applications on Rancher, including ONOS, Stratum, Telegraf, and PFCP-Agent.
- Customized configuration for each application (Helm values).
- Application-specific configuration files, including the ONOS network configuration and the Stratum chassis config.

Here is an example folder structure:

.. code-block:: bash

   ╰─$ tree aether-dev/app/onos aether-dev/app/stratum aether-dev/app/pfcp-agent aether-dev/app/telegraf
   aether-dev/app/onos
   ├── fleet.yaml
   ├── kustomization.yaml
   ├── overlays
   │   ├── dev-pairedleaves-tucson
   │   │   └── values.yaml
   │   ├── dev-pdp-menlo
   │   │   └── values.yaml
   │   └── dev-sdfabric-menlo
   │       └── values.yaml
   └── registry-sealed-secret.yaml
   aether-dev/app/stratum
   ├── fleet.yaml
   └── overlays
       ├── dev-pairedleaves-tucson
       │   ├── kustomization.yaml
       │   ├── leaf1
       │   ├── leaf2
       │   ├── qos-config-leaf1.yaml
       │   ├── qos-config-leaf2.yaml
       │   └── values.yaml
       └── dev-sdfabric-menlo
           ├── kustomization.yaml
           ├── menlo-sdfabric-leaf1
           ├── menlo-sdfabric-leaf2
           └── values.yaml
   aether-dev/app/pfcp-agent
   ├── fleet.yaml
   └── overlays
       ├── dev-pairedleaves-tucson
       │   └── values.yaml
       └── dev-sdfabric-menlo
           └── values.yaml
   aether-dev/app/telegraf
   ├── fleet.yaml
   └── overlays
       ├── dev-pairedleaves-tucson
       │   └── values.yaml
       └── dev-sdfabric-menlo
           └── values.yaml

App folder
""""""""""

Rancher Fleet reads ``fleet.yaml`` to learn where to download the Helm chart and how to customize the deployment for each target cluster.

The following ``fleet.yaml`` downloads the SD-Fabric Helm chart (version 1.0.18) from **https://charts.aetherproject.org** and then uses **overlays/$cluster_name/values.yaml** to customize each cluster:

.. code-block:: yaml

   # SPDX-FileCopyrightText: 2021-present Open Networking Foundation

   defaultNamespace: tost
   helm:
     releaseName: sdfabric
     repo: https://charts.aetherproject.org
     chart: sdfabric
     version: 1.0.18
     values:
       import:
         stratum:
           enabled: false
   targetCustomizations:
   - name: dev-sdfabric-menlo
     clusterSelector:
       matchLabels:
         management.cattle.io/cluster-display-name: dev-sdfabric-menlo
     helm:
       valuesFiles:
         - overlays/dev-sdfabric-menlo/values.yaml
   - name: dev-pairedleaves-tucson
     clusterSelector:
       matchLabels:
         management.cattle.io/cluster-display-name: dev-pairedleaves-tucson
     helm:
       valuesFiles:
         - overlays/dev-pairedleaves-tucson/values.yaml
   - name: dev-pdp-menlo
     clusterSelector:
       matchLabels:
         management.cattle.io/cluster-display-name: dev-pdp-menlo
     helm:
       valuesFiles:
         - overlays/dev-pdp-menlo/values.yaml

**values.yaml** customizes the SD-Fabric Helm chart values for each cluster. See `SD-Fabric Helm chart `_ to learn how to configure it.

ONOS App
""""""""

For the ONOS application, the most important configuration is the network configuration (netcfg).
It is environment-dependent, so you must configure it correctly for each cluster.
netcfg is set in the Helm values file, as in the following example:

.. code-block:: bash

   ╰─$ cat aether-app-configs/aether-dev/app/onos/overlays/dev-sdfabric-menlo/values.yaml
   # SPDX-FileCopyrightText: 2020-present Open Networking Foundation

   # Value file for SDFabric helm chart.
   ...
   onos-classic:
     config:
       componentConfig:
         "org.onosproject.net.host.impl.HostManager": >
           {
             "monitorHosts": "true",
             "probeRate": "10000"
           }
         "org.onosproject.provider.general.device.impl.GeneralDeviceProvider": >
           {
             "readPortId": true
           }
       netcfg: >
         {
           .....
         }

See `SD-Fabric Configuration Guide `_ to learn more about network configuration.

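
Because netcfg is a JSON document embedded in the YAML values file as a folded block scalar (``netcfg: >``), a JSON syntax error in it is easy to miss during review.
The following minimal sketch, which is not part of the Aether tooling, checks a netcfg draft before you paste it into ``values.yaml``.
It assumes ``python3`` is installed and that you keep the draft in a standalone file; the function name ``validate_netcfg`` and the file name ``netcfg.json`` are hypothetical.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: check that a netcfg draft file is valid JSON.
   # python3's json.tool module exits with a non-zero status on a parse error.
   validate_netcfg() {
     local file=$1
     if python3 -m json.tool "$file" > /dev/null 2>&1; then
       echo "OK: $file is valid JSON"
     else
       echo "ERROR: $file is not valid JSON" >&2
       return 1
     fi
   }

   # Example (netcfg.json is a hypothetical draft file):
   #   validate_netcfg netcfg.json

This only checks JSON syntax; whether the content is a correct network configuration for your cluster still has to be checked against the SD-Fabric configuration documentation.
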

Stratum App
"""""""""""

Stratum reads the chassis config from a Kubernetes ConfigMap, but it cannot reload the chassis config dynamically, so the Stratum pod has to be restarted every time the chassis config is updated.

To solve this problem without modifying Stratum's source code, we introduced Kustomize into the deployment process.
Kustomize's ``configMapGenerator`` generates a ConfigMap with a hash suffix in its name and injects this hash-based name into the spec of the Stratum manifest.
In the following example, note that the ConfigMap name is not fixed:

.. code-block:: bash

   ╰─$ kubectl -n tost get daemonset stratum -o json | jq '.spec.template.spec.volumes | .[] | select(.name == "chassis-config")'
   {
     "configMap": {
       "defaultMode": 484,
       "name": "stratum-chassis-configs-7t6tt25654"
     },
     "name": "chassis-config"
   }

When the chassis config changes, the generated ConfigMap name changes too, and so does the spec of the Stratum DaemonSet.
Kubernetes notices the changed spec and redeploys the Stratum application, which means Stratum eventually reads the updated chassis config.

.. code-block:: bash

   ╰─$ tree aether-dev/app/stratum
   aether-dev/app/stratum
   ├── fleet.yaml
   └── overlays
       ├── dev-pairedleaves-tucson
       │   ├── kustomization.yaml
       │   ├── leaf1
       │   ├── leaf2
       │   ├── qos-config-leaf1.yaml
       │   ├── qos-config-leaf2.yaml
       │   └── values.yaml
       └── dev-sdfabric-menlo
           ├── kustomization.yaml
           ├── menlo-sdfabric-leaf1
           ├── menlo-sdfabric-leaf2
           └── values.yaml

   ╰─$ cat aether-dev/app/stratum/overlays/dev-pairedleaves-tucson/kustomization.yaml
   # SPDX-FileCopyrightText: 2021-present Open Networking Foundation

   configMapGenerator:
   - name: stratum-chassis-configs
     files:
       - leaf1
       - leaf2

See `SD-Fabric Doc `_ to learn how to write the chassis config.
Once you have set up your chassis config, remember to add its file name to ``kustomization.yaml``.

.. attention::

   The switch-dependent config file must be named **${hostname}**.
   For example, if the host name of your Tofino switch is **my-leaf**, name the config file **my-leaf**.

.. TODO: Add an example based on the recommended topology

Telegraf App
""""""""""""

Below is an example directory structure for the Telegraf application:

.. code-block::

   ╰─$ tree aether-dev/app/telegraf
   aether-dev/app/telegraf
   ├── fleet.yaml
   └── overlays
       ├── dev-pairedleaves-tucson
       │   └── values.yaml
       └── dev-sdfabric-menlo
           └── values.yaml

**values.yaml** overrides the values of the ONOS-Telegraf Helm chart and is environment-dependent.
Pay particular attention to the **inputs.addresses** section: Telegraf reads data from Stratum, so you need to list the IP addresses of all Tofino switches here.
For example, the Menlo staging pod has four switches, so we list four IP addresses:

.. code-block:: yaml

   config:
     outputs:
       - prometheus_client:
           metric_version: 2
           listen: ":9273"
     inputs:
       - cisco_telemetry_gnmi:
           addresses:
             - 10.92.1.81:9339
             - 10.92.1.82:9339
             - 10.92.1.83:9339
             - 10.92.1.84:9339
           redial: 10s
       - cisco_telemetry_gnmi.subscription:
           name: stratum_counters
           origin: openconfig-interfaces
           path: /interfaces/interface[name=*]/state/counters
           sample_interval: 5000ns
           subscription_mode: sample

Create Your Own Configs
"""""""""""""""""""""""

Assume we want to deploy SD-Fabric to a new cluster in the development environment, called **dev-my-cluster** in the example below:

1. Modify ``fleet.yaml`` in each app folder to point your cluster to its own values file.
2. Add your Helm values to the **overlays** folder.
3. For the Stratum application, add the chassis config files to ``kustomization.yaml``.

.. code-block:: console

   ╰─$ git status
   On branch master
   Your branch is up to date with 'origin/master'.

   Changes to be committed:
     (use "git restore --staged <file>..." to unstage)
           modified:   aether-dev/app/onos/fleet.yaml
           new file:   aether-dev/app/onos/overlays/dev-my-cluster/values.yaml
           modified:   aether-dev/app/stratum/fleet.yaml
           new file:   aether-dev/app/stratum/overlays/dev-my-cluster/kustomization.yaml
           new file:   aether-dev/app/stratum/overlays/dev-my-cluster/menlo-sdfabric-leaf1
           new file:   aether-dev/app/stratum/overlays/dev-my-cluster/menlo-sdfabric-leaf2
           new file:   aether-dev/app/stratum/overlays/dev-my-cluster/values.yaml
           modified:   aether-dev/app/telegraf/fleet.yaml
           new file:   aether-dev/app/telegraf/overlays/dev-my-cluster/values.yaml

Quick recap
"""""""""""

To recap, most of the files in the **app** folder can be copied from existing examples.
However, a few files need extra attention:

- ``fleet.yaml`` in each app folder.
- The chassis configs in the **app/stratum/overlays/$cluster_name/** folder.
  There should be one chassis config for each switch, and its file name must be **${hostname}**.
- **values.yaml** in the **telegraf** folder, which must be updated with the IP addresses of all switches.

Double-check these files and make sure they have been updated accordingly.

Create a review request
"""""""""""""""""""""""

We also need to create a Gerrit review request, as we did in **Aether Runtime Deployment**.
Refer to :doc:`Aether Runtime Deployment ` to create a review request.

Deploy to ACE cluster
"""""""""""""""""""""

SD-Fabric is an environment-dependent application, and you must prepare correct configurations for both ONOS and Stratum to make it work.
The following sections explain how we set up the Jenkins job and how it works.

Create SD-Fabric deployment job in Jenkins
------------------------------------------

We use Rancher Fleet to deploy SD-Fabric with a GitOps approach, which means every change pushed to the Git repository is synced to the target cluster automatically.
However, ONOS does not support incremental upgrades, so every ONOS upgrade requires deleting all ONOS instances and then creating them all again.
Rancher Fleet cannot fully recreate an application during an upgrade, which is why we created a Jenkins job to recreate the ONOS application.

To add the Jenkins job for a new cluster, modify ``aether-ci-management``.
First, download the ``aether-ci-management`` repository:

.. code-block:: shell

   $ cd $WORKDIR
   $ git clone "ssh://[username]@gerrit.opencord.org:29418/aether-ci-management"

Create Your Own Jenkins Job
"""""""""""""""""""""""""""

Modify ``jjb/repos/sdfabric.yaml`` to add your cluster.
For example, to deploy SD-Fabric to a new cluster **my-cluster** in the staging environment, add the following content to ``jjb/repos/sdfabric.yaml``:

.. code-block:: diff

   --- a/jjb/repos/sdfabric.yaml
   +++ b/jjb/repos/sdfabric.yaml
   @@ -50,6 +50,17 @@
          - "deploy-sdfabric-app":
          - "deploy-debug"
   +- project:
   +    name: my-cluster
   +    disable-job: false
   +    fleet-workspace: 'aether-dev'
   +    properties:
   +      - onf-infra-onfstaff-private
   +    jobs:
   +      - "deploy-sdfabric-app":
   +      - "deploy-debug"
   +
   +

If your cluster is in the production environment, you have to change both **terraform_env** and **fleet-workspace**.

Trigger SD-Fabric deployment in Jenkins
---------------------------------------

Whenever a change is merged into **aether-app-configs**, the Jenkins job should be triggered automatically to (re)deploy SD-Fabric.
You can also trigger the job manually to redeploy SD-Fabric if needed.
Below is an example of the default parameters when you run the job:

.. image:: images/jenkins-sdfabric-params.png
   :width: 480px

To capture the logs of all SD-Fabric containers before redeploying them, enable the ``POD_LOG`` option.

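
If you prefer to trigger the job from a terminal, the Jenkins remote access API queues a parameterized build when it receives a POST request to ``<jenkins-url>/job/<job-name>/buildWithParameters``, authenticated with a user name and an API token.
The following is a minimal sketch using ``curl``; ``JENKINS_URL``, ``JOB_NAME``, ``JENKINS_USER``, and ``JENKINS_TOKEN`` are hypothetical placeholders, because the Jenkins URL and the job name generated from ``jjb/repos/sdfabric.yaml`` are not listed here.
It only sets ``POD_LOG``; every other parameter keeps its default value.

.. code-block:: bash

   #!/usr/bin/env bash
   # Sketch: queue the SD-Fabric deployment job via the Jenkins remote access API.
   # JENKINS_URL, JOB_NAME, JENKINS_USER and JENKINS_TOKEN are hypothetical
   # placeholders; set them for your own Jenkins instance and job.

   # Print the buildWithParameters endpoint, dropping one trailing slash from the base URL.
   jenkins_build_url() {
     local base_url=$1 job_name=$2
     printf '%s/job/%s/buildWithParameters\n' "${base_url%/}" "$job_name"
   }

   # Queue a build. The first argument sets POD_LOG (default: false);
   # all other job parameters keep their default values.
   trigger_sdfabric_job() {
     local pod_log=${1:-false}
     curl --fail --silent --show-error -X POST \
       --user "${JENKINS_USER}:${JENKINS_TOKEN}" \
       --data "POD_LOG=${pod_log}" \
       "$(jenkins_build_url "$JENKINS_URL" "$JOB_NAME")"
   }

For example, ``trigger_sdfabric_job true`` would queue a redeployment that collects the container logs first.
If the job lives inside a Jenkins folder, the URL contains one ``/job/<folder>`` segment per folder level.
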

The Jenkins job can redeploy ONOS, Stratum, and PFCP-Agent; by default, it redeploys ONOS and Stratum.
Select the ``REDEPLOY_XXXX`` options for the applications you want to redeploy.

Verification
------------

Fabric connectivity should be fully ready at this point.
Before continuing to the next step, verify that **all servers**, including compute nodes and the management router, have an IP address and are **able to reach each other via their fabric interfaces**.
You can check this by running **ping** from one server to another server's fabric IP.

Troubleshooting
---------------

The deployment process involves the following steps:

1. The Jenkins job runs (ONOS only).
2. Rancher Fleet upgrades the application based on the Git change.
3. The applications are deployed into the Kubernetes cluster.
4. ONOS and Stratum read their configuration (network config, chassis config).
5. The pods become running.

Taking ONOS as an example, here is how you can troubleshoot.
The log messages of the first step are shown in the Jenkins console; if something goes wrong, the Jenkins job status turns red.
If Jenkins does not report any error, go to the Rancher Fleet portal next to make sure Fleet works as expected.

Accessing the Stratum CLI
"""""""""""""""""""""""""

You can log in to the Stratum container running on a switch using this script:

.. code-block:: sh

   #!/bin/bash
   echo 'Attaching to Stratum container. Ctrl-P Ctrl-Q to exit'
   echo 'Press Enter to continue...'
   # Find the ID of the running Stratum container
   DOCKER_ID=$(docker ps | grep stratum-bf | awk '{print $1}')
   # Attach to it; once attached, press Enter to get the bf_sde prompt
   docker attach "$DOCKER_ID"

You should then see the ``bf_sde`` prompt:

.. code-block:: sh

   bf_sde> pm
   bf_sde.pm> show -a

Accessing the ONOS CLI
""""""""""""""""""""""

After setting up kubectl to access the SD-Fabric pods, run:

.. code-block:: sh

   $ kubectl get pods -n tost

Pick one of the ONOS pods, set up a port forward to it in the background, and then log in to it with the ``onos`` CLI tool:

.. code-block:: sh

   $ kubectl -n tost port-forward onos-tost-onos-classic-0 8181 8101 &
   $ onos karaf@localhost

In some rare cases you may need to access the CLI of the ONOS master instance.
To find out which instance is the master, run ``roles``:

.. code-block:: sh

   karaf@root > roles
   device:devswitch1: master=onos-tost-onos-classic-1, standbys=[ onos-tost-onos-classic-0 ]

The lines above show that ``onos-tost-onos-classic-1`` is the master.
Switch to it by killing the existing port forward, starting a new one pointing at the master, and then logging in to that one:

.. code-block:: sh

   $ ps ax | grep -i kubectl
   # lists the running kubectl commands; kill the port-forward one by its PID
   $ kill 0123
   $ kubectl -n tost port-forward onos-tost-onos-classic-1 8181 8101 &
   $ onos karaf@localhost

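
If you need to switch to the master often, a small helper can extract the master instance name from the ``roles`` output.
The following is a minimal sketch that assumes the output format shown above (one line per device, ``master=`` followed by the instance name and a comma); the function name ``onos_master`` and the file name ``roles.txt`` are hypothetical.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: print the ONOS master instance of a device, reading
   # "roles" output on stdin. Assumes lines of the form:
   #   device:devswitch1: master=onos-tost-onos-classic-1, standbys=[ ... ]
   onos_master() {
     local device=$1
     awk -v dev="${device}:" '
       $1 == dev {
         master = $2
         sub(/^master=/, "", master)   # drop the "master=" prefix
         sub(/,$/, "", master)         # drop the trailing comma
         print master
       }'
   }

   # Example (roles.txt is a hypothetical file holding the roles output):
   #   master=$(onos_master device:devswitch1 < roles.txt)
   #   kubectl -n tost port-forward "$master" 8181 8101 &
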