SD-Fabric Deployment (Beta)

Note

SD-Fabric using the P4 UPF is a beta feature in the Aether 1.5 release, and the hardware and software setup is not required if using the BESS UPF.

Update aether-pod-configs

aether-pod-configs is a git project hosted on gerrit.opencord.org and we placed the following materials in it.

  • Terraform scripts to install SD-Fabric applications on Rancher, including ONOS, Stratum and Telegraf.

  • Customized configuration for each application (helm values).

  • Application specific configuration files, including ONOS network configuration and Stratum chassis config.

Here is an example folder structure:

╰─$ tree staging/stg-ace-menlo/sdfabric
staging/stg-ace-menlo/sdfabric
├── app_map.tfvars
├── backend.tf
├── main.tf -> ../../../common/sdfabric/main.tf
├── sdfabric
│   ├── app_map.tfvars
│   ├── backend.tf
│   ├── kubeconfig -> ../../../../common/sdfabric/apps/onos/kubeconfig
│   ├── leaf1-chassis-config.pb.txt
│   ├── leaf2-chassis-config.pb.txt
│   ├── main.tf -> ../../../../common/sdfabric/apps/sdfabric/main.tf
│   ├── sdfabric.yaml
│   ├── spine1-chassis-config.pb.txt
│   ├── spine2-chassis-config.pb.txt
│   └── variables.tf -> ../../../../common/sdfabric/apps/sdfabric/variables.tf
├── telegraf
│   ├── app_map.tfvars
│   ├── backend.tf
│   ├── main.tf -> ../../../../common/sdfabric/apps/telegraf/main.tf
│   ├── telegraf.yaml
│   └── variables.tf -> ../../../../common/sdfabric/apps/telegraf/variables.tf
└── variables.tf -> ../../../common/tost/variables.tf

There are three Terraform scripts inside sdfabric directory and are responsible for managing each service.

Root folder

Terraform reads app_map.tfvars to know which application will be installed on Rancher and which version and customized values need to apply to.

Here is the example of app_map.tfvars which defines prerequisite apps for SD-Fabric as well as project and namespace in which SD-fabric apps will be provisioned. Note that currently we don’t have any prerequisite so we left this blank intentionally. It can be used to specify prerequisites in the future.

project_name     = "sdfabric"
namespace_name   = "tost"

SD-FABRIC folder

All files under onos directory are related to ONOS application. The app_map.tfvars in this folder describes the information about ONOS helm chart.

In this example, we specify the onos-tost helm chart version to 0.1.18 and load onos.yaml as custom value files.

apps = ["onos"]
namespace_name = "tost"

app_map = {
   sdfabric = {
     app_name         = "onos-tost"
     repo_name        = "aether"
     chart_name       = "sdfabric"
     chart_version    = "1.0.7"
     values_yaml      = "sdfabric.yaml"
   }
}

sdfabric.yaml used to custom your sdfabric Helm chart values and please check SD-Fabric Helm chart to see how to configure it.

Once the Stratum is deployed to Kubernetes, it will read switch-dependent config files from the aether-pod-configs repo. The key folder(stratum.config.folder) indicates that relative path of configs.

Attention

The switch-dependent config file should be named as ${hostname}-chassis-config.pb.txt. For example, if the host name of your Tofino switch is my-leaf, please name config file my-leaf-config.pb.txt.

Telegraf folder

The app_map.tfvars specify the Helm Chart version and the filename of the custom Helm value file.

apps=["telegraf"]
namespace_name = "tost"
app_map = {
  telegraf = {
    app_name         = "telegraf"
    repo_name        = "aether"
    chart_name       = "tost-telegraf"
    chart_version    = "0.1.5"
    values_yaml      = "telegraf.yaml"
 }
}

The telegraf.yaml used to override the ONOS-Telegraf Helm Chart and its environment-dependent. Please pay attention to the inputs.addresses section. Telegraf will read data from stratum so we need to specify all Tofino switch’s IP addresses here. Taking Menlo staging pod as example, there are four switches so we fill out 4 IP addresses.

podAnnotations:
   field.cattle.io/workloadMetrics: '[{"path":"/metrics","port":9273,"schema":"HTTP"}]'

config:
   outputs:
      - prometheus_client:
         metric_version: 2
         listen: ":9273"
inputs:
   - cisco_telemetry_gnmi:
      addresses:
         - 10.92.1.81:9339
         - 10.92.1.82:9339
         - 10.92.1.83:9339
         - 10.92.1.84:9339
      redial: 10s
   - cisco_telemetry_gnmi.subscription:
      name: stratum_counters
      origin: openconfig-interfaces
      path: /interfaces/interface[name=*]/state/counters
      sample_interval: 5000ns
      subscription_mode: sample

Create Your Own Configs

The easiest way to create your own configs is running the template script.

Assumed we would like to set up the ace-example pod in the production environment.

  1. open the tools/ace_config.yaml (You should already have this file when you finish VPN bootstrap stage)

  2. fill out all required variables

  3. perform the makefile command to generate configuration and directory for SD-Fabric

  4. update onos.yaml for ONOS

  5. update ${hostname}-chassis-config.pb.txt for Stratum

  6. commit your change and open the Gerrit patch

  7. deploy your patch to ACE cluster and merge it after verifying the fabric connectivity

vim tools/ace_config.yaml
make -C tools sdfabric
vim production/ace-example/sdfabric/sdfabric/sdfabric.yaml
vim production/ace-example/sdfabric/sdfabric/*${hostname}-chassis-config.pb.txt**
git add commit
git review

Quick recap

To recap, most of the files in tost folder can be copied from existing examples. However, there are a few files we need to pay extra attentions to.

  • sdfabric.yaml in sdfabric folder

  • Chassis config in sdfabric folder There should be one chassis config for each switch. The file name needs to be ${hostname}-chassis-config.pb.txt

  • telegraf.yaml in telegraf folder need to be updated with all switch IP addresses

Double check these files and make sure they have been updated accordingly.

Create a review request

We also need to create a gerrit review request, similar to what we have done in the Aether Runtime Deployment.

Please refer to Aether Runtime Deployment to create a review request.

Deploy to ACE cluster

SD-Fabric is environment dependent application and you have to prepare correct configurations for both ONOS and Stratum to make it work.

A recommended approach is verifying your patch before merging it. You can type the comment apply-all in the Gerrit patch to trigger the deployment process, and then start to verify fabric connectivity.

Attention

Due to the limitation of Terraform’s dependent issue, you have to type the comment apply-all to trigger root folder’s Terraform script to setup project and namespace before merging the patch.

Check below section to learn more about how we setup the Jenkins job and how it works

Create SD-Fabric (named TOST in Jenkins) deployment job in Jenkins

There are three major components in the Jenkins system, the Jenkins pipeline and Jenkins Job Builder and Jenkins Job.

We follow the Infrastructure as Code principle to place three major components in a Git repo, aether-ci-management

Download the aether-ci-management repository.

$ cd $WORKDIR
$ git clone "ssh://[username]@gerrit.opencord.org:29418/aether-ci-management"

Here is the example of folder structure, we put everything related to three major components under the jjb folder.

$ tree -d jjb
jjb
├── ci-management
├── global
│   ├── jenkins-admin -> ../../global-jjb/jenkins-admin
│   ├── jenkins-init-scripts -> ../../global-jjb/jenkins-init-scripts
│   ├── jjb -> ../../global-jjb/jjb
│   └── shell -> ../../global-jjb/shell
├── pipeline
├── repos
├── shell
└── templates

Jenkins pipeline

Jenkins pipeline runs the Terraform scripts to install desired applications into the specified Kubernetes cluster.

Both ONOS and Stratum will read configuration files (network config, chassis config) from aether-pod-config.

The default git branch is master. For testing purpose, we also provide two parameters to specify the number of reviews and patchset.

We will explain more in the next section.

Note

Currently, we don’t perform the incremental upgrade for SD-Fabric application. Instead, we perform the clean installation. In the pipeline script, Terraform will destroy all existing resources and then create them again.

We put all pipeline scripts under the pipeline directory, the language of the pipeline script is groovy.

$ tree pipeline
pipeline
├── aether-in-a-box.groovy
├── artifact-release.groovy
├── cd-pipeline-charts-postrelease.groovy
├── cd-pipeline-dockerhub-postrelease.groovy
├── cd-pipeline-postrelease.groovy
├── cd-pipeline-terraform.groovy
├── docker-publish.groovy
├── ng40-func.groovy
├── ng40-scale.groovy
├── reuse-scan-gerrit.groovy
├── reuse-scan-github.groovy
├── tost-onos.groovy
├── tost-stratum.groovy
├── tost-telegraf.groovy
└── tost.groovy

Currently, we had five pipeline scripts for SD-Fabric deployment.

  1. tost.groovy

  2. sdfabric.groovy

  3. tost-telegraf.groovy

  4. tost-onos-debug.groovy

sdfabric.groovy and tost-telegraf.groovy are used to deploy the individual application respectively, and tost.groovy is a high level script, used to deploy whole SD-Fabric application, it will execute the above three scripts in its pipeline script.

tost-onos-debug.groovy is used to dump the debug information from the ONOS controller and it will be executed automatically when ONOS is deployed.

Jenkins jobs

Jenkins job is the task unit in the Jenkins system. A Jenkins job contains the following information:

  • Jenkins pipeline

  • Parameters for Jenkins pipeline

  • Build trigger

  • Source code management

We created one Jenkins job for each SD-Fabric component, per Aether edge.

We have four Jenkins jobs (HostPath provisioner, ONOS, Stratum and Telegraf) for each edge as of today.

There are 10+ parameters in Jenkins jobs and they can be divided into two parts, cluster-level and application-level.

Here is an example of supported parameters.

../_images/jenkins-onos-params.png

Application level

  • GERRIT_CHANGE_NUMBER/GERRIT_PATCHSET_NUMBER: tell the pipeline script to read the config for aether-pod-configs repo from a specified gerrit review, instead of the HEAD branch. It’s good for developer to test its change before merge.

  • onos_user: used to login ONOS controller

  • git_repo/git_server/git_user/git_password_env: information of git repository, git_password_env is a key for Jenkins Credential system.

Cluster level

  • gcp_credential: Google Cloud Platform credential for remote storage, used by Terraform.

  • terraform_dir: The root directory of the SD-Fabric directory.

  • rancher_cluster: target Rancher cluster name.

  • rancher_api_env: Rancher credential to access Rancher, used by Terraform.

Note

Typically, developer only focus on GERRIT_CHANGE_NUMBER and GERRIT_PATCHSET_NUMBER. The rest of them are managed by OPs.

Jenkins Job Builder (JJB)

We prefer to apply the IaC (Infrastructure as Code) for everything. We use the JJB (Jenkins Job Builder) to create new Jenkins Job, including the Jenkins pipeline. We need to clone a set of Jenkins jobs when a new edge is deployed.

In order to provide the flexibility and avoid re-inventing the wheel, we used the job template to declare your job. Thanks to the JJB, we can use the parameters in the job template to render different kinds of jobs easily.

All the template files are placed under templates directory.

╰─$ tree templates
templates
├── aether-in-a-box.yaml
├── archive-artifacts.yaml
├── artifact-release.yml
├── cd-pipeline-terraform.yaml
├── docker-publish-github.yaml
├── docker-publish.yaml
├── helm-lint.yaml
├── make-test.yaml
├── ng40-nightly.yaml
├── ng40-test.yaml
├── private-docker-publish.yaml
├── private-make-test.yaml
├── publish-helm-repo.yaml
├── reuse-gerrit.yaml
├── reuse-github.yaml
├── sync-dir.yaml
├── tost.yaml
├── verify-licensed.yaml
└── versioning.yaml

We defined all SD-Fabric required job templates in tost.yaml and here is its partial content.

- job-template:
   name: "{name}-onos"
   id: "deploy-onos"
   project-type: pipeline
   dsl: !include-raw-escape: jjb/pipeline/tost-onos.groovy
   triggers:
     - onf-infra-tost-gerrit-trigger:
        gerrit-server-name: '{gerrit-server-name}'
        trigger_command: "apply"
        pattern: "{terraform_dir}/tost/onos/.*"
   logrotate:
       daysToKeep: 7
       numToKeep: 10
       artifactDaysToKeep: 7
       artifactNumToKeep: 10
   parameters:
       - string:
             name: gcp_credential
             default: "{google_bucket_access}"
       - string:
             name: rancher_cluster
             default: "{rancher_cluster}"
       - string:
             name: rancher_api_env
             default: "{rancher_api}"
       - string:
             name: git_repo
             default: "aether-pod-configs"
       - string:
             name: git_server
             default: "gerrit.opencord.org"
       - string:
             name: git_ssh_user
             default: "jenkins"

Once we have the job template, we need to tell the JJB, we want to use the job template to create our own jobs. Here comes the concept of project, you need to define job templates you want to use and the values of all parameters.

We put all project yaml files under the repo directory and here is the example

╰─$ tree repos                                                                                                                                   130 ↵
repos
├── aether-helm-charts.yaml
├── aether-in-a-box.yaml
├── cd-pipeline-terraform.yaml
├── ng40-test.yaml
├── spgw.yaml
└── tost.yaml

Following is the example of tost projects, we defined three projects here, and each project has different parameters and Jenkins jobs it wants to use.

- project:
    name: deploy-tucson-pairedleaves-dev
    rancher_cluster: "dev-pairedleaves-tucson"
    terraform_dir: "staging/dev-pairedleaves-tucson"
    rancher_api: "{rancher_staging_access}"
    properties:
      - onf-infra-onfstaff-private
    jobs:
      - "deploy"
      - "deploy-onos"
      - "deploy-stratum"
      - "deploy-telegraf"
      - "debug-tost"

Create Your Own Jenkins Job

Basically, if you don’t need to customize the Jenkins pipeline script and the job configuration, the only thing you need to do is modify the repos/tost.yaml to add your project.

For example, we would like to deploy the SD-Fabric to our production pod, let’s assume it named “tost-example”. Add the following content into repos/tost.yaml

- project:
    name: deploy-tost-example-production
    rancher_cluster: "ace-test-example"
    terraform_dir: "production/tost-example"
    rancher_api: "{rancher_production_access}"
    disable-job: false
    need_stratum: false
    need_onos: false
    need_sdfabric: true
    debug_namespace: tost
    topology:
      - sdfabric
    properties:
      - onf-infra-onfstaff-private
    jobs:
      - "deploy"
          trigger_path: "sdfabric/.*
      - "deploy-sdfabric"
      - "deploy-telegraf"
      - "debug-tost"

Note

The terraform_dir indicates the directory location in aether-pod-configs repo, please ensure your Terraform scripts already there before running the Jenkins job.

Trigger SD-Fabric (named TOST in Jenkins) deployment in Jenkins

Whenever a change is merged into aether-pod-config, the Jenkins job should be triggered automatically to (re)deploy SD-Fabric (named TOST in Jenkins).

You can also type the comment apply in the Gerrit patch, it will trigger Jenkins jobs to deploy SD-Fabric for you.

Verification

Fabric connectivity should be fully ready at this point. We should verify that all servers, including compute nodes and the management server, have an IP address and are able to reach each other via fabric interface before continuing the next step.

This can be simply done by running a ping command from one server to another server’s fabric IP.

Disable deployment jobs

After verifying the SD-Fabric is ready, please submit another patch to disable the job.

$ cd $WORKDIR/aether-ci-management
$ vi jjb/repos/tost.yaml

# Add jobs for the new cluster
diff --git a/jjb/repos/tost.yaml b/jjb/repos/tost.yaml
index 19bade4..81b4ab1 100644
--- a/jjb/repos/tost.yaml
+++ b/jjb/repos/tost.yaml
@@ -478,7 +478,7 @@
     rancher_cluster: "ace-ntt"
     terraform_dir: "production/ace-ntt"
     rancher_api: "{rancher_production_access}"
-    disable-job: false
+    disable-job: true
     properties:
       - onf-infra-onfstaff-private
     jobs:

Troubleshooting

The deployment process involves the following steps:

  1. Jenkins Job

  2. Jenkins Pipeline

  3. Clone Git Repository

  4. Execute Terraform scripts

  5. Rancher start to install applications

  6. Applications be deployed into Kubernetes cluster

  7. ONOS/Stratum will read the configuration (network config, chassis config)

  8. Pod become running

Taking ONOS as an example, here’s what you can do to troubleshoot.

You can see the log message of the first 4 steps in Jenkins console. If something goes wrong, the status of the Jenkins job will be in red. If Jenkins doesn’t report any error message, the next step is going to Rancher’s portal to ensure the Answers is same as the onos.yaml in aether-pod-configs.

Accessing the Stratum CLI

You can login to the Stratum container running on a switch using this script:

#!/bin/bash
echo 'Attaching to Stratum container. Ctrl-P Ctrl-Q to exit'
echo 'Press Enter to continue...'
DOCKER_ID=`docker ps | grep stratum-bf | awk '{print $1}'`
docker attach $DOCKER_ID

You should then see the bf_sde prompt:

bf_sde> pm
bf_sde.pm> show -a

Accessing the ONOS CLI

After setting up kubectl to access the SD-Fabric pods, run:

$ kubectl get pods -n tost

Pick a SD-Fabric pod, and make a port forward to it, then login to it with the onos CLI tool:

$ kubectl -n tost port-forward onos-tost-onos-classic-0 8181 8101
$ onos karaf@localhost

In some rare cases, you may need to access the ONOS master instance CLI, in which case you can run roles:

karaf@root > roles
device:devswitch1: master=onos-tost-onos-classic-1, standbys=[ onos-tost-onos-classic-0 ]

Above lines show that onos-tost-onos-classic-1 is the master. So switch to that by killing the port forward, starting a new one pointing at the master, then logging into that one:

$ ps ax | grep -i kubectl
# returns kubectl commands running, pick the port-forward one and kill it
$ kill 0123
$ kubectl -n tost port-forward onos-tost-onos-classic-1 8181 8101
$ onos karaf@localhost