Friday, November 15, 2019

Project Velero 1.2 on OpenShift 3.11 - Backup and Restore

Today, I will address a common need in any mission-critical on-premises OpenShift environment: how to back up the Kubernetes objects stored in etcd and restore them.

For that, we will use Project Velero 1.2 on an on-premises Red Hat OpenShift 3.11 cluster (which means no AWS S3 or any other cloud provider object storage).

I used an OpenShift blog post from March 2019 as a starting point, but this post gives you step-by-step instructions for Velero 1.2 (released in November 2019) + OpenShift 3.11 + Minio + NFS.

My OpenShift 3.11 cluster has access to the internet, so this will be an online installation. I provide a list of the images at the bottom of this page so you can download them from the public registries and upload them to your internal registry. In that case, of course, you will have to tweak the Velero yaml files to point to your corporate registry.


Step 1) Set up the initial minio object storage (without persistence)


First off, Velero needs an object storage system for the backup. The natural choice is minio, which is the most common on-premises S3-compatible solution. A ready-made minio deployment ships in the examples directory of the Velero release.

On the OpenShift admin node, download and extract Velero 1.2, then copy the "velero" client to /usr/local/bin.

[root@ocp-admin]# cd /root
[root@ocp-admin]# wget https://github.com/vmware-tanzu/velero/releases/download/v1.2.0/velero-v1.2.0-linux-amd64.tar.gz
Resolving github.com (github.com)... 192.30.253.112
(...)[=================================================================================================================================>] 22,410,133 1.85MB/s in 8.2s
2019-11-12 11:54:41 (2.62 MB/s) - ‘velero-v1.2.0-linux-amd64.tar.gz’ saved [22410133/22410133]

[root@ocp-admin]# tar -xvzf velero-v1.2.0-linux-amd64.tar.gz
velero-v1.2.0-linux-amd64/LICENSE
velero-v1.2.0-linux-amd64/examples/README.md
(...)

[root@ocp-admin]# cp velero-v1.2.0-linux-amd64/velero /usr/local/bin/


The first setup of minio is without persistence. In other words, we are installing it "as is", with an emptyDir volume and the default credentials (minio/minio123):

[root@ocp-admin]# oc apply -f /root/velero-v1.2.0-linux-amd64/examples/minio/00-minio-deployment.yaml
namespace/velero configured
deployment.apps/minio created
service/minio created
job.batch/minio-setup created

Check if the minio pod is running after the installation:

[root@ocp-admin]# oc project velero
[root@ocp-admin]# oc get pods
NAME                     READY     STATUS      RESTARTS   AGE
minio-6b8ff5c8b6-r5mf8   1/1       Running     0          45s
minio-setup-867lc        0/1       Completed   0          45s

Look up the ClusterIP and port of the minio service:

[root@ocp-admin]# oc get services
NAME      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
minio     ClusterIP   10.144.222.38   <none>        9000/TCP   50s

Let's check whether minio is responding on port 9000. For that, we ssh to one of the OCP master nodes (or any other node) and run curl against ClusterIP:port:

[root@ocp-admin]# ssh ocp-master # or any other node of OCP cluster
[root@ocp-master]# curl http://10.144.222.38:9000
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied.
(...)
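
If you want to script this check instead of eyeballing the curl output, a minimal sketch could look like the one below. It assumes your minio version exposes the /minio/health/live liveness endpoint (recent minio releases do, but verify it on yours); the function name wait_for_minio and its defaults are my own and not part of Velero or minio.

```shell
# Poll the minio liveness endpoint until it answers or the attempts run out.
# Arguments: base URL, number of attempts (default 10), seconds between attempts (default 3).
# Assumption: the running minio version serves /minio/health/live.
wait_for_minio() {
  url="$1/minio/health/live"
  attempts="${2:-10}"
  delay="${3:-3}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; -s silences the progress meter.
    if curl -fs -o /dev/null "$url"; then
      echo "minio is up at $1"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "minio did not answer at $url" >&2
  return 1
}
```

On the master node you would call it as: wait_for_minio http://10.144.222.38:9000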

Great. The minio service is responsive (the AccessDenied error is expected, because the request carries no credentials). Let's now download the standalone docker container with the minio client "mc" and poke around. Remember, this is a standalone docker container and it cannot resolve cluster service names, hence we need to use the actual ClusterIP:

[root@ocp-master]#  docker pull minio/mc
Using default tag: latest
Trying to pull repository registry.redhat.io/minio/mc ...
Trying to pull repository docker.io/minio/mc ...
latest: Pulling from docker.io/minio/mc
9d48c3bd43c5: Pull complete
4b37be9af47f: Pull complete
Digest: sha256:f3803011635341a92d32d35c504cc8eaffb99fa696fab48f1c20db84d13ab186
Status: Downloaded newer image for docker.io/minio/mc:latest

[root@ocp-master]# docker run -it --entrypoint=/bin/sh minio/mc
/ # mc config host add s3 http://10.144.222.38:9000 minio minio123
mc: Configuration written to `/root/.mc/config.json`. Please update your access credentials.
mc: Successfully created `/root/.mc/share`.
mc: Initialized share uploads `/root/.mc/share/uploads.json` file.
mc: Initialized share downloads `/root/.mc/share/downloads.json` file.

Added `s3` successfully.

As we can see, there is a default bucket called "velero" and it is empty:

/ # mc tree s3
s3
└─ velero


Let's go to the next steps to persist the minio data.

Step 2) (Skip if you already have an NFS mount point) Set up an NFS export on the admin node


For this example, I will use the OCP admin node as the NFS server, but you can use any NFS server.
My admin node has the following default exports under /etc/exports.d, as created by the OpenShift 3.11 Ansible installer:

[root@ocp-admin]# cat /etc/exports.d/openshift-ansible.exports
"/exports/registry" *(rw,root_squash)
"/exports/metrics" *(rw,root_squash)
"/exports/logging" *(rw,root_squash)
"/exports/logging-es-ops" *(rw,root_squash)
"/exports/etcd" *(rw,root_squash)

[root@ocp-admin]# ls -l /exports
total 0
drwxrwxrwx. 2 nfsnobody nfsnobody  6 Oct 17 08:05 etcd
drwxrwxrwx. 2 nfsnobody nfsnobody  6 Oct 17 08:05 logging
drwxrwxrwx. 2 nfsnobody nfsnobody  6 Oct 17 08:05 logging-es-ops
drwxrwxrwx. 4 nfsnobody nfsnobody 61 Oct 17 08:34 metrics
drwxrwxrwx. 2 nfsnobody nfsnobody  6 Oct 17 08:05 registry


I will create an export called "minio-storage", configure it in a file called "minio.exports", and restart the NFS server.

[root@ocp-admin]# mkdir -p /exports/minio-storage; chown nfsnobody:nfsnobody /exports/minio-storage; chmod 777 /exports/minio-storage
[root@ocp-admin]# vi /etc/exports.d/minio.exports
[root@ocp-admin]# cat /etc/exports.d/minio.exports
"/exports/minio-storage" *(rw,root_squash)
[root@ocp-admin]#  service nfs restart


Let's check if the NFS export is mountable:

[root@ocp-admin]# mkdir -p /mnt/minio-storage
[root@ocp-admin]# mount ocp-admin:/exports/minio-storage /mnt/minio-storage
[root@ocp-admin]# mount | grep nfs | grep minio
ocp-admin:/exports/minio-storage on /mnt/minio-storage type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,local_lock=none,addr=127.0.1.1)

Our NFS mount point is good to go.
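
If you prefer not to edit the exports file by hand in vi, the same file can be generated from a script. This is only a sketch: write_export is a helper name I made up, and the exportfs/showmount lines are left as comments because they need root on the NFS server.

```shell
# Write a single-export file in the same format as the OpenShift-generated one.
# Arguments: exported directory, destination file.
write_export() {
  printf '"%s" *(rw,root_squash)\n' "$1" > "$2"
}

# Typical use on the NFS server (as root), mirroring the manual steps above:
#   write_export /exports/minio-storage /etc/exports.d/minio.exports
#   exportfs -ra            # re-read the export tables without a full restart
#   showmount -e localhost  # list the active exports
```

Using "exportfs -ra" instead of a full service restart is a design choice on my part, so that clients of the other exports are not interrupted.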

Step 3) Configure minio to use NFS mount point for persistence (via PV/PVC)


We need PV and PVC objects that point to our NFS export. The PVC object must live in the "velero" project. Here is an example yaml file; change the server and path to match your NFS server and NFS export:

[root@ocp-admin]# cd /root
[root@ocp-admin]# cat minio-volume.yaml
---
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  finalizers:
  - kubernetes.io/pv-protection
  name: minio-pv
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 15Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: minio-pv-claim
    namespace: velero
  nfs:
    path: /exports/minio-storage
    server: ocp-admin
  persistentVolumeReclaimPolicy: Retain

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-pv-claim
  namespace: velero
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 15Gi
  volumeName: minio-pv

[root@ocp-admin]# oc apply -f minio-volume.yaml
persistentvolume/minio-pv created
persistentvolumeclaim/minio-pv-claim created

Before reconfiguring minio to the PVC, make sure the PVC is bound to the PV (otherwise you will break your minio setup):

[root@ocp-admin]# oc get pvc
NAME             STATUS    VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
minio-pv-claim   Bound     minio-pv   15Gi       RWO                           1d
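
In a scripted installation you can wait for the Bound status instead of checking by eye. The sketch below polls the PVC phase with a jsonpath query; the function name wait_for_pvc_bound and its retry defaults are hypothetical.

```shell
# Wait until a PVC reports phase "Bound".
# Arguments: PVC name, namespace, attempts (default 30), delay in seconds (default 2).
wait_for_pvc_bound() {
  pvc="$1"; ns="$2"; attempts="${3:-30}"; delay="${4:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # An error from oc (for example, PVC not found yet) is treated as "no phase".
    phase=$(oc get pvc "$pvc" -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null) || phase=""
    if [ "$phase" = "Bound" ]; then
      echo "$pvc is Bound"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$pvc is still '$phase' after $attempts attempts" >&2
  return 1
}
```

For this post the call would be: wait_for_pvc_bound minio-pv-claim velero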

This is the moment of truth. Let's reconfigure the minio /storage volume to use the PVC "minio-pv-claim" with the "oc set volume" command. The minio pod is restarted, and we can use "oc get pods -w" to watch it:

[root@ocp-admin]# oc set volume deployment.apps/minio --add --name=storage -t pvc --claim-name=minio-pv-claim --overwrite
deployment.apps/minio volume updated

[root@ocp-admin]# oc get pods -w
NAME                     READY     STATUS        RESTARTS   AGE
minio-6b8ff5c8b6-r5mf8   0/1       Terminating   0          3m
minio-setup-867lc        0/1       Completed     0          3m
minio-6b8ff5c8b6-r5mf8   0/1       Terminating   0         3m
minio-6b8ff5c8b6-r5mf8   0/1       Terminating   0         3m
minio-68cbfb4c89-8rbjb   0/1       Pending   0         15s
minio-68cbfb4c89-8rbjb   0/1       Pending   0         15s
minio-68cbfb4c89-8rbjb   0/1       ContainerCreating   0         15s
minio-68cbfb4c89-8rbjb   1/1       Running   0         18s

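Now minio is using the PVC to persist the data. However, the default "velero" bucket is gone: the minio-setup job created it on the old emptyDir volume, and the new NFS volume starts out empty. We can see this by going back to the mc command line on the master node: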

[root@ocp-master]# docker run -it --entrypoint=/bin/sh minio/mc
/ # mc config host add s3 http://10.144.222.38:9000 minio minio123
/ # mc tree s3
mc: <ERROR> Unable to tree `s3`. Bucket name cannot be empty.


We need to recreate the default "velero" bucket, otherwise the Velero installation will fail. You can create the bucket using the same minio/mc container running on the master:

/ # mc mb s3/velero
Bucket created successfully `s3/velero`.
/ # mc tree s3
s3
└─ velero
/ # mc ls s3
[2019-11-14 16:22:04 UTC]      0B velero/
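
If you script this step, it helps to make the bucket creation safe to run more than once. Here is a small sketch to run inside the same minio/mc container; ensure_bucket is a name I invented, and it relies only on the "mc ls" and "mc mb" commands already used above.

```shell
# Create a bucket only when it does not exist yet.
# Arguments: mc host alias (e.g. s3), bucket name (e.g. velero).
ensure_bucket() {
  if mc ls "$1/$2" >/dev/null 2>&1; then
    echo "bucket $1/$2 already exists"
  else
    mc mb "$1/$2"
  fi
}
```

Calling "ensure_bucket s3 velero" a second time then simply reports that the bucket is already there.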


Finally, let's take a look at the mount point of the export on the admin node. We do see a "velero" subdirectory and other hidden files on the NFS mount point:

[root@ocp-admin]# pwd
/mnt/minio-storage
[root@ocp-admin]# ls -l
total 0
drwxr-xr-x. 2 nfsnobody nfsnobody 6 Nov 14 09:22 velero
[root@ocp-admin]# find .
.
./.minio.sys
./.minio.sys/tmp
./.minio.sys/tmp/2e009f02-4d51-4ac3-84f6-90072d788aa2
./.minio.sys/format.json
./.minio.sys/multipart
./.minio.sys/config
./.minio.sys/config/config.json
./.minio.sys/config/iam
./.minio.sys/config/iam/format.json
./velero

We are ready to install velero with data persistence.

Step 4) Setup Velero


I followed the steps from https://velero.io/docs/master/contributions/minio/ .

Create the credentials file "credentials-velero". The credentials must match the minio credentials from Step 1 (here, the defaults minio/minio123).

[root@ocp3-admin-01 ~]# cat credentials-velero
[default]
aws_access_key_id = minio
aws_secret_access_key = minio123
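
Because this file holds a secret, you may want to generate it with restrictive permissions rather than typing it in an editor. A minimal sketch follows; write_velero_credentials is a hypothetical helper, and the file layout is exactly the one shown above.

```shell
# Write an AWS-style credentials file readable only by its owner.
# Arguments: destination file, access key, secret key.
write_velero_credentials() {
  (
    # The subshell keeps the umask change from leaking into the caller.
    umask 077
    printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' "$2" "$3" > "$1"
  )
}
```

With the default minio credentials: write_velero_credentials ./credentials-velero minio minio123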

Run the velero installation. ATTENTION: as of the writing of this post, the velero install command line on the velero installation page is missing the required "--plugins" parameter (the last line of the command below). It points to the plugin image, and for minio the plugin is "velero-plugin-for-aws":

[root@ocp-admin]# velero install \
--provider aws --bucket velero \
--secret-file ./credentials-velero \
--use-volume-snapshots=false \
--backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000 \
--plugins velero/velero-plugin-for-aws:v1.0.0
CustomResourceDefinition/backups.velero.io: attempting to create resource
CustomResourceDefinition/backups.velero.io: created
(...)
BackupStorageLocation/default: created
Deployment/velero: attempting to create resource
Deployment/velero: created
Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.

You can run "kubectl logs" to check the velero installation. If you do not see any errors, it installed correctly:

[root@ocp-admin]# kubectl logs deployment/velero -n velero
time="2019-11-14T16:54:27Z" level=info msg="setting log-level to INFO" logSource="pkg/cmd/server/server.go:171"
time="2019-11-14T16:54:27Z" level=info msg="Starting Velero server v1.2.0 (5d008491bbf681658d3e372da1a9d3a21ca4c03c)" logSource="pkg/cmd/server/server.go:173"
time="2019-11-14T16:54:27Z" level=info msg="No feature flags enabled" logSource="pkg/cmd/server/server.go:177"
(...)

Check that the velero pod is running:
[root@ocp3-admin-01 ~]# oc get pods
NAME                      READY     STATUS      RESTARTS   AGE
minio-68cbfb4c89-8rbjb    1/1       Running     0          1h
minio-setup-867lc         0/1       Completed   0          1h
velero-77b4587448-gfhrx   1/1       Running     0          2m

You are ready for the first backup.

Step 5) Running a backup


Let's use the application example that comes with velero for the backup. You can install it running the following:

[root@ocp-admin ~]# oc apply -f /root/velero-v1.2.0-linux-amd64/examples/nginx-app/base.yaml
namespace/nginx-example created
deployment.apps/nginx-deployment created
service/my-nginx created

Run the velero backup command. In the example below, you are only backing up the objects that carry the label app=nginx:

[root@ocp-admin]# velero backup create nginx-backup --selector app=nginx
Backup request "nginx-backup" submitted successfully.
Run `velero backup describe nginx-backup` or `velero backup logs nginx-backup` for more details.

Let's run the describe command to check the status of the backup. For big backups, this command reports whether the operation is still in progress. In the output below, the backup has completed:

[root@ocp-admin]# velero backup describe nginx-backup
Name:         nginx-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  Completed

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  app=nginx

Storage Location:  default

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2019-11-14 10:04:42 -0700 MST
Completed:  2019-11-14 10:04:48 -0700 MST

Expiration:  2019-12-14 10:04:42 -0700 MST

Persistent Volumes: <none included>
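
For automation, the phase shown by "velero backup describe" can also be read from the Backup custom resource in the velero namespace. The sketch below polls it until the backup reaches a terminal phase; wait_for_backup and its return codes are my own convention, and the list of failure phases is what I understand Velero 1.2 to use, so double-check it against your version.

```shell
# Poll a Velero backup until it reaches a terminal phase.
# Arguments: backup name, attempts (default 60), delay in seconds (default 5).
# Returns 0 on Completed, 1 on a failure phase, 2 if still running at the end.
wait_for_backup() {
  name="$1"; attempts="${2:-60}"; delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    phase=$(oc get backups.velero.io "$name" -n velero -o jsonpath='{.status.phase}' 2>/dev/null) || phase=""
    case "$phase" in
      Completed)
        echo "backup $name completed"
        return 0 ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "backup $name ended in phase $phase" >&2
        return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "backup $name still in phase '${phase:-unknown}'" >&2
  return 2
}
```

After "velero backup create nginx-backup --selector app=nginx" you would run: wait_for_backup nginx-backup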


I was curious about how velero stores the backup, so let's look at the NFS mount point and check the data in there:

[root@ocp-admin]# cd /mnt/minio-storage/velero/
[root@ocp-admin]# cd backups/nginx-backup/
[root@ocp-admin]# ls -l
total 24
-rw-r--r--. 1 nfsnobody nfsnobody 3605 Nov 14 10:05 nginx-backup-logs.gz
-rw-r--r--. 1 nfsnobody nfsnobody   29 Nov 14 10:05 nginx-backup-podvolumebackups.json.gz
-rw-r--r--. 1 nfsnobody nfsnobody  181 Nov 14 10:05 nginx-backup-resource-list.json.gz
-rw-r--r--. 1 nfsnobody nfsnobody 2315 Nov 14 10:05 nginx-backup.tar.gz
-rw-r--r--. 1 nfsnobody nfsnobody   29 Nov 14 10:05 nginx-backup-volumesnapshots.json.gz
-rw-r--r--. 1 nfsnobody nfsnobody  862 Nov 14 10:05 velero-backup.json

So, for each backup, there is a subdirectory with compressed files. Looking at the compressed tarball file, we can see all Kubernetes objects in JSON:

[root@ocp-admin]# tar -xvzf nginx-backup.tar.gz
metadata/version
resources/pods/namespaces/nginx-example/nginx-deployment-67594d6bf6-2c7hq.json
resources/pods/namespaces/nginx-example/nginx-deployment-67594d6bf6-4d88k.json
resources/services/namespaces/nginx-example/my-nginx.json
resources/endpoints/namespaces/nginx-example/my-nginx.json
resources/namespaces/cluster/nginx-example.json
resources/replicasets.apps/namespaces/nginx-example/nginx-deployment-67594d6bf6.json
resources/projects.project.openshift.io/cluster/nginx-example.json
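
You do not have to extract the tarball to inspect it. As a sketch, the helper below lists the resource kinds contained in a backup by reading only the archive index; list_backup_kinds is a made-up name, and it assumes the resources/<kind>/... layout shown in the listing above.

```shell
# Print the distinct resource kinds stored in a Velero backup tarball.
# Argument: path to the <backup-name>.tar.gz file.
list_backup_kinds() {
  # tar -tzf lists the archive members without extracting them;
  # the second path component under resources/ is the resource kind.
  tar -tzf "$1" | awk -F/ '$1 == "resources" && $2 != "" { print $2 }' | sort -u
}
```

For the backup above, "list_backup_kinds nginx-backup.tar.gz" prints one line per kind (pods, services, endpoints, and so on).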

Step 6) Running a restore


Finally, let's test a restore. First, let's check the nginx-example project and its pods:

[root@ocp-admin]# oc project nginx-example
Already on project "nginx-example" on server "https://ocp3.gsslab.local:8443".

[root@ocp-admin]# oc get pods
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-67594d6bf6-2c7hq   1/1       Running   0          1m
nginx-deployment-67594d6bf6-4d88k   1/1       Running   0          1m


Deleting the entire project:

[root@ocp-admin]# oc delete project nginx-example

project.project.openshift.io "nginx-example" deleted
[root@ocp-admin]# oc project
error: the project "nginx-example" specified in your config does not exist.


Restoring the project via "velero restore" with the name of backup as a parameter (the name of the backup from the previous step):

[root@ocp-admin]# velero restore create --from-backup nginx-backup
Restore request "nginx-backup-20191114150754" submitted successfully.
Run `velero restore describe nginx-backup-20191114150754` or `velero restore logs nginx-backup-20191114150754` for more details.

[root@ocp-admin]# velero restore get
NAME                          BACKUP         STATUS      WARNINGS   ERRORS    CREATED                         SELECTOR
nginx-backup-20191114150754   nginx-backup   Completed   0          0         2019-11-14 15:08:10 -0700 MST   <none>


The project has been restored, and the pods even came back with their original names:

[root@ocp-admin]# oc project
Using project "nginx-example" on server "https://ocp3.gsslab.local:8443".

[root@ocp-admin]# oc get pods
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-67594d6bf6-2c7hq   1/1       Running   0          43s
nginx-deployment-67594d6bf6-4d88k   1/1       Running   0          43s


Conclusion


I have demonstrated that Project Velero 1.2 works on OpenShift 3.11, using minio backed by NFS for data persistence.

If you need to install Velero 1.2 disconnected from the internet, you will need to have the following images on your corporate internal registry:


[root@ocp-app]# docker images | egrep "velero|minio"
docker.io/velero/velero                                          v1.2.0              255525afb00f        8 days ago          148 MB
docker.io/velero/velero-plugin-for-aws                           v1.0.0              f9c43a2d79a8        8 days ago          106 MB
docker.io/minio/minio                                            latest              8869bca0366f        4 weeks ago         51 MB
docker.io/minio/mc                                               latest              f4f9de663a7f        5 weeks ago         22.7 MB
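
To get these images into an internal registry, a pull/tag/push loop is usually enough. The sketch below only prints the docker commands (a dry run), so you can review them before piping the output to sh on a host that can reach both registries. The registry name registry.example.com:5000 is a placeholder, and mirror_ref and mirror_cmds are hypothetical helper names.

```shell
#!/bin/sh
# Placeholder internal registry; override with REGISTRY=... for your environment.
REGISTRY="${REGISTRY:-registry.example.com:5000}"

# Map a public image reference to its name on the internal registry
# by replacing the leading "docker.io/" component.
mirror_ref() {
  printf '%s/%s\n' "$1" "${2#docker.io/}"
}

# Print the docker commands needed to mirror one image (dry run).
mirror_cmds() {
  target=$(mirror_ref "$REGISTRY" "$1")
  printf 'docker pull %s\n' "$1"
  printf 'docker tag %s %s\n' "$1" "$target"
  printf 'docker push %s\n' "$target"
}

for image in \
  docker.io/velero/velero:v1.2.0 \
  docker.io/velero/velero-plugin-for-aws:v1.0.0 \
  docker.io/minio/minio:latest \
  docker.io/minio/mc:latest
do
  mirror_cmds "$image"
done
```

After mirroring, point the installation at the internal copies: the "--plugins" parameter already takes a full image reference, "velero install" also has an "--image" flag for the server image, and the minio image name lives in 00-minio-deployment.yaml. Since minio/minio and minio/mc are pulled as "latest", consider pinning them to a specific tag on your registry.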
