Backup and Restore Kubernetes ETCD

After reading this post, you will understand the purpose of ETCD, how to take a backup of ETCD, and how to restore it.

If you have not yet checked the previous parts of this series, please go ahead and check them here 👉 Link

What is etcd?

ETCD is a distributed key-value store used as the backing store for all Kubernetes cluster data. By cluster data we mean the cluster's state and configuration: deployments, pod state, node state, and the rest of the cluster configuration are all stored here.
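
If you are curious what that looks like in practice, you can list a few of the keys Kubernetes stores in etcd. This is a quick illustration that assumes etcdctl is available on the control plane node and uses the same kubeadm-default certificate paths as the backup command later in this post:

ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --endpoints=127.0.0.1:2379 | head -n 20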

Check the ETCD version before starting the backup

 kubectl describe pod etcd-controlplane -n kube-system | grep Image
controlplane $ kubectl describe pod etcd-controlplane -n kube-system | grep Image 
    Image:         k8s.gcr.io/etcd:3.5.3-0
    Image ID:      k8s.gcr.io/etcd@sha256:13f53ed1d91e2e11aac476ee9a0269fdda6cc4874eba903efd40daf50c55eee5
controlplane $
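
The snapshot in the next step is taken with the etcdctl client installed on the control plane node, so it is also worth confirming that the client version is compatible with the etcd server image shown above (a quick optional check; the exact output depends on your installation):

etcdctl version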

Before taking the backup, let's create a test pod so we can verify the restore later.

  • Running pods
controlplane $ kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-54589b89dc-kfbb5   1/1     Running   0          58d
kube-system   canal-6p28v                                2/2     Running   0          58d
kube-system   canal-8hb2n                                2/2     Running   0          58d
kube-system   coredns-7f6d6547b-2rgsz                    1/1     Running   0          58d
kube-system   coredns-7f6d6547b-rjr4g                    1/1     Running   0          58d
kube-system   etcd-controlplane                          1/1     Running   0          58d
kube-system   kube-apiserver-controlplane                1/1     Running   2          58d
kube-system   kube-controller-manager-controlplane       1/1     Running   2          58d
kube-system   kube-proxy-vllmz                           1/1     Running   0          58d
kube-system   kube-proxy-xskb9                           1/1     Running   0          58d
kube-system   kube-scheduler-controlplane                1/1     Running   2          58d
controlplane $

Creating one pod

controlplane $ kubectl run mypod --image=httpd
pod/mypod created
controlplane $ kubectl get pod 
NAME    READY   STATUS    RESTARTS   AGE
mypod   1/1     Running   0          23s
controlplane $

Back up the ETCD

ETCDCTL_API=3 etcdctl snapshot save --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379  snapshot.db
controlplane $ ETCDCTL_API=3 etcdctl snapshot save --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379  snapshot.db
{"level":"info","ts":1657124335.9830263,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"snapshot.db.part"}
{"level":"info","ts":1657124335.990869,"logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1657124335.9910252,"caller":"snapshot/v3_snapshot.go:76","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":1657124336.070651,"logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":1657124336.0928402,"caller":"snapshot/v3_snapshot.go:91","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","size":"4.7 MB","took":"now"}
{"level":"info","ts":1657124336.0928817,"caller":"snapshot/v3_snapshot.go:100","msg":"saved","path":"snapshot.db"}
Snapshot saved at snapshot.db
controlplane $
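
Optionally, you can sanity-check the snapshot file before moving on. The snapshot status subcommand (still available in etcdctl 3.5, though it is being moved to etcdutl) prints the hash, revision, total key count, and size of the snapshot:

ETCDCTL_API=3 etcdctl snapshot status snapshot.db --write-out=table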

ETCD backup restoration

  • Before restoring the backup, delete the newly created pod so we can later confirm that the restore brings it back
controlplane $ kubectl get pod 
NAME    READY   STATUS    RESTARTS   AGE
mypod   1/1     Running   0          6m43s
controlplane $ kubectl delete pod mypod
pod "mypod" deleted
controlplane $ kubectl get pod 
No resources found in default namespace.
controlplane $

Restoring the backup

  • We can restore the ETCD backup either to the default location or to a different one.
    In this article, we restore the backup to the default location. When restoring to the default location, you first need to remove the existing data directory /var/lib/etcd:
rm -rf /var/lib/etcd
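
A safer variation (an optional precaution, not part of the original session): move the etcd static pod manifest out of /etc/kubernetes/manifests first, so the kubelet stops the etcd pod before its data directory is deleted, and move it back once the restore is complete. The backup path used here is arbitrary:

mv /etc/kubernetes/manifests/etcd.yaml /root/etcd.yaml.bak   # kubelet stops the etcd static pod
rm -rf /var/lib/etcd                                         # now it is safe to remove the old data directory
mv /root/etcd.yaml.bak /etc/kubernetes/manifests/etcd.yaml   # after the restore, kubelet recreates etcd from the restored data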

Restoring the ETCD backup

ETCDCTL_API=3 etcdctl snapshot restore --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379 --data-dir="/var/lib/etcd" --initial-cluster="master=https://127.0.0.1:2380" --name="master" --initial-advertise-peer-urls="https://127.0.0.1:2380" snapshot.db
controlplane $ rm -rf /var/lib/etcd
controlplane $ ETCDCTL_API=3 etcdctl snapshot restore --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379 --data-dir="/var/lib/etcd" --initial-cluster="master=https://127.0.0.1:2380" --name="master" --initial-advertise-peer-urls="https://127.0.0.1:2380" snapshot.db
Deprecated: Use `etcdutl snapshot restore` instead.

2022-07-06T16:34:15Z    info    snapshot/v3_snapshot.go:251     restoring snapshot      {"path": "snapshot.db", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdutl/snapshot/v3_snapshot.go:257\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdutl/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdctl/v3/ctlv3/command.snapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/command/snapshot_command.go:128\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.Start\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/ctl.go:107\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.MustStart\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/ctl.go:111\nmain.main\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/main.go:59\nruntime.main\n\t/home/remote/sbatsche/.gvm/gos/go1.16.3/src/runtime/proc.go:225"}
2022-07-06T16:34:15Z    info    membership/store.go:119 Trimming membership information from the backend...
2022-07-06T16:34:15Z    info    membership/cluster.go:393       added member    {"cluster-id": "c9be114fc2da2776", "local-member-id": "0", "added-peer-id": "a874c87fd42044f", "added-peer-peer-urls": ["https://127.0.0.1:2380"]}
2022-07-06T16:34:15Z    info    snapshot/v3_snapshot.go:272     restored snapshot       {"path": "snapshot.db", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap"}
controlplane $
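
If the cluster state does not come back on its own after the restore, restarting the kubelet on the control plane node (not shown in the original session, but generally safe on a kubeadm cluster) forces the static pods, including etcd and the API server, to be recreated against the restored data directory:

systemctl restart kubelet
kubectl get pod -n kube-system
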
  • In case you are restoring the backup to a different location, there is one more step you need to perform: update the volume mountPath and hostPath in the etcd.yaml configuration file as well. This file usually resides in the /etc/kubernetes/manifests directory. You can open it with the vi editor and make the changes shown below. Once you save and exit the file, the kubelet recreates the etcd pod and you will see all pods, deployments, and services start getting recreated.
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.30.1.2:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.30.1.2:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --initial-advertise-peer-urls=https://172.30.1.2:2380
    - --initial-cluster=controlplane=https://172.30.1.2:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.30.1.2:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.30.1.2:2380
    - --name=controlplane
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.5.3-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 25m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd   # <-- this location needs to change when restoring to a different directory
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd   
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd  
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd        # <-- this location needs to change when restoring to a different directory
      type: DirectoryOrCreate
    name: etcd-data
status: {}
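
As a concrete illustration, suppose you restored the snapshot to a hypothetical directory /var/lib/etcd-from-backup instead of the default (this path is only an example, not taken from the original session). The --data-dir used in the restore command, the etcd --data-dir argument, the volumeMounts mountPath, and the volumes hostPath must then all point to that same directory:

    # in the container command section:
    - --data-dir=/var/lib/etcd-from-backup
    # in the volumeMounts section:
    - mountPath: /var/lib/etcd-from-backup
      name: etcd-data
    # and in the volumes section:
  - hostPath:
      path: /var/lib/etcd-from-backup
      type: DirectoryOrCreate
    name: etcd-data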

To verify the restore, let's first check the status of the pod we deleted earlier

controlplane $ kubectl get pod 
NAME    READY   STATUS    RESTARTS   AGE
mypod   1/1     Running   0          18m
controlplane $

The pod has been restored successfully, and the cluster's system pods are all running

controlplane $ kubectl get pod  -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-54589b89dc-kfbb5   1/1     Running   0          58d
kube-system   canal-6p28v                                2/2     Running   0          58d
kube-system   canal-8hb2n                                2/2     Running   0          58d
kube-system   coredns-7f6d6547b-2rgsz                    1/1     Running   0          58d
kube-system   coredns-7f6d6547b-rjr4g                    1/1     Running   0          58d
kube-system   etcd-controlplane                          1/1     Running   0          58d
kube-system   kube-apiserver-controlplane                1/1     Running   2          58d
kube-system   kube-controller-manager-controlplane       1/1     Running   2          58d
kube-system   kube-proxy-vllmz                           1/1     Running   0          58d
kube-system   kube-proxy-xskb9                           1/1     Running   0          58d
kube-system   kube-scheduler-controlplane                1/1     Running   2          58d
controlplane $

I hope this gave you a good idea of Kubernetes ETCD, how to take a backup of ETCD, and how to restore it.

Happy Learning 📚

Thank you!
