After reading this post you will understand the purpose of ETCD, how to take a backup of ETCD, and how to restore it.
If you have not yet checked the previous parts of this series, please go ahead and check them here: Link
Backup and Restore ETCD
What is etcd?
ETCD is a key-value store used as the backing store for all Kubernetes cluster data. When we talk about cluster data, we specifically mean the cluster state and configuration: deployments, pod state, node state, and configuration objects are all stored here.
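To get a feel for what ETCD stores, you can list a few of the keys Kubernetes writes under the /registry prefix. This is only a quick peek, assuming a kubeadm cluster with etcdctl installed on the control plane and the standard certificate paths; the exact keys will differ per cluster.
ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only --limit=10 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379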
Check the ETCD version before starting the backup
kubectl describe pod etcd-controlplane -n kube-system | grep Image
controlplane $ kubectl describe pod etcd-controlplane -n kube-system | grep Image
Image: k8s.gcr.io/etcd:3.5.3-0
Image ID: k8s.gcr.io/etcd@sha256:13f53ed1d91e2e11aac476ee9a0269fdda6cc4874eba903efd40daf50c55eee5
controlplane $
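As an optional alternative to inspecting the pod image, you can ask ETCD directly. Assuming etcdctl is available on the control plane node and the standard kubeadm certificate paths, the VERSION column of the endpoint status output reports the server version.
ETCDCTL_API=3 etcdctl endpoint status --write-out=table --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379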
Before taking the backup, we can create a test pod for testing purposes.
- Running pods
controlplane $ kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-54589b89dc-kfbb5 1/1 Running 0 58d
kube-system canal-6p28v 2/2 Running 0 58d
kube-system canal-8hb2n 2/2 Running 0 58d
kube-system coredns-7f6d6547b-2rgsz 1/1 Running 0 58d
kube-system coredns-7f6d6547b-rjr4g 1/1 Running 0 58d
kube-system etcd-controlplane 1/1 Running 0 58d
kube-system kube-apiserver-controlplane 1/1 Running 2 58d
kube-system kube-controller-manager-controlplane 1/1 Running 2 58d
kube-system kube-proxy-vllmz 1/1 Running 0 58d
kube-system kube-proxy-xskb9 1/1 Running 0 58d
kube-system kube-scheduler-controlplane 1/1 Running 2 58d
controlplane $
Creating a test pod
controlplane $ kubectl run mypod --image=httpd
pod/mypod created
controlplane $ kubectl get pod
NAME READY STATUS RESTARTS AGE
mypod 1/1 Running 0 23s
controlplane $
Backing up ETCD
ETCDCTL_API=3 etcdctl snapshot save --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379 snapshot.db
controlplane $ ETCDCTL_API=3 etcdctl snapshot save --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379 snapshot.db
{"level":"info","ts":1657124335.9830263,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"snapshot.db.part"}
{"level":"info","ts":1657124335.990869,"logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1657124335.9910252,"caller":"snapshot/v3_snapshot.go:76","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":1657124336.070651,"logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":1657124336.0928402,"caller":"snapshot/v3_snapshot.go:91","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","size":"4.7 MB","took":"now"}
{"level":"info","ts":1657124336.0928817,"caller":"snapshot/v3_snapshot.go:100","msg":"saved","path":"snapshot.db"}
Snapshot saved at snapshot.db
controlplane $
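Before relying on the snapshot, it is a good idea to verify it. A quick check, assuming snapshot.db is in the current directory, is to print its status (hash, revision, total keys, and size):
ETCDCTL_API=3 etcdctl snapshot status snapshot.db --write-out=table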
ETCD backup restoration
- Before restoring the backup, we delete the newly created pod for testing purposes
controlplane $ kubectl get pod
NAME READY STATUS RESTARTS AGE
mypod 1/1 Running 0 6m43s
controlplane $ kubectl delete pod mypod
pod "mypod" deleted
controlplane $ kubectl get pod
No resources found in default namespace.
controlplane $
Restoring the backup
- We can restore the ETCD backup either to the same location or to a different one
In this article, we restore the backup to the default location. If you are restoring to the default location, you first need to remove the existing data directory /var/lib/etcd.
rm -rf /var/lib/etcd
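If you want an extra safety net, you could move the existing data directory aside instead of deleting it outright (the path /var/lib/etcd.bak below is just an illustrative name):
mv /var/lib/etcd /var/lib/etcd.bak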
Restoring the ETCD backup
ETCDCTL_API=3 etcdctl snapshot restore --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379 --data-dir="/var/lib/etcd" --initial-cluster="master=https://127.0.0.1:2380" --name="master" --initial-advertise-peer-urls="https://127.0.0.1:2380" snapshot.db
controlplane $ rm -rf /var/lib/etcd
controlplane $ ETCDCTL_API=3 etcdctl snapshot restore --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379 --data-dir="/var/lib/etcd" --initial-cluster="master=https://127.0.0.1:2380" --name="master" --initial-advertise-peer-urls="https://127.0.0.1:2380" snapshot.db
Deprecated: Use `etcdutl snapshot restore` instead.
2022-07-06T16:34:15Z info snapshot/v3_snapshot.go:251 restoring snapshot {"path": "snapshot.db", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdutl/snapshot/v3_snapshot.go:257\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdutl/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdctl/v3/ctlv3/command.snapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/command/snapshot_command.go:128\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.Start\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/ctl.go:107\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.MustStart\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/ctl.go:111\nmain.main\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/main.go:59\nruntime.main\n\t/home/remote/sbatsche/.gvm/gos/go1.16.3/src/runtime/proc.go:225"}
2022-07-06T16:34:15Z info membership/store.go:119 Trimming membership information from the backend...
2022-07-06T16:34:15Z info membership/cluster.go:393 added member {"cluster-id": "c9be114fc2da2776", "local-member-id": "0", "added-peer-id": "a874c87fd42044f", "added-peer-peer-urls": ["https://127.0.0.1:2380"]}
2022-07-06T16:34:15Z info snapshot/v3_snapshot.go:272 restored snapshot {"path": "snapshot.db", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap"}
controlplane $
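If the etcd pod does not pick up the restored data on its own, you can force kubelet to recreate the static pod by temporarily moving its manifest out of the manifests directory and back again. This is only a fallback step, assuming the default kubeadm manifest location:
mv /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml
sleep 20
mv /tmp/etcd.yaml /etc/kubernetes/manifests/etcd.yaml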
- In case you are restoring the backup to a different location, there is one more step you need to perform: you also need to update the volume mountPath and hostPath in the etcd.yaml configuration file. This file usually resides in the /etc/kubernetes/manifests directory. You can use the vi editor to open the file and make the changes marked in the manifest below (a sample restore command for a custom data directory follows the manifest). Because etcd runs as a static pod, once you save and exit the file you will see all pods, deployments, and services start getting created again.
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.30.1.2:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.30.1.2:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --initial-advertise-peer-urls=https://172.30.1.2:2380
    - --initial-cluster=controlplane=https://172.30.1.2:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.30.1.2:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.30.1.2:2380
    - --name=controlplane
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.5.3-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 25m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd        # <-- this location needs to change when restoring to a different path
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd             # <-- this location needs to change when restoring to a different path
      type: DirectoryOrCreate
    name: etcd-data
status: {}
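For reference, when restoring to a different location the restore command points --data-dir at the new path, and the etcd-data hostPath in the manifest above must be changed to match. The directory /var/lib/etcd-from-backup below is only an example name, not a path used earlier in this article:
ETCDCTL_API=3 etcdctl snapshot restore --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379 --data-dir="/var/lib/etcd-from-backup" --initial-cluster="master=https://127.0.0.1:2380" --name="master" --initial-advertise-peer-urls="https://127.0.0.1:2380" snapshot.db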
For testing purposes, we can first check whether the deleted pod is back
controlplane $ kubectl get pod
NAME READY STATUS RESTARTS AGE
mypod 1/1 Running 0 18m
controlplane $
The pod is successfully restored, and the system pods are also running as before
controlplane $ kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-54589b89dc-kfbb5 1/1 Running 0 58d
kube-system canal-6p28v 2/2 Running 0 58d
kube-system canal-8hb2n 2/2 Running 0 58d
kube-system coredns-7f6d6547b-2rgsz 1/1 Running 0 58d
kube-system coredns-7f6d6547b-rjr4g 1/1 Running 0 58d
kube-system etcd-controlplane 1/1 Running 0 58d
kube-system kube-apiserver-controlplane 1/1 Running 2 58d
kube-system kube-controller-manager-controlplane 1/1 Running 2 58d
kube-system kube-proxy-vllmz 1/1 Running 0 58d
kube-system kube-proxy-xskb9 1/1 Running 0 58d
kube-system kube-scheduler-controlplane 1/1 Running 2 58d
controlplane $
Hope you have got an idea of Kubernetes ETCD, how to take a backup of ETCD, and how to restore it.
Happy Learning!
Thank you!