What happened: I meant to remove a label from node01, but I ran the wrong command and the node01 node disappeared from the cluster!

[root@master test-yaml]# kubectl delete node node01 disk=ssd 
node "node01" deleted
Error from server (NotFound): nodes "disk=ssd" not found
[root@master test-yaml]# kubectl get node
NAME     STATUS   ROLES           AGE   VERSION
master   Ready    control-plane   13d   v1.30.0
node02   Ready    <none>          13d   v1.30.0
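
The root cause is that `kubectl delete node` treats every trailing argument as a node name, so `node01` was deleted and `disk=ssd` was then looked up as a second node. Labels are removed with `kubectl label`, using the label key followed by `-`. A minimal sketch of what was probably intended is shown below; the function name `remove_node_label` and the `KUBECTL` override variable are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: remove a label from a node without touching the node object itself.
# KUBECTL is an assumed override hook (defaults to kubectl) so the command can be previewed.
remove_node_label() {
    node="$1"
    key="$2"
    # A trailing '-' after the key tells `kubectl label` to remove that label.
    ${KUBECTL:-kubectl} label node "$node" "${key}-"
}
```

Calling `remove_node_label node01 disk` runs `kubectl label node node01 disk-`, while `KUBECTL=echo remove_node_label node01 disk` only prints the arguments, which is a cheap way to double-check a command before it touches the cluster.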

Recovery process

1. Clean up the configuration data on node01

# 1. Stop the related services first
[root@node01 ~]# systemctl stop kubelet
[root@node01 ~]# systemctl stop docker
[root@node01 ~]# systemctl stop cri-docker

# 2. Delete the old configuration files
[root@node01 ~]# rm -rf /var/lib/cni/
[root@node01 ~]# rm -rf /var/lib/kubelet/
[root@node01 ~]# rm -rf /etc/cni/
[root@node01 ~]# rm -rf /etc/kubernetes/

# 3. Start the related services again
[root@node01 ~]# systemctl start kubelet
[root@node01 ~]# systemctl start docker
[root@node01 ~]# systemctl start cri-docker
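
As an alternative to deleting the directories by hand, `kubeadm reset` can do most of this cleanup: it stops the kubelet and removes the files kubeadm wrote under /etc/kubernetes and /var/lib/kubelet, but it leaves the CNI configuration in place. The sketch below is not what was run in this incident; the function name `reset_worker_node` and the `RUN` preview variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reset a worker node with kubeadm instead of removing directories manually.
# RUN is an assumed hook: leave it unset to execute, or set RUN=echo to only print the commands.
reset_worker_node() {
    cri_socket="$1"
    # -f skips the confirmation prompt; --cri-socket selects the runtime endpoint to clean up.
    ${RUN:-} kubeadm reset -f --cri-socket="$cri_socket"
    # kubeadm reset does not clean the CNI configuration, so remove it explicitly.
    ${RUN:-} rm -rf /etc/cni/net.d
}
```

For the setup in this article the call would be `reset_worker_node unix:///var/run/cri-dockerd.sock`, run as root on node01.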

2. Generate a new join token on the master node

[root@master ~]# kubeadm token create --print-join-command
kubeadm join 192.168.0.11:6443 --token 4dx9gu.sb95v5mqq3an77ns --discovery-token-ca-cert-hash sha256:1ff346f4ddd8de598cc6998148d2856b5c5aff4c5ba401796eb772b2c9360571

[root@master test-yaml]# kubeadm token list
TOKEN                     TTL   EXPIRES                USAGES                   DESCRIPTION   EXTRA GROUPS
4dx9gu.sb95v5mqq3an77ns   23h   2024-07-26T06:51:02Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
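
The join command is usually copied by hand from the master to the worker, and the `--discovery-token-ca-cert-hash` value is easy to damage on the way. A SHA-256 digest is always 64 hexadecimal characters, so a quick format check can catch a truncated paste before `kubeadm join` is run. The helper name `valid_ca_cert_hash` below is hypothetical, and the check assumes the lowercase `sha256:<hex>` form that kubeadm prints.

```shell
#!/bin/sh
# Hypothetical helper: check that a --discovery-token-ca-cert-hash value is well formed.
# A SHA-256 digest is 32 bytes, i.e. exactly 64 hexadecimal characters after "sha256:".
valid_ca_cert_hash() {
    printf '%s\n' "$1" | grep -Eq '^sha256:[0-9a-f]{64}$'
}
```

The function returns success only for a complete hash, so it can guard the join step, for example `valid_ca_cert_hash "$hash" || echo "hash looks truncated"`. It validates the format only; it does not prove the hash matches the cluster CA.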

3. Run the join command on node01

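Kubernetes removed the built-in dockershim in v1.24, so on v1.30 with Docker as the container runtime, kubeadm has to be pointed at cri-dockerd by adding the --cri-socket=unix:///var/run/cri-dockerd.sock flag to the join command.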

[root@node01 ~]# kubeadm join 192.168.0.11:6443 --token 30vo5d.q0jmlzkvzorx8drq --discovery-token-ca-cert-hash sha256:1ff346f4ddd8de598cc6998148d2856b5c5aff4c5ba401796eb772b2c9360571 --cri-socket=unix:///var/run/cri-dockerd.sock
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.078678ms
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

[root@node01 ~]#
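
Since `kubeadm token create --print-join-command` does not include the CRI socket flag in its output, the flag has to be appended by hand each time. The sketch below shows one way to do that consistently; the function name `with_cri_socket` is hypothetical, the default socket path is the one used in this article, and the token in the usage example is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: append the CRI socket flag to a join command printed by kubeadm.
with_cri_socket() {
    join_cmd="$1"
    # Default socket path is the cri-dockerd path used in this article (an assumption elsewhere).
    socket="${2:-unix:///var/run/cri-dockerd.sock}"
    printf '%s --cri-socket=%s\n' "$join_cmd" "$socket"
}
```

For example, `with_cri_socket "kubeadm join 192.168.0.11:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>"` prints the full command to run on the worker. The function only builds the string and never runs `kubeadm` itself.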

4. Verify the cluster status

[root@master ~]# kubectl get node -A
NAME     STATUS     ROLES           AGE   VERSION
master   Ready      control-plane   13d   v1.30.0
node01   NotReady   <none>          25s   v1.30.0
node02   Ready      <none>          13d   v1.30.0
[root@master ~]# kubectl get node -o wide
NAME     STATUS   ROLES           AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                 CONTAINER-RUNTIME
master   Ready    control-plane   13d   v1.30.0   192.168.0.11   <none>        CentOS Linux 7 (Core)   3.10.0-1160.119.1.el7.x86_64   docker://26.1.4
node01   Ready    <none>          31s   v1.30.0   192.168.0.12   <none>        CentOS Linux 7 (Core)   3.10.0-1160.119.1.el7.x86_64   docker://26.1.4
node02   Ready    <none>          13d   v1.30.0   192.168.0.13   <none>        CentOS Linux 7 (Core)   3.10.0-1160.119.1.el7.x86_64   docker://26.1.4
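
As the two listings show, a freshly joined node is NotReady for a short time before it becomes Ready. Instead of re-running `kubectl get node`, `kubectl wait` can block until the Ready condition is true. The sketch below assumes a 120s default timeout; the function name `wait_node_ready` and the `KUBECTL` override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: block until a node reports the Ready condition, or time out.
# KUBECTL is an assumed override hook (defaults to kubectl) so the command can be previewed.
wait_node_ready() {
    node="$1"
    timeout="${2:-120s}"   # assumed default; adjust for slower CNI startup
    ${KUBECTL:-kubectl} wait --for=condition=Ready "node/$node" --timeout="$timeout"
}
```

Running `wait_node_ready node01` on the master returns once node01 is Ready. Note that the re-joined node is a new object, so any labels or taints that were set on the old node01 have to be applied again.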