etcd configuration: initial-cluster-state
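1. Overview
This document is part of a series of summary notes.
- Section 1, Installing etcd: installs etcd on CentOS 7.
- Section 2, Running etcd on a single node: runs etcd on one node.
- Section 3, Deploying a three-node etcd cluster: deploys an etcd cluster on three nodes and sets up some shortcut commands for etcd.
- Section 4, etcd TLS cluster deployment: deploys an etcd cluster on three nodes with TLS-encrypted communication enabled.
- Section 5, etcd role-based access control: enables role control on top of the three-node etcd TLS cluster. For details, see https://etcd.io/docs/v3.5/demo/ .
- Section 6, Handling expired etcd certificates: once the etcd certificates expire, none of the etcd command-line operations work; this section explains how to deal with that.
- Section 7, etcd configuration file: sets the relevant parameters through an etcd configuration file and then starts the etcd service.
- Section 8, etcd visualization tool: introduces etcd-workbench, a handy etcd visualization tool.
- Section 9, The key etcd setting initial-cluster-state: explains the difference between setting initial-cluster-state to new and to existing.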
1.1 VirtualBox virtual machine details
The following virtual machines are used while learning etcd:
No. | VM | Hostname | IP | CPU | Memory | Notes |
---|---|---|---|---|---|---|
1 | ansible-master | ansible | 192.168.56.120 | 2 cores | 4G | Ansible control node |
2 | ansible-node1 | etcd-node1 | 192.168.56.121 | 2 cores | 2G | Ansible worker node 1 |
3 | ansible-node2 | etcd-node2 | 192.168.56.122 | 2 cores | 2G | Ansible worker node 2 |
4 | ansible-node3 | etcd-node3 | 192.168.56.123 | 2 cores | 2G | Ansible worker node 3 |
A playbook for deploying the etcd cluster with ansible will be written later.
Operating system details:
[root@etcd-node1 ~]# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
[root@etcd-node1 ~]# hostname -I
192.168.56.121 10.0.3.15
[root@etcd-node1 ~]#
1.2 Configuration notes
As shown in Section 7, etcd configuration file, the etcd configuration file sets initial-cluster-state to new, which means a new cluster is being created:
# Initial cluster state ('new' or 'existing').
# new means create a new cluster; existing means join an existing cluster
initial-cluster-state: 'new'
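The usual advice is that when the etcd service is restarted later, this setting should be changed to initial-cluster-state: 'existing', meaning the member joins an existing, already initialized cluster, in which case the cluster ID does not change.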
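To verify this setting, I ran the following experiment:
- Back up the /srv/etcd/node directory on each node so it can be restored after the tests.
- Start the etcd service with initial-cluster-state: 'existing' and observe the cluster ID, the member IDs, and the log output.
- Start the etcd service with initial-cluster-state: 'new' and observe the cluster ID, the member IDs, and the log output.
- After the tests, restore from the backup and start the etcd service.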
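1.3 Recap
Earlier, following Section 7, etcd configuration file, the relevant parameters were set through an etcd configuration file and the etcd service was started from it.
On each of the three nodes, start_by_config.sh starts the etcd service.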
[root@etcd-node1 ~]# cd /srv/etcd/node
[root@etcd-node1 node]# ls
config logs openssl.conf start_by_config.sh start.sh
data.etcd nohup.out start_auto_ssl.sh start_no_ssl.sh stop.sh
[root@etcd-node1 node]# ./start_by_config.sh
[root@etcd-node1 node]# nohup: appending output to ‘nohup.out’
[root@etcd-node1 node]#
After startup, check the etcd cluster status:
[root@etcd-node1 ~]# ech
+-----------------------------+--------+------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+-----------------------------+--------+------------+-------+
| https://192.168.56.121:2379 | true | 1.226138ms | |
| https://192.168.56.123:2379 | true | 1.153493ms | |
| https://192.168.56.122:2379 | true | 753.212µs | |
+-----------------------------+--------+------------+-------+
[root@etcd-node1 ~]# ecm
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| a7d7b09bf04ad21b | started | node3 | https://192.168.56.123:2380 | https://192.168.56.123:2379 | false |
| d553b4da699c7263 | started | node2 | https://192.168.56.122:2380 | https://192.168.56.122:2379 | false |
| e14cb1abc9daea5b | started | node1 | https://192.168.56.121:2380 | https://192.168.56.121:2379 | false |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
[root@etcd-node1 ~]# ecs
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.56.121:2379 | e14cb1abc9daea5b | 3.5.18 | 25 kB | false | false | 23 | 974 | 974 | |
| https://192.168.56.122:2379 | d553b4da699c7263 | 3.5.18 | 25 kB | true | false | 23 | 975 | 975 | |
| https://192.168.56.123:2379 | a7d7b09bf04ad21b | 3.5.18 | 25 kB | false | false | 23 | 976 | 976 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@etcd-node1 ~]#
This gives the IDs of the three nodes:
- Node 1: e14cb1abc9daea5b
- Node 2: d553b4da699c7263
- Node 3: a7d7b09bf04ad21b
The tests start from here.
2. Testing the setting
2.1 Backing up files
On all three nodes, change to the /srv/etcd directory and run cp -rp node node.bak to back up the directory:
[root@etcd-node1 ~]# cd /srv/etcd
[root@etcd-node1 etcd]# cp -rp node node.bak
[root@etcd-node1 etcd]# ls -lah node node.bak
node:
total 26M
drwxr-xr-x 5 root root 191 May 7 22:25 .
drwxr-xr-x 7 root root 73 Jun 3 22:36 ..
drwxr-xr-x 2 root root 23 Jun 2 21:40 config
drwx------ 3 root root 20 Jun 2 22:05 data.etcd
drwxr-xr-x 2 root root 39 May 7 22:26 logs
-rw------- 1 root root 26M May 2 11:57 nohup.out
-rw-r--r-- 1 root root 0 Apr 5 22:58 openssl.conf
-rwxr--r-- 1 root root 1007 Apr 5 23:07 start_auto_ssl.sh
-rwxr--r-- 1 root root 105 May 7 22:33 start_by_config.sh
-rwxr--r-- 1 root root 954 Mar 2 22:52 start_no_ssl.sh
-rwxr--r-- 1 root root 1.6K Apr 5 23:59 start.sh
-rwxr--r-- 1 root root 61 Apr 5 23:20 stop.sh
node.bak:
total 26M
drwxr-xr-x 5 root root 191 May 7 22:25 .
drwxr-xr-x 7 root root 73 Jun 3 22:36 ..
drwxr-xr-x 2 root root 23 Jun 2 21:40 config
drwx------ 3 root root 20 Jun 2 22:05 data.etcd
drwxr-xr-x 2 root root 39 May 7 22:26 logs
-rw------- 1 root root 26M May 2 11:57 nohup.out
-rw-r--r-- 1 root root 0 Apr 5 22:58 openssl.conf
-rwxr--r-- 1 root root 1007 Apr 5 23:07 start_auto_ssl.sh
-rwxr--r-- 1 root root 105 May 7 22:33 start_by_config.sh
-rwxr--r-- 1 root root 954 Mar 2 22:52 start_no_ssl.sh
-rwxr--r-- 1 root root 1.6K Apr 5 23:59 start.sh
-rwxr--r-- 1 root root 61 Apr 5 23:20 stop.sh
[root@etcd-node1 etcd]#
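Comparing the two ls listings by eye works for a directory this small. As a rough alternative, the check could be scripted; the sketch below is only an illustration, and the helper name verify_backup is made up for this note. It assumes the files are not being modified while the comparison runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a backup copy matches its source.
# diff -rq compares file contents recursively and only names differing files.
verify_backup() {
    local src=$1 backup=$2
    if diff -rq "$src" "$backup" >/dev/null; then
        echo "backup matches"
    else
        echo "backup differs"
        return 1
    fi
}

# Example call on a node (paths taken from the steps above):
#   verify_backup /srv/etcd/node /srv/etcd/node.bak
```

If etcd is still writing to data.etcd during the comparison, a "backup differs" result would not necessarily mean the copy is bad.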
2.2 Starting with initial-cluster-state: 'existing'
Check the current initial-cluster-state setting on the three nodes.
# Check the configuration on node 1
[root@etcd-node1 ~]# grep -B2 initial-cluster-state /srv/etcd/node/config/etcd.yaml
# Initial cluster state ('new' or 'existing').
# new means create a new cluster; existing means join an existing cluster
initial-cluster-state: 'existing'
[root@etcd-node1 ~]#
# Check the configuration on node 2
[root@etcd-node2 ~]# grep -B2 initial-cluster-state /srv/etcd/node/config/etcd.yaml
# Initial cluster state ('new' or 'existing').
# new means create a new cluster; existing means join an existing cluster
initial-cluster-state: 'existing'
[root@etcd-node2 ~]#
# Check the configuration on node 3
[root@etcd-node3 ~]# grep -B2 initial-cluster-state /srv/etcd/node/config/etcd.yaml
# Initial cluster state ('new' or 'existing').
# new means create a new cluster; existing means join an existing cluster
initial-cluster-state: 'existing'
[root@etcd-node3 ~]#
As shown, all three nodes are currently set to initial-cluster-state: 'existing', that is, joining an existing cluster.
Now start the service on the three nodes:
[root@etcd-node1 ~]# cd /srv/etcd/node && ./start_by_config.sh
[root@etcd-node1 node]# nohup: appending output to ‘nohup.out’
[root@etcd-node1 node]#
All three nodes have started:
[root@etcd-node1 node]# ech
+-----------------------------+--------+------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+-----------------------------+--------+------------+-------+
| https://192.168.56.123:2379 | true | 1.322713ms | |
| https://192.168.56.122:2379 | true | 2.835666ms | |
| https://192.168.56.121:2379 | true | 1.587397ms | |
+-----------------------------+--------+------------+-------+
[root@etcd-node1 node]# ecm
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| a7d7b09bf04ad21b | started | node3 | https://192.168.56.123:2380 | https://192.168.56.123:2379 | false |
| d553b4da699c7263 | started | node2 | https://192.168.56.122:2380 | https://192.168.56.122:2379 | false |
| e14cb1abc9daea5b | started | node1 | https://192.168.56.121:2380 | https://192.168.56.121:2379 | false |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
[root@etcd-node1 node]# ecs
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.56.121:2379 | e14cb1abc9daea5b | 3.5.18 | 25 kB | false | false | 30 | 1179 | 1179 | |
| https://192.168.56.122:2379 | d553b4da699c7263 | 3.5.18 | 25 kB | false | false | 30 | 1180 | 1180 | |
| https://192.168.56.123:2379 | a7d7b09bf04ad21b | 3.5.18 | 25 kB | true | false | 30 | 1181 | 1181 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@etcd-node1 node]# date
Tue Jun 3 22:44:15 CST 2025
# Show the cluster ID and member IDs in decimal
[root@etcd-node1 node]# rootetcdctl --write-out=fields member list
"ClusterID" : 11928626832149063955
"MemberID" : 15371828803313365603
"Revision" : 0
"RaftTerm" : 30
"ID" : 12094329508124611099
"Name" : "node3"
"PeerURL" : "https://192.168.56.123:2380"
"ClientURL" : "https://192.168.56.123:2379"
"IsLearner" : false
"ID" : 15371828803313365603
"Name" : "node2"
"PeerURL" : "https://192.168.56.122:2380"
"ClientURL" : "https://192.168.56.122:2379"
"IsLearner" : false
"ID" : 16234546108147886683
"Name" : "node1"
"PeerURL" : "https://192.168.56.121:2380"
"ClientURL" : "https://192.168.56.121:2379"
"IsLearner" : false
[root@etcd-node1 node]#
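At this point, both ecm and ecs show the hexadecimal IDs of the three nodes:
- Node 1 ID: e14cb1abc9daea5b
- Node 2 ID: d553b4da699c7263
- Node 3 ID: a7d7b09bf04ad21b
These match the node IDs shown at the earlier startup, so the node IDs have not changed.
A screenshot taken on May 31 also records the earlier decimal cluster ID and member IDs:
- Cluster ID: 11928626832149063955
- Node 1 ID: 16234546108147886683
- Node 2 ID: 15371828803313365603
- Node 3 ID: 12094329508124611099
The cluster ID and member IDs are unchanged and still have their earlier values.
This is the state I expected: with initial-cluster-state: 'existing', the etcd cluster ID and member IDs do not change.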
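The hexadecimal IDs printed by ecm and ecs and the decimal IDs printed with --write-out=fields are the same 64-bit values written in different bases. A minimal sketch for converting between them, assuming bash's printf builtin (the helper names hex_to_dec and dec_to_hex are made up for this note):

```shell
#!/usr/bin/env bash
# Hypothetical helpers: convert a 64-bit etcd ID between hex and decimal.
# %u and %x treat the value as unsigned, so IDs above 2^63 are handled.
hex_to_dec() {
    printf '%u\n' "0x$1"
}

dec_to_hex() {
    printf '%016x\n' "$1"
}

hex_to_dec e14cb1abc9daea5b        # node 1 ID as shown by ecm
dec_to_hex 16234546108147886683    # node 1 ID as shown by --write-out=fields
```

The same conversion can be used to cross-check the other two members and the cluster ID against the tables above.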
Now use the ./stop.sh script to stop the etcd service on all three nodes:
[root@etcd-node1 node]# ./stop.sh
2.3 Starting with initial-cluster-state: 'new'
Check the current initial-cluster-state setting on the three nodes.
[root@etcd-node1 ~]# grep -B2 initial-cluster-state /srv/etcd/node/config/etcd.yaml
# Initial cluster state ('new' or 'existing').
# new means create a new cluster; existing means join an existing cluster
initial-cluster-state: 'existing'
[root@etcd-node1 ~]#
2.3.1 Changing only initial-cluster-state to new
Modify the configuration on all three nodes:
sed -i "s/initial-cluster-state: 'existing'/initial-cluster-state: 'new'/g" /srv/etcd/node/config/etcd.yaml
After running the command above, check the configuration again:
[root@etcd-node1 node]# grep -B2 initial-cluster-state /srv/etcd/node/config/etcd.yaml
# Initial cluster state ('new' or 'existing').
# new means create a new cluster; existing means join an existing cluster
initial-cluster-state: 'new'
[root@etcd-node1 node]#
The configuration has changed.
Now start the etcd service on the three nodes:
[root@etcd-node1 node]# cd /srv/etcd/node && ./start_by_config.sh
[root@etcd-node1 node]# nohup: appending output to ‘nohup.out’
[root@etcd-node1 node]#
Now check the cluster ID and member IDs:
[root@etcd-node1 node]# grep -B2 initial-cluster-state /srv/etcd/node/config/etcd.yaml
# Initial cluster state ('new' or 'existing').
# new means create a new cluster; existing means join an existing cluster
initial-cluster-state: 'new'
[root@etcd-node1 node]# rootetcdctl --write-out=fields member list
"ClusterID" : 11928626832149063955
"MemberID" : 16234546108147886683
"Revision" : 0
"RaftTerm" : 31
"ID" : 12094329508124611099
"Name" : "node3"
"PeerURL" : "https://192.168.56.123:2380"
"ClientURL" : "https://192.168.56.123:2379"
"IsLearner" : false
"ID" : 15371828803313365603
"Name" : "node2"
"PeerURL" : "https://192.168.56.122:2380"
"ClientURL" : "https://192.168.56.122:2379"
"IsLearner" : false
"ID" : 16234546108147886683
"Name" : "node1"
"PeerURL" : "https://192.168.56.121:2380"
"ClientURL" : "https://192.168.56.121:2379"
"IsLearner" : false
[root@etcd-node1 node]#
These are identical to the decimal cluster ID and member IDs obtained in the previous section:
- Cluster ID: 11928626832149063955
- Node 1 ID: 16234546108147886683
- Node 2 ID: 15371828803313365603
- Node 3 ID: 12094329508124611099
So why did nothing change?
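Root cause: the data directory takes precedence over the startup parameters
etcd follows one core principle at startup:
If the data directory (--data-dir) already exists and contains valid cluster state (for example the member/snap/db file), etcd ignores the initial-cluster-state setting and restores the cluster directly from the local data.
Startup flow:
- Check the data directory. At startup, etcd first checks the --data-dir directory:
  - If the directory does not exist or is empty, etcd goes through the initialization flow and honors initial-cluster-state=new.
  - If the directory exists and contains valid data (for example member/snap/db), etcd skips the initialization flow and loads the persisted data.
- Scope of the initial-cluster-state parameter. The parameter takes effect only during initial bootstrap. If existing data is detected, etcd:
  - Behaves as if in existing mode, whatever the configuration says.
  - Loads the cluster ID, member IDs, Raft log, snapshots, and other state from disk.
In other words, when the data directory already holds these files, etcd ignores the initial-cluster-state setting.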
[root@etcd-node1 node]# find data.etcd/
data.etcd/
data.etcd/member
data.etcd/member/snap
data.etcd/member/snap/db
data.etcd/member/wal
data.etcd/member/wal/0000000000000000-0000000000000000.wal
data.etcd/member/wal/0.tmp
[root@etcd-node1 node]#
As shown, the cluster state files do exist on this node.
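The decision described above can be written down as a small shell function. This is only a sketch of the reasoning, not etcd's actual startup code: the helper name etcd_boot_mode is made up, and treating "any *.wal file under member/wal" as the sign of existing state is an assumption of this note.

```shell
#!/usr/bin/env bash
# Hypothetical helper: predict how a member would start, given its data
# directory and the configured initial-cluster-state value.
# Assumption: existing state is detected by a *.wal file under member/wal.
etcd_boot_mode() {
    local data_dir=$1 state=$2 f
    for f in "$data_dir"/member/wal/*.wal; do
        if [ -e "$f" ]; then
            echo "restart-from-disk"    # initial-cluster-state is not consulted
            return 0
        fi
    done
    if [ "$state" = "new" ]; then
        echo "bootstrap-new-cluster"
    else
        echo "join-existing-cluster"
    fi
}

etcd_boot_mode /srv/etcd/node/data.etcd new
```

On a node whose data.etcd looks like the find output above, the call prints restart-from-disk even though the configured state is new, which matches what was observed in this section.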
2.3.2 Deleting the data directory data.etcd
First stop the etcd service on the three nodes:
[root@etcd-node1 node]# ./stop.sh
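Run the stop.sh script on all three nodes.
To check whether initial-cluster-state=new regenerates the cluster ID and member IDs when the data directory is empty, delete the files under the data directory on all three nodes (before deleting anything, make sure you have a backup, as in section 2.1):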
[root@etcd-node1 node]# ./stop.sh
[root@etcd-node1 node]# rm -rf data.etcd/*
[root@etcd-node1 node]# ll data.etcd
total 0
[root@etcd-node1 node]#
Start the etcd service on the three nodes again:
[root@etcd-node1 node]# ./start_by_config.sh
[root@etcd-node1 node]# nohup: appending output to ‘nohup.out’
[root@etcd-node1 node]#
Then check the cluster information:
[root@etcd-node1 node]# ech
{"level":"warn","ts":"2025-06-03T23:28:00.088311+0800","logger":"client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001181e0/192.168.56.123:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-03T23:28:00.089092+0800","logger":"client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-03T23:28:00.089544+0800","logger":"client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001183c0/192.168.56.122:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
+-----------------------------+--------+------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+-----------------------------+--------+------------+-------+
| https://192.168.56.123:2379 | true | 907.872µs | |
| https://192.168.56.121:2379 | true | 680.706µs | |
| https://192.168.56.122:2379 | true | 2.587289ms | |
+-----------------------------+--------+------------+-------+
[root@etcd-node1 node]# ecm
{"level":"warn","ts":"2025-06-03T23:28:01.578390+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| a7d7b09bf04ad21b | started | node3 | https://192.168.56.123:2380 | https://192.168.56.123:2379 | false |
| d553b4da699c7263 | started | node2 | https://192.168.56.122:2380 | https://192.168.56.122:2379 | false |
| e14cb1abc9daea5b | started | node1 | https://192.168.56.121:2380 | https://192.168.56.121:2379 | false |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
[root@etcd-node1 node]# ecs
{"level":"warn","ts":"2025-06-03T23:28:03.338715+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-03T23:28:03.342514+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-03T23:28:03.344923+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-03T23:28:03.350804+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.56.121:2379 | e14cb1abc9daea5b | 3.5.18 | 20 kB | false | false | 2 | 14 | 14 | |
| https://192.168.56.122:2379 | d553b4da699c7263 | 3.5.18 | 20 kB | false | false | 2 | 14 | 14 | |
| https://192.168.56.123:2379 | a7d7b09bf04ad21b | 3.5.18 | 20 kB | true | false | 2 | 14 | 14 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@etcd-node1 node]# rootetcdctl --write-out=fields member list
{"level":"warn","ts":"2025-06-03T23:28:05.875196+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00011a1e0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
"ClusterID" : 11928626832149063955
"MemberID" : 16234546108147886683
"Revision" : 0
"RaftTerm" : 2
"ID" : 12094329508124611099
"Name" : "node3"
"PeerURL" : "https://192.168.56.123:2380"
"ClientURL" : "https://192.168.56.123:2379"
"IsLearner" : false
"ID" : 15371828803313365603
"Name" : "node2"
"PeerURL" : "https://192.168.56.122:2380"
"ClientURL" : "https://192.168.56.122:2379"
"IsLearner" : false
"ID" : 16234546108147886683
"Name" : "node1"
"PeerURL" : "https://192.168.56.121:2380"
"ClientURL" : "https://192.168.56.121:2379"
"IsLearner" : false
[root@etcd-node1 node]#
The authentication is not enabled warnings can be ignored here. What matters is the cluster ID and member IDs at the end:
- Cluster ID: 11928626832149063955
- Node 1 ID: 16234546108147886683
- Node 2 ID: 15371828803313365603
- Node 3 ID: 12094329508124611099
The cluster ID and member IDs are still the same as before.
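This result shows a key but easily overlooked etcd behavior: even after the contents of data.etcd are deleted on every node, the cluster ID and member IDs can stay the same. The root cause is described below, and the next section changes the token to confirm it.
Root cause: the IDs are derived from the bootstrap parameters
When a cluster is bootstrapped from scratch, its identity does not come from disk; it is computed from the startup parameters, in particular:
- --initial-cluster-token: the key input to the cluster ID. If it is left unchanged, a cluster re-bootstrapped with the same members ends up with the same IDs.
- --initial-cluster: each member ID is a hash computed from that member's entry. In etcd 3.5 the hash input appears to be the member's peer URL together with the cluster token rather than the --name value, which is consistent with section 2.3.3, where changing only the token changes every member ID.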
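To make the idea of a deterministic, parameter-derived ID concrete, here is a rough shell sketch. It rests on an assumption that is not confirmed in this note: that a member ID is the first 8 bytes (16 hex digits) of the SHA-1 of the member's peer URL followed by the cluster token. The helper name member_id is made up, and sha1sum is assumed to be available (it is part of coreutils on CentOS 7).

```shell
#!/usr/bin/env bash
# Sketch under an assumption: member ID = first 16 hex digits of
# SHA-1(peer URL + cluster token). A member with several peer URLs
# is not handled here.
member_id() {
    local peer_url=$1 token=$2
    printf '%s%s' "$peer_url" "$token" | sha1sum | cut -c1-16
}

member_id "https://192.168.56.121:2380" "token-01"
member_id "https://192.168.56.121:2380" "token-test"
```

Whatever the exact formula is, the point of the sketch is that the same inputs always give the same ID, so wiping the data directory alone cannot change it. If the assumption holds for this etcd version, the first call should reproduce node 1's ID from the ecm table; if it does not, the real derivation differs from this sketch.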
2.3.3 Changing the initial-cluster-token value
Stop the service on each node:
[root@etcd-node1 node]# ./stop.sh
[root@etcd-node1 node]#
Check the current token setting:
[root@etcd-node1 node]# grep token config/etcd.yaml
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'token-01'
[root@etcd-node1 node]#
Change the token:
[root@etcd-node1 node]# sed -i "s/initial-cluster-token: 'token-01'/initial-cluster-token: 'token-test'/g" /srv/etcd/node/config/etcd.yaml
[root@etcd-node1 node]# grep token config/etcd.yaml
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'token-test'
[root@etcd-node1 node]#
[root@etcd-node1 node]# rm -rf data.etcd
Note that the data directory must be deleted here as well.
Now start the service on the three nodes again:
[root@etcd-node1 node]# ./start_by_config.sh
[root@etcd-node1 node]# nohup: appending output to ‘nohup.out’
[root@etcd-node1 node]#
Now check the cluster ID and member IDs:
[root@etcd-node1 ~]# ech
{"level":"warn","ts":"2025-06-04T00:04:53.663621+0800","logger":"client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.123:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-04T00:04:53.664028+0800","logger":"client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000456000/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-04T00:04:53.665243+0800","logger":"client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000345a0/192.168.56.122:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
+-----------------------------+--------+------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+-----------------------------+--------+------------+-------+
| https://192.168.56.123:2379 | true | 1.219491ms | |
| https://192.168.56.121:2379 | true | 1.082714ms | |
| https://192.168.56.122:2379 | true | 2.658924ms | |
+-----------------------------+--------+------------+-------+
[root@etcd-node1 ~]# ecm
{"level":"warn","ts":"2025-06-04T00:04:55.186876+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000341e0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| 6adc56df00ffcfe0 | started | node3 | https://192.168.56.123:2380 | https://192.168.56.123:2379 | false |
| ceba196a99b5f14e | started | node2 | https://192.168.56.122:2380 | https://192.168.56.122:2379 | false |
| f737d9215ef36e4c | started | node1 | https://192.168.56.121:2380 | https://192.168.56.121:2379 | false |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
[root@etcd-node1 ~]# ecs
{"level":"warn","ts":"2025-06-04T00:04:57.043711+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003be000/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-04T00:04:57.046273+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003be000/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-04T00:04:57.049262+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003be000/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-04T00:04:57.054286+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003be000/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.56.121:2379 | f737d9215ef36e4c | 3.5.18 | 20 kB | false | false | 2 | 14 | 14 | |
| https://192.168.56.122:2379 | ceba196a99b5f14e | 3.5.18 | 20 kB | false | false | 2 | 14 | 14 | |
| https://192.168.56.123:2379 | 6adc56df00ffcfe0 | 3.5.18 | 20 kB | true | false | 2 | 14 | 14 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@etcd-node1 ~]# rootetcdctl --write-out=fields member list
{"level":"warn","ts":"2025-06-04T00:05:01.131528+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
"ClusterID" : 16484408410993616614
"MemberID" : 14896246663117402446
"Revision" : 0
"RaftTerm" : 2
"ID" : 7700124978691166176
"Name" : "node3"
"PeerURL" : "https://192.168.56.123:2380"
"ClientURL" : "https://192.168.56.123:2379"
"IsLearner" : false
"ID" : 14896246663117402446
"Name" : "node2"
"PeerURL" : "https://192.168.56.122:2380"
"ClientURL" : "https://192.168.56.122:2379"
"IsLearner" : false
"ID" : 17813945588437446220
"Name" : "node1"
"PeerURL" : "https://192.168.56.121:2380"
"ClientURL" : "https://192.168.56.121:2379"
"IsLearner" : false
[root@etcd-node1 ~]#
As shown, the cluster ID and member IDs have now changed.
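Spotting the change by reading two long decimal numbers is error-prone, so the comparison could be scripted. The sketch below is only an illustration: the helper name cluster_id_from_fields is made up, and it assumes the --write-out=fields layout shown in the transcripts above ("ClusterID" : value). The two sample inputs are copied from the outputs in sections 2.3.2 and 2.3.3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `member list --write-out=fields` output on stdin
# and print the decimal ClusterID.
cluster_id_from_fields() {
    awk -F' : ' '$1 == "\"ClusterID\"" { print $2; exit }'
}

# Sample lines copied from the transcripts before and after the token change.
before=$(printf '%s\n' '"ClusterID" : 11928626832149063955' \
    '"MemberID" : 16234546108147886683' | cluster_id_from_fields)
after=$(printf '%s\n' '"ClusterID" : 16484408410993616614' \
    '"MemberID" : 14896246663117402446' | cluster_id_from_fields)

if [ "$before" = "$after" ]; then
    echo "cluster ID unchanged: $before"
else
    echo "cluster ID changed: $before -> $after"
fi
```

On a node, the function's input would instead be piped from the member list command used throughout this section.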
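2.4 Restoring the configuration
After the tests above, stop the etcd service, delete the /srv/etcd/node directory used for testing, copy the backup directory /srv/etcd/node.bak to /srv/etcd/node, and then start the etcd service.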
[root@etcd-node1 ~]# cd /srv/etcd/node
[root@etcd-node1 node]# ./stop.sh
[root@etcd-node1 node]# ps -ef|grep -v grep|grep etcd
[root@etcd-node1 node]# cd ..
[root@etcd-node1 etcd]# rm -rf node
[root@etcd-node1 etcd]# ls -d node
ls: cannot access node: No such file or directory
[root@etcd-node1 etcd]# cp -rp node.bak node
[root@etcd-node1 etcd]# ls -la node
total 26108
drwxr-xr-x 5 root root 191 May 7 22:25 .
drwxr-xr-x 7 root root 73 Jun 4 22:07 ..
drwxr-xr-x 2 root root 23 Jun 2 21:40 config
drwx------ 3 root root 20 Jun 2 22:05 data.etcd
drwxr-xr-x 2 root root 39 May 7 22:26 logs
-rw------- 1 root root 26713408 May 2 11:57 nohup.out
-rw-r--r-- 1 root root 0 Apr 5 22:58 openssl.conf
-rwxr--r-- 1 root root 1007 Apr 5 23:07 start_auto_ssl.sh
-rwxr--r-- 1 root root 105 May 7 22:33 start_by_config.sh
-rwxr--r-- 1 root root 954 Mar 2 22:52 start_no_ssl.sh
-rwxr--r-- 1 root root 1548 Apr 5 23:59 start.sh
-rwxr--r-- 1 root root 61 Apr 5 23:20 stop.sh
[root@etcd-node1 etcd]# cd node
[root@etcd-node1 node]# ./start_by_config.sh
[root@etcd-node1 node]# nohup: appending output to ‘nohup.out’
Now run the etcd commands again and check the cluster ID, member IDs, and related information:
[root@etcd-node1 node]# ech
+-----------------------------+--------+------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+-----------------------------+--------+------------+-------+
| https://192.168.56.121:2379 | true | 1.562419ms | |
| https://192.168.56.122:2379 | true | 1.022737ms | |
| https://192.168.56.123:2379 | true | 2.509065ms | |
+-----------------------------+--------+------------+-------+
[root@etcd-node1 node]# ecm
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| a7d7b09bf04ad21b | started | node3 | https://192.168.56.123:2380 | https://192.168.56.123:2379 | false |
| d553b4da699c7263 | started | node2 | https://192.168.56.122:2380 | https://192.168.56.122:2379 | false |
| e14cb1abc9daea5b | started | node1 | https://192.168.56.121:2380 | https://192.168.56.121:2379 | false |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
[root@etcd-node1 node]# ecs
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.56.121:2379 | e14cb1abc9daea5b | 3.5.18 | 25 kB | false | false | 30 | 1212 | 1212 | |
| https://192.168.56.122:2379 | d553b4da699c7263 | 3.5.18 | 25 kB | true | false | 30 | 1213 | 1213 | |
| https://192.168.56.123:2379 | a7d7b09bf04ad21b | 3.5.18 | 25 kB | false | false | 30 | 1214 | 1214 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@etcd-node1 node]# rootetcdctl --write-out=fields member list
"ClusterID" : 11928626832149063955
"MemberID" : 16234546108147886683
"Revision" : 0
"RaftTerm" : 30
"ID" : 12094329508124611099
"Name" : "node3"
"PeerURL" : "https://192.168.56.123:2380"
"ClientURL" : "https://192.168.56.123:2379"
"IsLearner" : false
"ID" : 15371828803313365603
"Name" : "node2"
"PeerURL" : "https://192.168.56.122:2380"
"ClientURL" : "https://192.168.56.122:2379"
"IsLearner" : false
"ID" : 16234546108147886683
"Name" : "node1"
"PeerURL" : "https://192.168.56.121:2380"
"ClientURL" : "https://192.168.56.121:2379"
"IsLearner" : false
[root@etcd-node1 node]# date
Wed Jun 4 22:08:38 CST 2025
[root@etcd-node1 node]#
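etcd is back to normal, and the cluster ID and member IDs match their values from before the tests. Logging in with etcd-workbench also shows the cluster information as normal.
3. etcd startup sequence
Based on the experiments in Section 2, I drew a flowchart showing under which conditions the cluster ID and member IDs stay the same and under which conditions new ones are generated.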