
etcd configuration: initial-cluster-state

1. Overview

This is one of a series of summary notes.

1.1 VirtualBox VM inventory

While learning etcd, I use the following virtual machines:

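| No. | VM name | Hostname | IP | CPU | Memory | Role |
|---|---|---|---|---|---|---|
| 1 | ansible-master | ansible | 192.168.56.120 | 2 cores | 4 GB | Ansible control node |
| 2 | ansible-node1 | etcd-node1 | 192.168.56.121 | 2 cores | 2 GB | Ansible worker node 1 |
| 3 | ansible-node2 | etcd-node2 | 192.168.56.122 | 2 cores | 2 GB | Ansible worker node 2 |
| 4 | ansible-node3 | etcd-node3 | 192.168.56.123 | 2 cores | 2 GB | Ansible worker node 3 |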

Later I will write an Ansible playbook that deploys the etcd cluster.

Operating system:

sh
[root@etcd-node1 ~]# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
[root@etcd-node1 ~]# hostname -I
192.168.56.121 10.0.3.15
[root@etcd-node1 ~]#

1.2 About the setting

In section 7, the etcd configuration file, the configuration sets initial-cluster-state to new, which means bootstrapping a new cluster:

yaml
# Initial cluster state ('new' or 'existing').
# Initial cluster state: new bootstraps a new cluster, existing joins an existing cluster
initial-cluster-state: 'new'

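When the etcd service is restarted later, this setting is supposed to be changed to initial-cluster-state: 'existing', meaning the node joins an existing, already-initialized cluster; the cluster ID then stays the same.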

To verify this, I ran the following experiment:

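  • Back up the /srv/etcd/node directory on every node so it can be restored after the test.
  • Start etcd with initial-cluster-state: 'existing', and observe whether the cluster ID and member IDs change, along with the log output.
  • Start etcd with initial-cluster-state: 'new', and observe the same things.
  • After the test, restore from the backup and start etcd again.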

1.3 Recap

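Earlier, following section 7, the etcd configuration file, I put the relevant parameters into the etcd configuration file and started the etcd service from it.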

On all three nodes, start etcd with start_by_config.sh.

sh
[root@etcd-node1 ~]# cd /srv/etcd/node
[root@etcd-node1 node]# ls
config     logs       openssl.conf       start_by_config.sh  start.sh
data.etcd  nohup.out  start_auto_ssl.sh  start_no_ssl.sh     stop.sh
[root@etcd-node1 node]# ./start_by_config.sh 
[root@etcd-node1 node]# nohup: appending output to ‘nohup.out’

[root@etcd-node1 node]#

After startup, check the etcd cluster status:

sh
[root@etcd-node1 ~]# ech
+-----------------------------+--------+------------+-------+
|          ENDPOINT           | HEALTH |    TOOK    | ERROR |
+-----------------------------+--------+------------+-------+
| https://192.168.56.121:2379 |   true | 1.226138ms |       |
| https://192.168.56.123:2379 |   true | 1.153493ms |       |
| https://192.168.56.122:2379 |   true |  753.212µs |       |
+-----------------------------+--------+------------+-------+
[root@etcd-node1 ~]# ecm
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
|        ID        | STATUS  | NAME  |         PEER ADDRS          |        CLIENT ADDRS         | IS LEARNER |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| a7d7b09bf04ad21b | started | node3 | https://192.168.56.123:2380 | https://192.168.56.123:2379 |      false |
| d553b4da699c7263 | started | node2 | https://192.168.56.122:2380 | https://192.168.56.122:2379 |      false |
| e14cb1abc9daea5b | started | node1 | https://192.168.56.121:2380 | https://192.168.56.121:2379 |      false |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
[root@etcd-node1 ~]# ecs
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.56.121:2379 | e14cb1abc9daea5b |  3.5.18 |   25 kB |     false |      false |        23 |        974 |                974 |        |
| https://192.168.56.122:2379 | d553b4da699c7263 |  3.5.18 |   25 kB |      true |      false |        23 |        975 |                975 |        |
| https://192.168.56.123:2379 | a7d7b09bf04ad21b |  3.5.18 |   25 kB |     false |      false |        23 |        976 |                976 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@etcd-node1 ~]#

This gives the ID of each of the three nodes:

  • node1: e14cb1abc9daea5b
  • node2: d553b4da699c7263
  • node3: a7d7b09bf04ad21b
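ech, ecm, ecs and rootetcdctl, used throughout this article, are shell shortcuts whose definitions are not shown here. Judging from their output, they wrap etcdctl endpoint health, member list and endpoint status in table format. The sketch below is only a guess at what they could look like: it assumes etcdctl v3.5 over TLS with client certificates and the root user, and the certificate directory, file names and the ETCD_ROOT_PASSWORD variable are hypothetical.

```sh
# Hypothetical definitions of the shortcuts used in this article.
ETCD_ENDPOINTS="https://192.168.56.121:2379,https://192.168.56.122:2379,https://192.168.56.123:2379"
ETCD_CERT_DIR="/srv/etcd/node/certs"   # hypothetical path and file names

# etcdctl with TLS client certificates and the root user.
rootetcdctl() {
  etcdctl --endpoints="$ETCD_ENDPOINTS" \
    --cacert="$ETCD_CERT_DIR/ca.pem" \
    --cert="$ETCD_CERT_DIR/client.pem" \
    --key="$ETCD_CERT_DIR/client-key.pem" \
    --user="root:${ETCD_ROOT_PASSWORD:?set ETCD_ROOT_PASSWORD first}" \
    "$@"
}

ech() { rootetcdctl --write-out=table endpoint health; }   # health of each endpoint
ecm() { rootetcdctl --write-out=table member list; }       # member IDs (hex), names, URLs
ecs() { rootetcdctl --write-out=table endpoint status; }   # leader, Raft term/index per endpoint
```

Shell functions are used instead of aliases so the sketch also works inside scripts, where bash does not expand aliases by default.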

Now the tests begin.

2. Testing the setting

2.1 Backing up the files

On each of the three nodes, change to the /srv/etcd directory and back up the directory with cp -rp node node.bak:

sh
[root@etcd-node1 ~]# cd /srv/etcd
[root@etcd-node1 etcd]# cp -rp node node.bak
[root@etcd-node1 etcd]# ls -lah node node.bak
node:
total 26M
drwxr-xr-x 5 root root  191 May  7 22:25 .
drwxr-xr-x 7 root root   73 Jun  3 22:36 ..
drwxr-xr-x 2 root root   23 Jun  2 21:40 config
drwx------ 3 root root   20 Jun  2 22:05 data.etcd
drwxr-xr-x 2 root root   39 May  7 22:26 logs
-rw------- 1 root root  26M May  2 11:57 nohup.out
-rw-r--r-- 1 root root    0 Apr  5 22:58 openssl.conf
-rwxr--r-- 1 root root 1007 Apr  5 23:07 start_auto_ssl.sh
-rwxr--r-- 1 root root  105 May  7 22:33 start_by_config.sh
-rwxr--r-- 1 root root  954 Mar  2 22:52 start_no_ssl.sh
-rwxr--r-- 1 root root 1.6K Apr  5 23:59 start.sh
-rwxr--r-- 1 root root   61 Apr  5 23:20 stop.sh

node.bak:
total 26M
drwxr-xr-x 5 root root  191 May  7 22:25 .
drwxr-xr-x 7 root root   73 Jun  3 22:36 ..
drwxr-xr-x 2 root root   23 Jun  2 21:40 config
drwx------ 3 root root   20 Jun  2 22:05 data.etcd
drwxr-xr-x 2 root root   39 May  7 22:26 logs
-rw------- 1 root root  26M May  2 11:57 nohup.out
-rw-r--r-- 1 root root    0 Apr  5 22:58 openssl.conf
-rwxr--r-- 1 root root 1007 Apr  5 23:07 start_auto_ssl.sh
-rwxr--r-- 1 root root  105 May  7 22:33 start_by_config.sh
-rwxr--r-- 1 root root  954 Mar  2 22:52 start_no_ssl.sh
-rwxr--r-- 1 root root 1.6K Apr  5 23:59 start.sh
-rwxr--r-- 1 root root   61 Apr  5 23:20 stop.sh
[root@etcd-node1 etcd]#

Snipaste_2025-06-03_22-37-46.png
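One pitfall with cp -rp node node.bak: if node.bak already exists from an earlier run, cp copies node into node.bak/node instead of refreshing the backup. A minimal sketch of a guarded backup, assuming the same /srv/etcd layout and that etcd has been stopped first so data.etcd is not copied mid-write (backup_node is a hypothetical name):

```sh
# Back up BASE/node to BASE/node.bak, refusing to touch an existing backup.
backup_node() {
  local base=${1:-/srv/etcd}
  if [ ! -d "$base/node" ]; then
    echo "backup_node: $base/node does not exist" >&2
    return 1
  fi
  if [ -e "$base/node.bak" ]; then
    # Without this check, cp would create $base/node.bak/node
    echo "backup_node: $base/node.bak already exists, not overwriting" >&2
    return 1
  fi
  # -p keeps owner, mode and timestamps (data.etcd is drwx------ above)
  cp -rp "$base/node" "$base/node.bak"
}
```

Usage on each node: `backup_node /srv/etcd`.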

2.2 Starting with initial-cluster-state: 'existing'

Check the current initial-cluster-state setting on the three nodes.

sh
# Check the setting on node 1
[root@etcd-node1 ~]# grep -B2 initial-cluster-state /srv/etcd/node/config/etcd.yaml
# Initial cluster state ('new' or 'existing').
# Initial cluster state: new bootstraps a new cluster, existing joins an existing cluster
initial-cluster-state: 'existing'
[root@etcd-node1 ~]# 


# Check the setting on node 2
[root@etcd-node2 ~]# grep -B2 initial-cluster-state /srv/etcd/node/config/etcd.yaml
# Initial cluster state ('new' or 'existing').
# Initial cluster state: new bootstraps a new cluster, existing joins an existing cluster
initial-cluster-state: 'existing'
[root@etcd-node2 ~]# 


# Check the setting on node 3
[root@etcd-node3 ~]# grep -B2 initial-cluster-state /srv/etcd/node/config/etcd.yaml
# Initial cluster state ('new' or 'existing').
# Initial cluster state: new bootstraps a new cluster, existing joins an existing cluster
initial-cluster-state: 'existing'
[root@etcd-node3 ~]#

All three nodes are currently configured with initial-cluster-state: 'existing', i.e. join an existing cluster.

Now start the service on all three nodes:

sh
[root@etcd-node1 ~]# cd /srv/etcd/node && ./start_by_config.sh
[root@etcd-node1 node]# nohup: appending output to ‘nohup.out’

[root@etcd-node1 node]#

Snipaste_2025-06-03_22-43-18.png

All three nodes are up.

sh
[root@etcd-node1 node]# ech
+-----------------------------+--------+------------+-------+
|          ENDPOINT           | HEALTH |    TOOK    | ERROR |
+-----------------------------+--------+------------+-------+
| https://192.168.56.123:2379 |   true | 1.322713ms |       |
| https://192.168.56.122:2379 |   true | 2.835666ms |       |
| https://192.168.56.121:2379 |   true | 1.587397ms |       |
+-----------------------------+--------+------------+-------+
[root@etcd-node1 node]# ecm
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
|        ID        | STATUS  | NAME  |         PEER ADDRS          |        CLIENT ADDRS         | IS LEARNER |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| a7d7b09bf04ad21b | started | node3 | https://192.168.56.123:2380 | https://192.168.56.123:2379 |      false |
| d553b4da699c7263 | started | node2 | https://192.168.56.122:2380 | https://192.168.56.122:2379 |      false |
| e14cb1abc9daea5b | started | node1 | https://192.168.56.121:2380 | https://192.168.56.121:2379 |      false |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
[root@etcd-node1 node]# ecs
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.56.121:2379 | e14cb1abc9daea5b |  3.5.18 |   25 kB |     false |      false |        30 |       1179 |               1179 |        |
| https://192.168.56.122:2379 | d553b4da699c7263 |  3.5.18 |   25 kB |     false |      false |        30 |       1180 |               1180 |        |
| https://192.168.56.123:2379 | a7d7b09bf04ad21b |  3.5.18 |   25 kB |      true |      false |        30 |       1181 |               1181 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@etcd-node1 node]# date
Tue Jun  3 22:44:15 CST 2025

# Show the cluster ID and member IDs in decimal
[root@etcd-node1 node]# rootetcdctl --write-out=fields member list
"ClusterID" : 11928626832149063955
"MemberID" : 15371828803313365603
"Revision" : 0
"RaftTerm" : 30
"ID" : 12094329508124611099
"Name" : "node3"
"PeerURL" : "https://192.168.56.123:2380"
"ClientURL" : "https://192.168.56.123:2379"
"IsLearner" : false

"ID" : 15371828803313365603
"Name" : "node2"
"PeerURL" : "https://192.168.56.122:2380"
"ClientURL" : "https://192.168.56.122:2379"
"IsLearner" : false

"ID" : 16234546108147886683
"Name" : "node1"
"PeerURL" : "https://192.168.56.121:2380"
"ClientURL" : "https://192.168.56.121:2379"
"IsLearner" : false

[root@etcd-node1 node]#

Both ecm and ecs show the three nodes' hexadecimal IDs:

  • node1: e14cb1abc9daea5b
  • node2: d553b4da699c7263
  • node3: a7d7b09bf04ad21b

These match the member IDs from the earlier startup, so the member IDs did not change.

Also compare with the screenshot from May 31, which shows the earlier decimal cluster ID and member IDs:

  • cluster ID: 11928626832149063955
  • node1: 16234546108147886683
  • node2: 15371828803313365603
  • node3: 12094329508124611099

Snipaste_2025-05-31_22-34-39.png

The cluster ID and member IDs are unchanged; they still have their previous values.

This is the state I expected: with initial-cluster-state: 'existing', the etcd cluster ID and member IDs do not change.
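ecm and ecs print IDs in hexadecimal, while --write-out=fields prints them in decimal, so comparing the two by eye is error-prone. A minimal sketch of the conversion in the shell (the function names are hypothetical); %u and %x are used because several of these IDs exceed 2^63 and would overflow a signed %d:

```sh
# Convert etcd IDs between hex (ecm/ecs output) and decimal (--write-out=fields).
# printf treats the value as unsigned 64-bit for %u and %x.
hex_to_dec() { printf '%u\n' "0x$1"; }
dec_to_hex() { printf '%016x\n' "$1"; }

hex_to_dec e14cb1abc9daea5b        # prints 16234546108147886683 (node1)
dec_to_hex 15371828803313365603    # prints d553b4da699c7263 (node2)
```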

Now stop etcd on all three nodes with the ./stop.sh script.

sh
[root@etcd-node1 node]# ./stop.sh

2.3 Starting with initial-cluster-state: 'new'

Check the current initial-cluster-state setting on the three nodes.

sh
[root@etcd-node1 ~]# grep -B2 initial-cluster-state /srv/etcd/node/config/etcd.yaml
# Initial cluster state ('new' or 'existing').
# Initial cluster state: new bootstraps a new cluster, existing joins an existing cluster
initial-cluster-state: 'existing'
[root@etcd-node1 ~]#

2.3.1 Changing only initial-cluster-state to new

Change the setting on all three nodes:

sh
sed -i "s/initial-cluster-state: 'existing'/initial-cluster-state: 'new'/g" /srv/etcd/node/config/etcd.yaml

After running the command, check the setting again:

sh
[root@etcd-node1 node]# grep -B2 initial-cluster-state /srv/etcd/node/config/etcd.yaml
# Initial cluster state ('new' or 'existing').
# Initial cluster state: new bootstraps a new cluster, existing joins an existing cluster
initial-cluster-state: 'new'
[root@etcd-node1 node]#

Snipaste_2025-06-03_23-02-36.png

The setting has changed.
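The sed command above only matches the exact text initial-cluster-state: 'existing'; if the value were written with double quotes or without quotes, it would silently change nothing. A slightly more defensive sketch, assuming the flat key: 'value' layout of this etcd.yaml and GNU sed (the helper names are hypothetical), reads or replaces whatever value a top-level key currently has and fails if the key is missing:

```sh
# Print the value of a top-level key in an etcd YAML config, quotes stripped.
get_etcd_option() {
  sed -n "s/^$2:[[:space:]]*['\"]\{0,1\}\([^'\"]*\)['\"]\{0,1\}[[:space:]]*\$/\1/p" "$1"
}

# Set a top-level key to a single-quoted value in place (GNU sed -i).
# The value must not contain '|', '&' or backslashes.
set_etcd_option() {
  local file=$1 key=$2 value=$3
  if ! grep -q "^$key:" "$file"; then
    echo "set_etcd_option: $key not found in $file" >&2
    return 1
  fi
  sed -i "s|^$key:.*|$key: '$value'|" "$file"
}

# Example:
#   set_etcd_option /srv/etcd/node/config/etcd.yaml initial-cluster-state new
#   get_etcd_option /srv/etcd/node/config/etcd.yaml initial-cluster-state
```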

Now start etcd on all three nodes:

sh
[root@etcd-node1 node]# cd /srv/etcd/node && ./start_by_config.sh
[root@etcd-node1 node]# nohup: appending output to ‘nohup.out’

[root@etcd-node1 node]#

Now check the cluster ID and member IDs:

sh
[root@etcd-node1 node]# grep -B2 initial-cluster-state /srv/etcd/node/config/etcd.yaml
# Initial cluster state ('new' or 'existing').
# Initial cluster state: new bootstraps a new cluster, existing joins an existing cluster
initial-cluster-state: 'new'
[root@etcd-node1 node]# rootetcdctl --write-out=fields member list
"ClusterID" : 11928626832149063955
"MemberID" : 16234546108147886683
"Revision" : 0
"RaftTerm" : 31
"ID" : 12094329508124611099
"Name" : "node3"
"PeerURL" : "https://192.168.56.123:2380"
"ClientURL" : "https://192.168.56.123:2379"
"IsLearner" : false

"ID" : 15371828803313365603
"Name" : "node2"
"PeerURL" : "https://192.168.56.122:2380"
"ClientURL" : "https://192.168.56.122:2379"
"IsLearner" : false

"ID" : 16234546108147886683
"Name" : "node1"
"PeerURL" : "https://192.168.56.121:2380"
"ClientURL" : "https://192.168.56.121:2379"
"IsLearner" : false

[root@etcd-node1 node]#

These are the same decimal cluster ID and member IDs as in the previous section:

  • cluster ID: 11928626832149063955
  • node1: 16234546108147886683
  • node2: 15371828803313365603
  • node3: 12094329508124611099

Why didn't anything change?

Root cause: the data directory takes precedence over startup parameters

etcd follows one core rule at startup:

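If the data directory (--data-dir) already exists and contains valid cluster state, initial-cluster-state is ignored and etcd restarts from the local data. As far as I can tell from the etcd 3.5 source, the check that decides this is whether member/wal contains at least one *.wal file; member/snap/db holds the key-value store itself.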

How startup proceeds:

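  1. Check the data directory. At startup, etcd first inspects the --data-dir directory:
    • If the directory does not exist or is empty → etcd goes through the bootstrap flow and honors initial-cluster-state (with new, it creates a new cluster).
    • If the directory exists and holds valid data (a WAL under member/wal, plus member/snap/db) → etcd skips bootstrapping and loads the persisted data.
  2. Scope of initial-cluster-state. The parameter only takes effect when a member is bootstrapped. When existing data is found, etcd:
    • Ignores initial-cluster-state and the other --initial-cluster* settings, whatever their values.
    • Loads the cluster ID, member IDs, Raft log, and snapshots from disk.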

In other words, when the data directory already holds these files, etcd ignores initial-cluster-state.

sh
[root@etcd-node1 node]# find data.etcd/
data.etcd/
data.etcd/member
data.etcd/member/snap
data.etcd/member/snap/db
data.etcd/member/wal
data.etcd/member/wal/0000000000000000-0000000000000000.wal
data.etcd/member/wal/0.tmp
[root@etcd-node1 node]#

The cluster state files are indeed present, including a WAL file under member/wal.
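The decision can be expressed as a small shell function. This is a sketch based on my reading of the etcd 3.5 bootstrap code, not an official interface: it assumes the WAL lives under DATA_DIR/member/wal (a separately configured --wal-dir is ignored) and treats "at least one *.wal file" as "existing data". The function name etcd_startup_mode is hypothetical.

```sh
# Predict what etcd does at startup for a data dir and initial-cluster-state.
etcd_startup_mode() {
  local data_dir=$1 state=$2 f
  case $state in
    new|existing) ;;
    *) echo "etcd_startup_mode: invalid initial-cluster-state: $state" >&2; return 1 ;;
  esac
  # Any WAL file means etcd restarts from disk; the state setting is not consulted.
  for f in "$data_dir"/member/wal/*.wal; do
    if [ -e "$f" ]; then
      echo "restart: load IDs and data from disk, ignore initial-cluster-state=$state"
      return 0
    fi
  done
  if [ "$state" = new ]; then
    echo "bootstrap: compute IDs from initial-cluster and initial-cluster-token"
  else
    echo "join: fetch the member list from the peers in initial-cluster"
  fi
}

# Example: etcd_startup_mode /srv/etcd/node/data.etcd new
```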

2.3.2 Deleting the data directory data.etcd

First stop etcd on all three nodes:

sh
[root@etcd-node1 node]# ./stop.sh

Run stop.sh on all three nodes.

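To verify that, without a data directory, initial-cluster-state=new generates a new cluster ID and new member IDs, delete the files in the data directory on all three nodes (make a backup before deleting, as I did in section 2.1):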

sh
[root@etcd-node1 node]# ./stop.sh
[root@etcd-node1 node]# rm -rf data.etcd/*
[root@etcd-node1 node]# ll data.etcd
total 0
[root@etcd-node1 node]#

Then start etcd on all three nodes again:

sh
[root@etcd-node1 node]# ./start_by_config.sh
[root@etcd-node1 node]# nohup: appending output to ‘nohup.out’

[root@etcd-node1 node]#

Then check the cluster information:

sh

[root@etcd-node1 node]# ech
{"level":"warn","ts":"2025-06-03T23:28:00.088311+0800","logger":"client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001181e0/192.168.56.123:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-03T23:28:00.089092+0800","logger":"client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-03T23:28:00.089544+0800","logger":"client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001183c0/192.168.56.122:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
+-----------------------------+--------+------------+-------+
|          ENDPOINT           | HEALTH |    TOOK    | ERROR |
+-----------------------------+--------+------------+-------+
| https://192.168.56.123:2379 |   true |  907.872µs |       |
| https://192.168.56.121:2379 |   true |  680.706µs |       |
| https://192.168.56.122:2379 |   true | 2.587289ms |       |
+-----------------------------+--------+------------+-------+
[root@etcd-node1 node]# ecm
{"level":"warn","ts":"2025-06-03T23:28:01.578390+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
|        ID        | STATUS  | NAME  |         PEER ADDRS          |        CLIENT ADDRS         | IS LEARNER |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| a7d7b09bf04ad21b | started | node3 | https://192.168.56.123:2380 | https://192.168.56.123:2379 |      false |
| d553b4da699c7263 | started | node2 | https://192.168.56.122:2380 | https://192.168.56.122:2379 |      false |
| e14cb1abc9daea5b | started | node1 | https://192.168.56.121:2380 | https://192.168.56.121:2379 |      false |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
[root@etcd-node1 node]# ecs
{"level":"warn","ts":"2025-06-03T23:28:03.338715+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-03T23:28:03.342514+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-03T23:28:03.344923+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-03T23:28:03.350804+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.56.121:2379 | e14cb1abc9daea5b |  3.5.18 |   20 kB |     false |      false |         2 |         14 |                 14 |        |
| https://192.168.56.122:2379 | d553b4da699c7263 |  3.5.18 |   20 kB |     false |      false |         2 |         14 |                 14 |        |
| https://192.168.56.123:2379 | a7d7b09bf04ad21b |  3.5.18 |   20 kB |      true |      false |         2 |         14 |                 14 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@etcd-node1 node]# rootetcdctl --write-out=fields member list
{"level":"warn","ts":"2025-06-03T23:28:05.875196+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00011a1e0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
"ClusterID" : 11928626832149063955
"MemberID" : 16234546108147886683
"Revision" : 0
"RaftTerm" : 2
"ID" : 12094329508124611099
"Name" : "node3"
"PeerURL" : "https://192.168.56.123:2380"
"ClientURL" : "https://192.168.56.123:2379"
"IsLearner" : false

"ID" : 15371828803313365603
"Name" : "node2"
"PeerURL" : "https://192.168.56.122:2380"
"ClientURL" : "https://192.168.56.122:2379"
"IsLearner" : false

"ID" : 16234546108147886683
"Name" : "node1"
"PeerURL" : "https://192.168.56.121:2380"
"ClientURL" : "https://192.168.56.121:2379"
"IsLearner" : false

[root@etcd-node1 node]#

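The authentication is not enabled warnings can be ignored here. They most likely appear because deleting the data directory also removed the stored authentication settings (users, roles and the auth-enabled flag), while these commands still send credentials; after the restore in section 2.4 the warnings are gone. What matters is the cluster ID and member IDs at the end:

  • cluster ID: 11928626832149063955
  • node1: 16234546108147886683
  • node2: 15371828803313365603
  • node3: 12094329508124611099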

The cluster ID and member IDs are still the same as before!

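This reveals a key but often overlooked behavior of etcd: even after data.etcd is deleted on every node, the cluster ID and member IDs can stay the same. The root cause, and how to get new IDs, are as follows.

Root cause: the IDs are derived from the bootstrap parameters

When a new cluster is bootstrapped, its IDs are not random and do not depend only on what is stored on disk; they are computed from startup parameters, in particular:

  1. --initial-cluster-token: the input that sets this cluster's IDs apart from any other cluster using the same peer URLs. As long as the token in the configuration file (here token-01) stays the same, bootstrapping again produces the same IDs; if the option is not set at all, etcd uses its default token, etcd-cluster.
  2. --initial-cluster: each member ID is a hash of that member's peer URL(s) combined with the cluster token, and the cluster ID is in turn a hash of all member IDs. As far as I can tell from the etcd 3.5 source, the member name (--name) does not enter the hash.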
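To make the derivation concrete, the following sketch re-implements it in shell. It follows my reading of NewMember and RaftCluster.genID in etcd's membership package; treat it as an assumption to verify, not a reference implementation. Member ID: SHA-1 over the member's sorted peer URLs concatenated, followed by the initial-cluster-token, with the first 8 bytes read as a big-endian unsigned 64-bit integer. Cluster ID: SHA-1 over all member IDs, sorted, each written as 8 big-endian bytes, again keeping the first 8 bytes. The function names are hypothetical, and sha1sum from GNU coreutils is assumed.

```sh
# First 8 bytes (16 hex chars) of the SHA-1 of stdin, as unsigned decimal.
sha1_prefix_u64() {
  printf '%u\n' "0x$(sha1sum | cut -c1-16)"
}

# member_id TOKEN PEER_URL...
# Sorted peer URLs are concatenated without a separator, then the token is appended.
member_id() {
  local token=$1
  shift
  printf '%s\n' "$@" | LC_ALL=C sort | tr -d '\n' |
    { cat; printf '%s' "$token"; } | sha1_prefix_u64
}

# cluster_id MEMBER_ID... (decimal)
# Zero-padded hex sorts like the numbers; each ID becomes 8 raw bytes.
cluster_id() {
  local id
  for id in "$@"; do printf '%016x\n' "$id"; done |
    LC_ALL=C sort |
    while read -r hex; do
      for byte in $(printf '%s' "$hex" | sed 's/../& /g'); do
        # emit the byte via an octal escape, e.g. a7 -> \247
        printf "\\$(printf '%03o' "0x$byte")"
      done
    done |
    sha1_prefix_u64
}
```

If the assumption holds, `member_id token-01 https://192.168.56.121:2380` should reproduce node1's decimal ID from the member list, and `cluster_id` over the three decimal member IDs should reproduce the cluster ID 11928626832149063955, which makes the recorded output above a convenient check.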

2.3.3 Changing the initial-cluster-token value

Stop the service on every node:

sh
[root@etcd-node1 node]# ./stop.sh
[root@etcd-node1 node]#

Check the current token setting:

sh
[root@etcd-node1 node]# grep token config/etcd.yaml 
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'token-01'
[root@etcd-node1 node]#

Change the token:

sh
[root@etcd-node1 node]# sed -i "s/initial-cluster-token: 'token-01'/initial-cluster-token: 'token-test'/g" /srv/etcd/node/config/etcd.yaml
[root@etcd-node1 node]# grep token config/etcd.yaml 
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'token-test'
[root@etcd-node1 node]# 
[root@etcd-node1 node]# rm -rf data.etcd

Note: the data directory must be deleted here as well!

Then start the service on all three nodes again:

sh
[root@etcd-node1 node]#  ./start_by_config.sh
[root@etcd-node1 node]# nohup: appending output to ‘nohup.out’

[root@etcd-node1 node]#

Now check the cluster ID and member IDs:

sh
[root@etcd-node1 ~]# ech
{"level":"warn","ts":"2025-06-04T00:04:53.663621+0800","logger":"client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.123:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-04T00:04:53.664028+0800","logger":"client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000456000/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-04T00:04:53.665243+0800","logger":"client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000345a0/192.168.56.122:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
+-----------------------------+--------+------------+-------+
|          ENDPOINT           | HEALTH |    TOOK    | ERROR |
+-----------------------------+--------+------------+-------+
| https://192.168.56.123:2379 |   true | 1.219491ms |       |
| https://192.168.56.121:2379 |   true | 1.082714ms |       |
| https://192.168.56.122:2379 |   true | 2.658924ms |       |
+-----------------------------+--------+------------+-------+
[root@etcd-node1 ~]# ecm
{"level":"warn","ts":"2025-06-04T00:04:55.186876+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000341e0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
|        ID        | STATUS  | NAME  |         PEER ADDRS          |        CLIENT ADDRS         | IS LEARNER |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| 6adc56df00ffcfe0 | started | node3 | https://192.168.56.123:2380 | https://192.168.56.123:2379 |      false |
| ceba196a99b5f14e | started | node2 | https://192.168.56.122:2380 | https://192.168.56.122:2379 |      false |
| f737d9215ef36e4c | started | node1 | https://192.168.56.121:2380 | https://192.168.56.121:2379 |      false |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
[root@etcd-node1 ~]# ecs
{"level":"warn","ts":"2025-06-04T00:04:57.043711+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003be000/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-04T00:04:57.046273+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003be000/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-04T00:04:57.049262+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003be000/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
{"level":"warn","ts":"2025-06-04T00:04:57.054286+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003be000/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.56.121:2379 | f737d9215ef36e4c |  3.5.18 |   20 kB |     false |      false |         2 |         14 |                 14 |        |
| https://192.168.56.122:2379 | ceba196a99b5f14e |  3.5.18 |   20 kB |     false |      false |         2 |         14 |                 14 |        |
| https://192.168.56.123:2379 | 6adc56df00ffcfe0 |  3.5.18 |   20 kB |      true |      false |         2 |         14 |                 14 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@etcd-node1 ~]# rootetcdctl --write-out=fields member list
{"level":"warn","ts":"2025-06-04T00:05:01.131528+0800","logger":"etcd-client","caller":"v3@v3.5.18/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000343c0/192.168.56.121:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: authentication is not enabled"}
"ClusterID" : 16484408410993616614
"MemberID" : 14896246663117402446
"Revision" : 0
"RaftTerm" : 2
"ID" : 7700124978691166176
"Name" : "node3"
"PeerURL" : "https://192.168.56.123:2380"
"ClientURL" : "https://192.168.56.123:2379"
"IsLearner" : false

"ID" : 14896246663117402446
"Name" : "node2"
"PeerURL" : "https://192.168.56.122:2380"
"ClientURL" : "https://192.168.56.122:2379"
"IsLearner" : false

"ID" : 17813945588437446220
"Name" : "node1"
"PeerURL" : "https://192.168.56.121:2380"
"ClientURL" : "https://192.168.56.121:2379"
"IsLearner" : false

[root@etcd-node1 ~]#

This time the cluster ID and member IDs have changed!

Snipaste_2025-06-04_00-05-13.png

2.4 Restoring the original setup

After the tests, stop etcd, delete the /srv/etcd/node directory used for testing, copy the backup /srv/etcd/node.bak back to /srv/etcd/node, and start etcd again.

sh
[root@etcd-node1 ~]# cd /srv/etcd/node
[root@etcd-node1 node]# ./stop.sh
[root@etcd-node1 node]# ps -ef|grep -v grep|grep etcd
[root@etcd-node1 node]# cd ..
[root@etcd-node1 etcd]# rm -rf node
[root@etcd-node1 etcd]# ls -d node
ls: cannot access node: No such file or directory
[root@etcd-node1 etcd]# cp -rp node.bak node
[root@etcd-node1 etcd]# ls -la node
total 26108
drwxr-xr-x 5 root root      191 May  7 22:25 .
drwxr-xr-x 7 root root       73 Jun  4 22:07 ..
drwxr-xr-x 2 root root       23 Jun  2 21:40 config
drwx------ 3 root root       20 Jun  2 22:05 data.etcd
drwxr-xr-x 2 root root       39 May  7 22:26 logs
-rw------- 1 root root 26713408 May  2 11:57 nohup.out
-rw-r--r-- 1 root root        0 Apr  5 22:58 openssl.conf
-rwxr--r-- 1 root root     1007 Apr  5 23:07 start_auto_ssl.sh
-rwxr--r-- 1 root root      105 May  7 22:33 start_by_config.sh
-rwxr--r-- 1 root root      954 Mar  2 22:52 start_no_ssl.sh
-rwxr--r-- 1 root root     1548 Apr  5 23:59 start.sh
-rwxr--r-- 1 root root       61 Apr  5 23:20 stop.sh
[root@etcd-node1 etcd]# cd node
[root@etcd-node1 node]# ./start_by_config.sh
[root@etcd-node1 node]# nohup: appending output to ‘nohup.out’
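The restore steps above can be wrapped into a function that refuses to run while an etcd process is still alive, since deleting data.etcd under a running member is unsafe. A sketch assuming the same layout; restore_node is a hypothetical name, and pgrep (procps) is assumed to be installed.

```sh
# Replace BASE/node with a fresh copy of BASE/node.bak; the backup itself is kept.
restore_node() {
  local base=${1:-/srv/etcd}
  if pgrep -x etcd >/dev/null 2>&1; then
    echo "restore_node: etcd is still running, run ./stop.sh first" >&2
    return 1
  fi
  if [ ! -d "$base/node.bak" ]; then
    echo "restore_node: no backup at $base/node.bak" >&2
    return 1
  fi
  rm -rf "$base/node"
  cp -rp "$base/node.bak" "$base/node"
}
```

Copying instead of moving node.bak keeps the backup available for another round of tests.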

Now run the etcd status commands again and check the cluster ID and member IDs:

sh
[root@etcd-node1 node]# ech
+-----------------------------+--------+------------+-------+
|          ENDPOINT           | HEALTH |    TOOK    | ERROR |
+-----------------------------+--------+------------+-------+
| https://192.168.56.121:2379 |   true | 1.562419ms |       |
| https://192.168.56.122:2379 |   true | 1.022737ms |       |
| https://192.168.56.123:2379 |   true | 2.509065ms |       |
+-----------------------------+--------+------------+-------+
[root@etcd-node1 node]# ecm
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
|        ID        | STATUS  | NAME  |         PEER ADDRS          |        CLIENT ADDRS         | IS LEARNER |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
| a7d7b09bf04ad21b | started | node3 | https://192.168.56.123:2380 | https://192.168.56.123:2379 |      false |
| d553b4da699c7263 | started | node2 | https://192.168.56.122:2380 | https://192.168.56.122:2379 |      false |
| e14cb1abc9daea5b | started | node1 | https://192.168.56.121:2380 | https://192.168.56.121:2379 |      false |
+------------------+---------+-------+-----------------------------+-----------------------------+------------+
[root@etcd-node1 node]# ecs
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.56.121:2379 | e14cb1abc9daea5b |  3.5.18 |   25 kB |     false |      false |        30 |       1212 |               1212 |        |
| https://192.168.56.122:2379 | d553b4da699c7263 |  3.5.18 |   25 kB |      true |      false |        30 |       1213 |               1213 |        |
| https://192.168.56.123:2379 | a7d7b09bf04ad21b |  3.5.18 |   25 kB |     false |      false |        30 |       1214 |               1214 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@etcd-node1 node]# rootetcdctl --write-out=fields member list
"ClusterID" : 11928626832149063955
"MemberID" : 16234546108147886683
"Revision" : 0
"RaftTerm" : 30
"ID" : 12094329508124611099
"Name" : "node3"
"PeerURL" : "https://192.168.56.123:2380"
"ClientURL" : "https://192.168.56.123:2379"
"IsLearner" : false

"ID" : 15371828803313365603
"Name" : "node2"
"PeerURL" : "https://192.168.56.122:2380"
"ClientURL" : "https://192.168.56.122:2379"
"IsLearner" : false

"ID" : 16234546108147886683
"Name" : "node1"
"PeerURL" : "https://192.168.56.121:2380"
"ClientURL" : "https://192.168.56.121:2379"
"IsLearner" : false

[root@etcd-node1 node]# date
Wed Jun  4 22:08:38 CST 2025
[root@etcd-node1 node]#

Snipaste_2025-06-04_22-09-38.png

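etcd is back to normal, and the cluster ID and member IDs match those from before the tests. Logging in with etcd-workbench also shows the cluster information as normal again.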

3. etcd startup sequence

Snipaste_2025-06-04_20-57-53.png

Based on the experiments in section 2, I drew a flowchart showing when the cluster ID and member IDs stay the same and when new ones are generated.

Snipaste_2025-06-04_21-58-32.png
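The flowchart's logic, as established by the experiments in section 2, fits into a few lines of shell. This sketch assumes initial-cluster-state: 'new' whenever there is no data, as in those experiments; the function name and the yes/no argument encoding are hypothetical. The first argument says whether the data directory still holds a WAL, the second whether initial-cluster-token or any peer URL differs from the run that created the current IDs.

```sh
# predict_ids HAS_WAL CONFIG_CHANGED   (each argument: yes or no)
predict_ids() {
  case $1 in
    yes) echo "unchanged: IDs are loaded from the data directory (2.2, 2.3.1)" ;;
    no)
      case $2 in
        no)  echo "same values: IDs are recomputed from the same token and peer URLs (2.3.2)" ;;
        yes) echo "new: a different token or peer URL yields new IDs (2.3.3)" ;;
        *)   echo "predict_ids: second argument must be yes or no" >&2; return 1 ;;
      esac ;;
    *) echo "predict_ids: first argument must be yes or no" >&2; return 1 ;;
  esac
}
```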
