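Background
Whatever the business service, a primary/backup setup usually relies on a keepalived virtual IP (VIP) to fail over.
This chapter deploys keepalived in a container and does not check whether the business service is healthy: the VIP moves to the backup machine only when the primary server goes down or its keepalived container is stopped.
keepalived version: 2.3.1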
Building the container image
I use CentOS 7.9 as the base image and compile keepalived from source. The Dockerfile is as follows:
FROM centos:centos7
WORKDIR /data
COPY . .
# RUN yum makecache -y && yum clean all -y
# Point yum at the mirror repo file shipped in third-package
RUN cp ./third-package/CentOS-Base-amd64.repo /etc/yum.repos.d/CentOS-Base.repo -f && yum makecache -y && yum clean all -y
# Install build dependencies and runtime tools
RUN yum install -y epel-release && yum -y update && yum install -y net-tools vim nginx gcc openssl-devel popt-devel rsyslog gettext nc libnl3 libnl3-devel autoconf libtool
# Build and install automake 1.16.5 from source, then remove its sources
RUN cd third-package && tar -zxvf automake-1.16.5.tar.gz && cd automake-1.16.5 && ./configure \
    && make -j $(nproc) all && make install -j $(nproc) \
    && cd .. && rm automake-1.16.5 automake-1.16.5.tar.gz -rf
# Build keepalived 2.3.1 from source into /usr/local/keepalived
RUN cd keepalived-2.3.1 && ./configure --prefix=/usr/local/keepalived && make -j $(nproc) all && make install -j $(nproc)
# Install the init script and binary, then delete everything under /data except scripts/, conf/ and entrypoint.sh
RUN cp keepalived-2.3.1/keepalived/etc/init.d/keepalived /etc/init.d/ &&\
    cp /usr/local/keepalived/sbin/keepalived /usr/sbin/ && \
    chmod +x /data/scripts/*.sh && \
    find . -mindepth 1 ! -path "./scripts" ! -path "./scripts/*" ! -name "entrypoint.sh" ! -path "./conf" ! -path "./conf/*" -exec rm -rv {} + || true
ENTRYPOINT [ "./entrypoint.sh" ]
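As the Dockerfile shows, you need to download CentOS-Base-amd64.repo (the Aliyun mirror repo file) and automake-1.16.5.tar.gz into third-package beforehand.
The scripts directory contains vip_down.sh and vip_up.sh.
vip_down.sh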
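#!/bin/bash
# Container log stream: stdout of PID 1
LOG_OUT="/proc/1/fd/1"
# $VIP is intentionally unquoted: it holds "ADDRESS dev INTERFACE" and must split into separate arguments
ip addr del $VIP
echo "$(date '+%F %T') $VIP down............" >> "$LOG_OUT"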
vip_up.sh
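#!/bin/bash
# Container log stream: stdout of PID 1
LOG_OUT="/proc/1/fd/1"
# $VIP is intentionally unquoted: it holds "ADDRESS dev INTERFACE" and must split into separate arguments
ip addr add $VIP
echo "$(date '+%F %T') $VIP up............." >> "$LOG_OUT"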
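Note that >> "$LOG_OUT" appends the output to /proc/1/fd/1, which is the standard output of the container's PID 1 process, that is, the stream Docker captures as the container log.
Without this redirection, the echo output does not show up in docker logs -f keepalived.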
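The logging pattern used in both scripts can be wrapped in a small helper. This is only a sketch: the function name log_line and the fallback to the script's own stdout are assumptions added for illustration and are not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper: append a timestamped line to the container's log stream.
# LOG_OUT defaults to PID 1's stdout, as in vip_up.sh and vip_down.sh.
LOG_OUT="${LOG_OUT:-/proc/1/fd/1}"

log_line() {
    local msg
    msg="$(date '+%F %T') $*"
    # Fall back to this script's own stdout when LOG_OUT cannot be written,
    # for example when the script is run outside the container.
    if ! { echo "$msg" >> "$LOG_OUT"; } 2>/dev/null; then
        echo "$msg"
    fi
}
```

With this helper, the last line of vip_up.sh could become log_line "$VIP up", which keeps the timestamp format in one place.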
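entrypoint.sh
#!/bin/bash
start_service(){
    # Render the config from the template, substituting environment variables
    envsubst < ./conf/keepalived.conf.tpl > ./conf/keepalived.conf
    echo "run keepalived...."
    # -n: stay in the foreground, -l: log to console, -D: detailed log, -S 0: syslog facility local0
    exec keepalived -f ./conf/keepalived.conf -n -l -D -S 0
    # Only reached if exec itself fails
    echo "run done...."
}
start_service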
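exec matters here: it replaces the shell with keepalived, so keepalived runs as PID 1 inside the container. When docker-compose down later sends SIGTERM, keepalived receives it, runs the notify_stop script first, and then exits.
Without exec, that is, running keepalived -f ./conf/keepalived.conf -n -l -D -S 0 directly, the shell stays PID 1 and the keepalived process never receives SIGTERM.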
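The effect of exec on the process ID can be observed without keepalived at all. The sketch below is a standalone illustration using plain sh; the variable names are made up for this example.

```shell
#!/bin/bash
# With exec: the shell is replaced, so the second program keeps the same PID.
with_exec=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

# Without exec: the second program is a child process with its own PID.
# The trailing "true" keeps the shell from exec-ing its last command on its own.
without_exec=$(sh -c 'echo $$; sh -c "echo \$\$"; true')

echo "with exec:    $(echo $with_exec)"
echo "without exec: $(echo $without_exec)"
```

The first line prints the same PID twice and the second prints two different PIDs. In the container, the "same PID" case is what lets keepalived inherit PID 1 from entrypoint.sh.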
./conf/keepalived.conf.tpl
! Configuration File for keepalived
global_defs {
    # notification_email {
    #     acassen@firewall.loc
    #     failover@firewall.loc
    #     sysadmin@firewall.loc
    # }
    # notification_email_from Alexandre.Cassen@firewall.loc
    # smtp_server 192.168.200.1
    # smtp_connect_timeout 30
    router_id LVS_DEVEL
    vrrp_skip_check_adv_addr
    # vrrp_strict
}
vrrp_instance VI_1 {
    state $STATE
    interface $INTERFACE
    virtual_router_id $ROUTE_ID
    priority $PRIORITY
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass testuser
    }
    virtual_ipaddress {
        $VIP
    }
    notify_master "/data/scripts/vip_up.sh"
    notify_backup "/data/scripts/vip_down.sh"
    notify_fault "/data/scripts/vip_down.sh"
    notify_stop "/data/scripts/vip_down.sh"
    # Allow packets addressed to the VIPs above to be received
    #accept
}
Here the notify_master, notify_backup, notify_fault, and notify_stop hooks call vip_up.sh and vip_down.sh to explicitly add or remove the virtual IP.
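As an alternative to four separate hooks, the same mapping from state to action could live in one script. The following is a hedged sketch, not the author's setup: it assumes keepalived's generic notify hook passes the type, name, target state, and priority as four arguments (as described in the keepalived.conf man page), and the names vip_action, handle_notify, and DRY_RUN are invented for this example.

```shell
#!/bin/bash
# Hypothetical single entry point for keepalived's generic "notify" hook.
# Assumed arguments: $1 type, $2 name, $3 state, $4 priority.

# Map a VRRP state to the action that vip_up.sh / vip_down.sh perform.
vip_action() {
    case "$1" in
        MASTER)            echo "add" ;;
        BACKUP|FAULT|STOP) echo "del" ;;
        *)                 echo "none" ;;
    esac
}

handle_notify() {
    local state="$3"
    local action
    action="$(vip_action "$state")"
    if [ "$action" = "none" ]; then
        return 0
    fi
    # $VIP is left unquoted on purpose: it holds "ADDRESS dev INTERFACE".
    # DRY_RUN=1 prints the command instead of running it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo ip addr "$action" $VIP
    else
        ip addr "$action" $VIP
    fi
}
```

A real script would end with handle_notify "$@" and be referenced from the vrrp_instance block through the notify keyword; the four explicit hooks used in this article achieve the same result and are easier to read in the logs.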
docker-compose.yaml
x-defaults: &defaults
  restart: unless-stopped
  environment:
    - TZ=Asia/Shanghai
    # Role: MASTER/BACKUP
    - STATE=MASTER
    # Network interface name
    - INTERFACE=eth0
    # Priority: MASTER must be higher than BACKUP, with a difference of less than 20
    - PRIORITY=100
    # Check interval
    - INTERVAL=3
    # route_id
    - ROUTE_ID=52
    # Virtual IP: must be in the same subnet as the real IP
    - VIP=172.16.0.250 dev eth0
services:
  keepalived:
    image: keepalived:1.0.0
    container_name: "keepalived"
    hostname: "keepalived"
    network_mode: "host"
    privileged: true
    cap_add:
      - NET_ADMIN
    stop_signal: SIGTERM
    <<: *defaults
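Note cap_add and stop_signal here: cap_add: NET_ADMIN lets the container change addresses on the host interface, and stop_signal: SIGTERM makes docker-compose down send SIGTERM to the keepalived process.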
Failover test
Primary server IP: 172.16.4.111
Backup server IP: 172.16.4.113
Virtual IP: 172.16.4.96
For this test, the environment section of the primary server's docker-compose.yaml is:
environment:
  - TZ=Asia/Shanghai
  - STATE=MASTER
  # Network interface name
  - INTERFACE=ens160
  - PRIORITY=100
  - INTERVAL=6
  - ROUTE_ID=52
  - VIP=172.16.4.96 dev ens160
The environment section of the backup server's docker-compose.yaml is:
environment:
  - TZ=Asia/Shanghai
  - STATE=BACKUP
  # Network interface name
  - INTERFACE=ens160
  - PRIORITY=90
  - INTERVAL=6
  - ROUTE_ID=52
  - VIP=172.16.4.96 dev ens160
Normal startup
Once the keepalived containers are running on both servers, the primary server holds the virtual IP 172.16.4.96 and the backup server does not.
Primary server interface: ens160
ip addr show ens160
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:83:b9:d7 brd ff:ff:ff:ff:ff:ff
inet 172.16.4.111/22 brd 172.16.7.255 scope global noprefixroute ens160
valid_lft forever preferred_lft forever
inet 172.16.4.96/32 scope global ens160
valid_lft forever preferred_lft forever
inet6 fe80::e194:8abe:9eb7:bb02/64 scope link noprefixroute
valid_lft forever preferred_lft forever
The docker logs -f keepalived output on the primary server:
Tue Oct 14 16:00:22 2025: Assigned address 172.16.4.111 for interface ens160
Tue Oct 14 16:00:22 2025: Assigned address fe80::e194:8abe:9eb7:bb02 for interface ens160
Tue Oct 14 16:00:22 2025: Warning - script chk_web_service is not used
Tue Oct 14 16:00:22 2025: Registering gratuitous ARP shared channel
Tue Oct 14 16:00:22 2025: (VI_1) removing VIPs.
Tue Oct 14 16:00:22 2025: Startup complete
Tue Oct 14 16:00:22 2025: (VI_1) Entering BACKUP STATE (init)
Tue Oct 14 16:00:22 2025: VRRP sockpool: [ifindex( 2), family(IPv4), proto(112), fd(11,12) multicast, address(224.0.0.18)]
2025-10-14 16:00:22 172.16.4.96 dev ens160 down............
Tue Oct 14 16:00:22 2025: (VI_1) received lower priority (90) advert from 172.16.4.113 - discarding
Tue Oct 14 16:00:23 2025: (VI_1) received lower priority (90) advert from 172.16.4.113 - discarding
Tue Oct 14 16:00:24 2025: (VI_1) received lower priority (90) advert from 172.16.4.113 - discarding
Tue Oct 14 16:00:25 2025: (VI_1) received lower priority (90) advert from 172.16.4.113 - discarding
Tue Oct 14 16:00:25 2025: (VI_1) Receive advertisement timeout
Tue Oct 14 16:00:25 2025: (VI_1) Entering MASTER STATE
Tue Oct 14 16:00:25 2025: (VI_1) setting VIPs.
Tue Oct 14 16:00:25 2025: (VI_1) Sending/queueing gratuitous ARPs on ens160 for 172.16.4.96
Tue Oct 14 16:00:25 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:00:25 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:00:25 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:00:25 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:00:25 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
2025-10-14 16:00:25 172.16.4.96 dev ens160 up.............
Tue Oct 14 16:00:30 2025: (VI_1) Sending/queueing gratuitous ARPs on ens160 for 172.16.4.96
Tue Oct 14 16:00:30 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:00:30 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:00:30 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:00:30 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:00:30 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Backup server interface: ens160
ip addr show ens160
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:83:0e:04 brd ff:ff:ff:ff:ff:ff
inet 172.16.4.113/22 brd 172.16.7.255 scope global noprefixroute ens160
valid_lft forever preferred_lft forever
inet6 fe80::45a0:3144:4cf8:fb8d/64 scope link noprefixroute
valid_lft forever preferred_lft forever
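As shown, the primary server holds the virtual IP 172.16.4.96 and the backup server does not, so startup works as expected.
Stopping the keepalived container on the primary server
After manually stopping the keepalived container on the primary server with docker-compose down, its log shows:
Tue Oct 14 16:14:09 2025: Stopping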
Tue Oct 14 16:14:09 2025: (VI_1) sent 0 priority
Tue Oct 14 16:14:09 2025: (VI_1) removing VIPs.
2025-10-14 16:14:09 172.16.4.96 dev ens160 down............
Tue Oct 14 16:14:10 2025: Stopped - used (self/children) 0.047189/0.011739 user time, 0.151771/0.008893 system time
Tue Oct 14 16:14:10 2025: CPU usage (self/children) user: 0.005198/0.069158 system: 0.022745/0.163343
Tue Oct 14 16:14:10 2025: Stopped Keepalived v2.3.1 (05/24,2024)
The vip_down.sh script ran and removed the virtual IP, so the primary server's ens160 interface no longer has 172.16.4.96:
ip addr show ens160
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:83:b9:d7 brd ff:ff:ff:ff:ff:ff
inet 172.16.4.111/22 brd 172.16.7.255 scope global noprefixroute ens160
valid_lft forever preferred_lft forever
inet6 fe80::e194:8abe:9eb7:bb02/64 scope link noprefixroute
valid_lft forever preferred_lft forever
The keepalived log on the backup server:
Tue Oct 14 16:08:56 2025: (VI_1) Receive advertisement timeout
Tue Oct 14 16:08:56 2025: (VI_1) Entering MASTER STATE
Tue Oct 14 16:08:56 2025: (VI_1) setting VIPs.
Tue Oct 14 16:08:56 2025: (VI_1) Sending/queueing gratuitous ARPs on ens160 for 172.16.4.96
Tue Oct 14 16:08:56 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:08:56 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:08:56 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:08:56 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:08:56 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
2025-10-14 16:08:56 172.16.4.96 dev ens160 up.............
Tue Oct 14 16:09:01 2025: (VI_1) Sending/queueing gratuitous ARPs on ens160 for 172.16.4.96
Tue Oct 14 16:09:01 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:09:01 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:09:01 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:09:01 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Tue Oct 14 16:09:01 2025: Sending gratuitous ARP on ens160 for 172.16.4.96
Check whether the backup server's ens160 interface now has 172.16.4.96:
ip addr show ens160
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:83:0e:04 brd ff:ff:ff:ff:ff:ff
inet 172.16.4.113/22 brd 172.16.7.255 scope global noprefixroute ens160
valid_lft forever preferred_lft forever
inet 172.16.4.96/32 scope global ens160
valid_lft forever preferred_lft forever
inet6 fe80::45a0:3144:4cf8:fb8d/64 scope link noprefixroute
valid_lft forever preferred_lft forever
The backup server now holds the virtual IP 172.16.4.96 and the primary server no longer does, which confirms that the VIP failover works.
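The manual check of the ip addr output above can be scripted. This is a minimal sketch with a hypothetical helper name, has_vip, that reads ip addr show output from stdin so it can be tried against saved output as well as a live interface.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given IPv4 address appears in
# "ip addr show" output read from stdin.
has_vip() {
    local vip="$1"
    # Match "inet <vip>/" as a fixed string so 172.16.4.9 does not match 172.16.4.96.
    grep -qF "inet ${vip}/"
}

# Usage on a real host (interface name taken from the test above):
#   ip addr show ens160 | has_vip 172.16.4.96 && echo "VIP present"
```

Running it on both servers before and after docker-compose down gives a quick yes/no answer about which machine currently holds the VIP.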
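Summary
- When keepalived is deployed in a container and you want failover to be driven by business service failures, add a track_script entry to the keepalived configuration that references a check for the service. Because the container runs in isolation, ps -aux|grep xxx cannot tell whether the service is healthy; check its TCP/UDP/HTTP port instead.
- If you do not need to monitor the business service, the approach described above is sufficient.
- If the keepalived configuration does not use hooks such as notify_stop, the virtual IP is not removed from the interface when the keepalived container on the primary server stops. The backup server's interface then also holds the virtual IP, and requests still reach the primary server. If the primary server itself goes down, this problem does not occur.
- Set the Linux kernel parameter net.ipv4.ip_nonlocal_bind=1 in /etc/sysctl.conf so that services can bind to the virtual IP even while it is not assigned to the local machine.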
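For the port-based health check mentioned in the first point, a TCP probe could look like the sketch below. It assumes bash and the coreutils timeout command are available in the image; the function name check_tcp, the script path, and the port in the commented wiring are placeholders, not values from the original configuration.

```shell
#!/bin/bash
# Hypothetical health check for use from a keepalived vrrp_script:
# exit status 0 means the TCP port accepted a connection, non-zero means it did not.
check_tcp() {
    local host="$1" port="$2"
    # bash's /dev/tcp pseudo-device opens a TCP connection; timeout bounds the wait.
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example wiring in keepalived.conf (names and port are placeholders):
#   vrrp_script chk_service {
#       script "/data/scripts/check_tcp.sh 127.0.0.1 8080"
#       interval 3
#       fall 2
#   }
#   ...and inside vrrp_instance VI_1:
#   track_script {
#       chk_service
#   }
```

Because the container uses host networking, probing 127.0.0.1 reaches services listening on the host, which is why a port check works where a process listing does not.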