Redis Clustering: Master-Replica Replication and Sentinel Mode

The problems a single Redis node can run into, the CAP principle, and two of Redis's three clustering approaches

Posted by 石福鹏 on 2021-03-28

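I. The Problem

The previous articles all assumed a single machine, a single node, or a single instance. What problems does that setup have?

1: Single point of failure

2: Limited capacity

3: Load (performance) pressure

How do we solve each of these three independent problems? And is there an integrated solution?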

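II. Solving the Problems

AKF: the AKF scale cube describes three axes, X, Y and Z, each a different way of splitting a system.

AKF:

x: full copies, mirroring

y: split by business or function

z: split further by priority or by a logical rule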

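1. Solving the single point of failure (X axis)

With only one instance, reliability usually comes from the local disk (persistence). But the physical machine that holds that disk can still go down, so the answer is to keep replicas of Redis, or more generally replicas of the database (one master, several standbys). That solves the single point of failure.

With replicas in place, writes (insert, update, delete) go to the master and reads go to the replicas, which gives read/write splitting.

However, this approach does not solve the limited-capacity problem or the write-pressure problem, because it is full mirroring: if the master holds 10 GB of data, each replica also holds 10 GB, and Redis keeps them in sync all the time.

Note: master-standby (主备) and master-replica (主从) do not mean the same thing.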

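2. Solving limited capacity (Y axis)

A Redis instance has only 4 GB of capacity, but there is 10 GB of data. How do we store it?

Split the data by function or business domain, so that each Redis instance stores a different category of data. For example, one instance stores order data and another stores user data. The analogous approach in MySQL is splitting into separate databases. Of course, the instance holding the orders can still go down, so it also needs to be scaled along the X axis, i.e. the replicas from item 1.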

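3. Solving load pressure (Z axis)

If the data volume is still too large, split it again by priority or by a logical rule, much like microservices. Sticking with Redis as the example: use consistent hashing, or the splitting scheme that Redis Cluster uses, or partition directly by region, the way Tesla built a factory in China to serve the Chinese market.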
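To make the consistent-hashing idea concrete, here is a minimal Python sketch. It is only an illustration: the node names (redis-a, redis-b, redis-c), the use of MD5 and the 100 virtual nodes per instance are all assumptions of mine, and Redis Cluster itself maps keys to 16384 hash slots rather than using a ring like this.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(key):
        # MD5 is used only to spread points evenly, not for security.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node, vnodes=100):
        # Each physical node gets many points so the load evens out.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [point for point in self._ring if point[1] != node]

    def node_for(self, key):
        if not self._ring:
            raise ValueError("ring is empty")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


ring = ConsistentHashRing(["redis-a", "redis-b", "redis-c"])
print(ring.node_for("order:1001"), ring.node_for("user:42"))
```

The point of the ring is that removing one instance only remaps the keys that lived on it; keys on the other instances stay where they were.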

image-20210330135324730

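III. New Problems

Everything has two sides: solving the problems above introduces new ones.

1. Data consistency

First comes the synchronization of data between the master and its replicas (the data consistency problem).

a. Strong consistency

All nodes block until the data is identical everywhere.

Drawback: it hurts availability. If any one node has a problem (a replica process exits abnormally, runs slowly and times out, or hits a network error), the write fails.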

image-20210330140033994

b. Weak consistency

Replicate the data asynchronously.

Drawback: some data may be lost (when replication fails).

image-20210330140106930

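c. Eventual consistency

There is one more way to address the problems above:

Put something reliable, clustered and fast enough between the master and the replicas, for example Kafka.

The client sends a write to Redis. Redis first forwards the write to Kafka (synchronously, blocking); Kafka acknowledges as soon as it has accepted the write, and Redis then replies to the client. As for replication: because Kafka is reliable enough, the two Redis replicas will eventually fetch what they need from Kafka, and the data converges.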

image-20210330141135677

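But a solution based on eventual consistency has a catch:

Because it is only eventually consistent, a client may read stale data from a replica that has not caught up yet. The point of the reliable middle layer is to guarantee that the replicas do converge in the end; it does not give strong consistency.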
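The flow above can be sketched as a tiny in-memory simulation. Nothing here talks to Redis or Kafka: ReliableLog, Node and write are hypothetical names standing in for the reliable middle layer, a Redis instance and the write path.

```python
class ReliableLog:
    """Stand-in for the reliable middle layer (Kafka in the text)."""

    def __init__(self):
        self.entries = []

    def append(self, key, value):
        self.entries.append((key, value))
        return len(self.entries)  # offset acknowledged to the writer


class Node:
    """Stand-in for one Redis instance."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries this node has applied

    def catch_up(self, log):
        # Replay every entry this node has not applied yet.
        for key, value in log.entries[self.applied:]:
            self.data[key] = value
        self.applied = len(log.entries)


def write(master, log, key, value):
    log.append(key, value)        # synchronous: wait for the log's ack
    master.data[key] = value      # then apply locally
    master.applied = len(log.entries)
    return "OK"                   # reply to the client


log, master, replica = ReliableLog(), Node(), Node()
write(master, log, "k1", "hello")
print(replica.data.get("k1"))  # the replica has not consumed the log yet
replica.catch_up(log)
print(replica.data.get("k1"))  # now it has converged with the master
```

Between the write and the catch-up, a read on the replica sees old state. That window is exactly the "may read stale data" caveat.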

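2. High availability

Master-standby: clients only access the master. The standby exists to take over when the master dies, so that clients can still get their data.

Master-replica: besides the master, clients can also access the other nodes.

With Redis, what companies commonly use is master-replica replication.

Whether master-standby or master-replica, there is one master. Normally all writes happen on the master, and the other nodes either serve no requests at all (standby) or serve only reads (replica).

This raises another problem:

The master is itself a single point, so the master needs high availability (HA): when the master dies, a standby is promoted to master and traffic is switched over to it.

People are not reliable enough for this, so a program has to monitor for the failure. But any program can fail too, so the monitor itself must also be scaled out from one into a cluster.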

image-20210330150627550

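Why is an odd number of nodes recommended?

Because an odd-sized cluster and the next even-sized one tolerate the same number of failures, but their chances of running into a failure differ.

With three nodes or with four, at most one node may fail; beyond that no majority vote is possible. But the chance that some node fails is higher with four machines than with three.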
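The arithmetic behind this can be checked with a few lines of Python. This is a sketch under a simplifying assumption of mine: every node fails independently with the same probability p.

```python
from math import comb


def majority(n):
    """Smallest number of nodes that is more than half of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many nodes can fail while a majority is still alive."""
    return n - majority(n)


def p_lose_majority(n, p):
    """Probability that more nodes fail than the cluster can tolerate,
    assuming each node fails independently with probability p."""
    t = tolerated_failures(n)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t + 1, n + 1))


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n), p_lose_majority(n, 0.01))
```

Three and four nodes both tolerate one failure, yet with four nodes there are more ways for two of them to be down at once, so the fourth machine adds cost and risk without adding fault tolerance.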

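The above covers consistency, availability and partition tolerance, i.e. the CAP principle: at most two of the three can be achieved at once, never all three.

The most basic example: strong consistency and absolute high availability cannot both be satisfied, because strong consistency undermines availability.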

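IV. Master-Replica Replication

Redis replication is asynchronous, which gives it low latency and high performance.

1. Hands-on

a. Preparing the instances

We now have two Redis instances:
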
[root@hadoop01 ~]# ps -fe | grep  redis
root 858 1 0 02:09 ? 00:01:37 /usr/local/bin/redis-server 127.0.0.1:6380
root 2112 1908 0 13:43 pts/8 00:00:17 redis-server 127.0.0.1:6379
root 2117 1984 0 13:43 pts/10 00:00:00 redis-cli -p 6379 --raw
root 2443 2386 0 16:11 pts/0 00:00:00 grep --color=auto redis
[root@hadoop01 ~]#

Let's start one more:

[root@hadoop01 /]# cd /usr/local/redis-6.0.0/utils/
[root@hadoop01 utils]# ./install_server.sh
Welcome to the redis service installer
This script will help you easily set up a running redis server

Please select the redis port for this instance: [6379] 6381
Please select the redis config file name [/etc/redis/6381.conf]
Selected default - /etc/redis/6381.conf
Please select the redis log file name [/var/log/redis_6381.log]
Selected default - /var/log/redis_6381.log
Please select the data directory for this instance [/var/lib/redis/6381]
Selected default - /var/lib/redis/6381
Please select the redis executable path [/usr/local/bin/redis-server]
Selected config:
Port : 6381
Config file : /etc/redis/6381.conf
Log file : /var/log/redis_6381.log
Data dir : /var/lib/redis/6381
Executable : /usr/local/bin/redis-server
Cli Executable : /usr/local/bin/redis-cli
Is this ok? Then press ENTER to go on or Ctrl-C to abort.
Copied /tmp/6381.conf => /etc/init.d/redis_6381
Installing service...
Successfully added to chkconfig!
Successfully added to runlevels 345!
Starting Redis server...
Installation successful!
[root@hadoop01 utils]#

Since this is only a test, we won't edit the system files directly. We copy the Redis config files out instead:

[root@hadoop01 utils]# service redis_6381 stop
Stopping ...
Redis stopped
[root@hadoop01 utils]# cd /
[root@hadoop01 /]# mkdir test
[root@hadoop01 /]# cd test
[root@hadoop01 test]# cp /etc/redis/* ./
[root@hadoop01 test]# ls
6379.conf 6380.conf 6381.conf

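2. Modifying the configuration

The config files need a few changes (so that all three instances run in the foreground, blocking the terminal, with no AOF log).

First switch to foreground mode (we will run the instances in the foreground later so the log prints straight to the screen):
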
daemonize no

Comment out the log file location for 6379:

# logfile /var/log/redis_6379.log

Turn off the AOF log so that only RDB is used (change appendonly yes to no):

appendonly no

Do the same for the 6380 and 6381 instances.

3. Starting up

Empty the persistence directory of each of the three instances:

[root@hadoop01 test]# cd /var/lib/redis/6379
[root@hadoop01 6379]# rm -rf *
[root@hadoop01 6379]# ll
总用量 0

// same for the other two instances

Start the instance (first make sure none of the instances is running). Note that you must pass the config file we just edited:

[root@hadoop01 redis]# redis-server /test/6379.conf 
2621:C 30 Mar 2021 17:10:49.733 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2621:C 30 Mar 2021 17:10:49.733 # Redis version=6.0.0, bits=64, commit=00000000, modified=0, pid=2621, just started
2621:C 30 Mar 2021 17:10:49.734 # Configuration loaded
2621:M 30 Mar 2021 17:10:49.734 * Increased maximum number of open files to 10032 (it was originally set to 1024).
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 6.0.0 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 2621
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'

2621:M 30 Mar 2021 17:10:49.736 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2621:M 30 Mar 2021 17:10:49.736 # Server initialized
2621:M 30 Mar 2021 17:10:49.736 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
2621:M 30 Mar 2021 17:10:49.736 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
2621:M 30 Mar 2021 17:10:49.737 * Ready to accept connections

Open two more tabs and start 6380 and 6381 the same way.

The goal: 6379 is the master, and 6380 and 6381 are the replicas.

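4. Master-replica configuration

a. [Redis clustering approach 1] Configuring via commands (manual)

The master is left alone; only 6380 and 6381 need to follow it.

In older versions the command is SLAVEOF host port; since Redis 5.0 it is REPLICAOF host port.
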
127.0.0.1:6380> help SLAVEOF

SLAVEOF host port
summary: Make the server a replica of another instance, or promote it as master. Deprecated starting with Redis 5. Use REPLICAOF instead.
since: 1.0.0
group: server

127.0.0.1:6380>

OK, let's run it from the 6380 client:

127.0.0.1:6380> REPLICAOF 127.0.0.1 6379
OK
127.0.0.1:6380>

Then look at the console of the master, 6379:

3971:M 31 Mar 2021 10:22:35.933 * Ready to accept connections
3971:M 31 Mar 2021 10:36:23.333 * Replica 127.0.0.1:6380 asks for synchronization
3971:M 31 Mar 2021 10:36:23.333 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'f7aea8d38b63623a3d66330aac8a128e45ef6e2c', my replication IDs are 'b0102be2fc0f8d3f08cd98c46b967bf8ee5f6544' and '0000000000000000000000000000000000000000')
3971:M 31 Mar 2021 10:36:23.333 * Starting BGSAVE for SYNC with target: disk
3971:M 31 Mar 2021 10:36:23.338 * Background saving started by pid 3992
3992:C 31 Mar 2021 10:36:23.344 * DB saved on disk
3992:C 31 Mar 2021 10:36:23.345 * RDB: 4 MB of memory used by copy-on-write
3971:M 31 Mar 2021 10:36:23.436 * Background saving terminated with success
3971:M 31 Mar 2021 10:36:23.436 * Synchronization with replica 127.0.0.1:6380 succeeded

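The log shows that after the replica issues REPLICAOF it asks the master for synchronization; the master runs BGSAVE to dump an RDB snapshot to disk, sends it to the replica, and the synchronization succeeds.

Now look at the 6380 console (the replica discards its own data first):
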
3976:S 31 Mar 2021 10:36:22.278 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
3976:S 31 Mar 2021 10:36:22.278 * REPLICAOF 127.0.0.1:6379 enabled (user request from 'id=4 addr=127.0.0.1:53660 fd=7 name= age=453 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=44 qbuf-free=32724 obl=0 oll=0 omem=0 events=r cmd=replicaof user=default')
3976:S 31 Mar 2021 10:36:23.332 * Connecting to MASTER 127.0.0.1:6379
3976:S 31 Mar 2021 10:36:23.332 * MASTER <-> REPLICA sync started
3976:S 31 Mar 2021 10:36:23.332 * Non blocking connect for SYNC fired the event.
3976:S 31 Mar 2021 10:36:23.333 * Master replied to PING, replication can continue...
3976:S 31 Mar 2021 10:36:23.333 * Trying a partial resynchronization (request f7aea8d38b63623a3d66330aac8a128e45ef6e2c:1).
3976:S 31 Mar 2021 10:36:23.343 * Full resync from master: d4af9c563e480428c2a16e4c949e0b5558af4fd0:0
3976:S 31 Mar 2021 10:36:23.343 * Discarding previously cached master state.
3976:S 31 Mar 2021 10:36:23.437 * MASTER <-> REPLICA sync: receiving 175 bytes from master to disk
3976:S 31 Mar 2021 10:36:23.437 * MASTER <-> REPLICA sync: Flushing old data # the replica first deletes its own old data
3976:S 31 Mar 2021 10:36:23.437 * MASTER <-> REPLICA sync: Loading DB in memory
3976:S 31 Mar 2021 10:36:23.437 * Loading RDB produced by version 6.0.0
3976:S 31 Mar 2021 10:36:23.437 * RDB age 0 seconds
3976:S 31 Mar 2021 10:36:23.437 * RDB memory usage when created 1.82 Mb
3976:S 31 Mar 2021 10:36:23.437 * MASTER <-> REPLICA sync: Finished with success

On the master 6379 we open a client and create a key. Reading it back on the master naturally works:

[root@hadoop01 ~]# redis-cli -p 6379
127.0.0.1:6379> set k1 hello
OK
127.0.0.1:6379> keys *
1) "k1"
127.0.0.1:6379> get k1
"hello"
127.0.0.1:6379>

Then check whether 6380 has the data:

127.0.0.1:6380> keys *
1) "k1"
127.0.0.1:6380> get k1
"hello"
127.0.0.1:6380>

It does, so the data was replicated.

Note that if we run SET on the replica, the write is rejected:

127.0.0.1:6380> set k2 world
(error) READONLY You can't write against a read only replica.
127.0.0.1:6380>

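This behaviour can be changed in the config file (the replica-read-only setting).

Now try this: first create a key on 6381, then make 6381 follow 6379. The k2 just created on the replica is gone (the replica flushes its old data before loading the master's), and only the data replicated from the master remains:
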
[root@hadoop01 ~]# redis-cli -p 6381
127.0.0.1:6381> set k2 aaa
OK
127.0.0.1:6381> keys *
1) "k2"
127.0.0.1:6381> get k2
"aaa"
127.0.0.1:6381> REPLICAOF 127.0.0.1 6379
OK
127.0.0.1:6381> keys *
1) "k1"

If we now look at the persistence directories, each of them has a dump.rdb file.

Incremental sync & a replica going down

If 6381 goes down at this point, 6379 shows:

3971:M 31 Mar 2021 11:47:13.143 # Connection with replica 127.0.0.1:6381 lost.

The master keeps accepting writes:

127.0.0.1:6379> set k3 qweqweqw
OK
127.0.0.1:6379> set k4 32f23f23
OK
127.0.0.1:6379> set k5 3d3dqdddq
OK
127.0.0.1:6379> set k6 3d33d12d
OK
127.0.0.1:6379> keys *
1) "k6"
2) "k3"
3) "k5"
4) "k4"
5) "k1"
127.0.0.1:6379>

6380 stays in sync without problems. Now 6381 comes back:

[root@hadoop01 ~]# redis-server /test/6381.conf --replicaof 127.0.0.1 6379

In the log of the restarted 6381 we see:

4172:S 31 Mar 2021 11:50:10.656 * Connecting to MASTER 127.0.0.1:6379
4172:S 31 Mar 2021 11:50:10.657 * MASTER <-> REPLICA sync started
4172:S 31 Mar 2021 11:50:10.657 * Non blocking connect for SYNC fired the event.
4172:S 31 Mar 2021 11:50:10.657 * Master replied to PING, replication can continue...
4172:S 31 Mar 2021 11:50:10.657 * Trying a partial resynchronization (request d4af9c563e480428c2a16e4c949e0b5558af4fd0:5936).
4172:S 31 Mar 2021 11:50:10.658 * Successful partial resynchronization with master.
4172:S 31 Mar 2021 11:50:10.658 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.

Notice that no RDB was written during this process, and yet:

127.0.0.1:6381> keys *
1) "k1"
2) "k5"
3) "k3"
4) "k4"
5) "k6"
127.0.0.1:6381>

The data did come across. This is the so-called incremental (partial) resynchronization.
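Why the master answered with a partial resync here but with a full resync in the earlier log can be modelled roughly as follows. This is a deliberately simplified sketch, not Redis's actual code: MasterState and choose_sync are hypothetical names, the offsets are made up, and the real server also keeps a second replication ID that is ignored here.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    replid: str        # current replication ID of the master
    offset: int        # replication offset of the last byte produced
    backlog_size: int  # bytes retained in the backlog (repl-backlog-size)

    def backlog_start(self):
        # Oldest offset still available in the backlog.
        return max(1, self.offset - self.backlog_size + 1)


def choose_sync(master, req_replid, req_offset):
    """Decide between a partial and a full resync for a replica's request."""
    same_history = req_replid == master.replid
    in_backlog = master.backlog_start() <= req_offset <= master.offset + 1
    return "partial" if same_history and in_backlog else "full"


master = MasterState(replid="d4af9c56", offset=6100, backlog_size=1024 * 1024)
print(choose_sync(master, "d4af9c56", 5936))  # same history, offset in backlog
print(choose_sync(master, "f7aea8d3", 1))     # unknown replication ID
```

The replica sends the replication ID and offset it remembers; if the ID matches and the missing bytes are still in the backlog, only those bytes are sent. Otherwise the master falls back to a full RDB transfer.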

An open question

One thing needs special mention. If we stop the 6381 service again and start it this way:

redis-server /test/6381.conf --replicaof 127.0.0.1 6379 --appendonly yes

[root@hadoop01 ~]# redis-server /test/6381.conf --replicaof 127.0.0.1 6379 --appendonly yes
1296:C 31 Mar 2021 14:59:57.942 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1296:C 31 Mar 2021 14:59:57.942 # Redis version=6.0.0, bits=64, commit=00000000, modified=0, pid=1296, just started
1296:C 31 Mar 2021 14:59:57.942 # Configuration loaded
1296:S 31 Mar 2021 14:59:57.943 * Increased maximum number of open files to 10032 (it was originally set to 1024).
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 6.0.0 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6381
| `-._ `._ / _.-' | PID: 1296
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'

1296:S 31 Mar 2021 14:59:57.944 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1296:S 31 Mar 2021 14:59:57.944 # Server initialized
1296:S 31 Mar 2021 14:59:57.944 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1296:S 31 Mar 2021 14:59:57.944 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1296:S 31 Mar 2021 14:59:57.946 * Ready to accept connections
1296:S 31 Mar 2021 14:59:57.947 * Connecting to MASTER 127.0.0.1:6379
1296:S 31 Mar 2021 14:59:57.948 * MASTER <-> REPLICA sync started
1296:S 31 Mar 2021 14:59:57.949 * Non blocking connect for SYNC fired the event.
1296:S 31 Mar 2021 14:59:57.968 * Master replied to PING, replication can continue...
1296:S 31 Mar 2021 14:59:57.970 * Partial resynchronization not possible (no cached master)
1296:S 31 Mar 2021 14:59:57.982 * Full resync from master: 99c61ed7f56cfb5c98c8e9fddf65a641731f7f7b:0
1296:S 31 Mar 2021 14:59:58.070 * MASTER <-> REPLICA sync: receiving 243 bytes from master to disk
1296:S 31 Mar 2021 14:59:58.070 * MASTER <-> REPLICA sync: Flushing old data
1296:S 31 Mar 2021 14:59:58.071 * MASTER <-> REPLICA sync: Loading DB in memory
1296:S 31 Mar 2021 14:59:58.071 * Loading RDB produced by version 6.0.0
1296:S 31 Mar 2021 14:59:58.071 * RDB age 1 seconds
1296:S 31 Mar 2021 14:59:58.071 * RDB memory usage when created 1.83 Mb
1296:S 31 Mar 2021 14:59:58.071 * MASTER <-> REPLICA sync: Finished with success
1296:S 31 Mar 2021 14:59:58.072 * Background append only file rewriting started by pid 1302
1296:S 31 Mar 2021 14:59:58.099 * AOF rewrite child asks to stop sending diffs.
1302:C 31 Mar 2021 14:59:58.100 * Parent agreed to stop sending diffs. Finalizing AOF...
1302:C 31 Mar 2021 14:59:58.100 * Concatenating 0.00 MB of AOF diff received from parent.
1302:C 31 Mar 2021 14:59:58.100 * SYNC append only file rewrite performed
1302:C 31 Mar 2021 14:59:58.100 * AOF rewrite: 4 MB of memory used by copy-on-write
1296:S 31 Mar 2021 14:59:58.151 * Background AOF rewrite terminated with success
1296:S 31 Mar 2021 14:59:58.151 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1296:S 31 Mar 2021 14:59:58.151 * Background AOF rewrite finished successfully

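One thing stands out: without AOF the restarted replica resynchronized incrementally, but with AOF enabled (see the log above, from "Partial resynchronization not possible (no cached master)" through "MASTER <-> REPLICA sync: Finished with success") it did a full resync. I did not quite understand whether this is by design.

The RDB file (dump.rdb) records the replication ID of the master being followed, so when the replica resynchronizes it can tell that it only needs the incremental data. With AOF enabled, appendonly.aof does not record it: the leading part of the file is just a plain RDB without that ID. So when the master has written more data, the replica cannot tell whether it can simply append, and it does a full resync. Whether this is intentional or a bug, I have not figured out yet.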

The dump.rdb file:

image-20210331152452078

The appendonly.aof file:

image-20210331152538929

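So the takeaway from this test (Redis 6.0.0) is: a replica restarted with AOF enabled does a full resync.

The master goes down

First, the master knows how many replicas it has. Remember this, because Sentinel relies on it. Without Sentinel nothing is automatic, so how do we manually turn one of the replicas into the master?

The command to follow a master is REPLICAOF host port. To stop following and become a master itself, 6380 runs REPLICAOF NO ONE. The other replica was following 6379, though, and will keep trying to connect to 6379, so we have to point 6381 at 6380 by hand: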

REPLICAOF 127.0.0.1 6380

Replicas that accept writes

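Everything above assumed that the master handles reads and writes while the replicas only serve reads. That is configurable. Find this section in the config file:
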
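################################# REPLICATION #################################
# Set the master to follow here:
# replicaof <masterip> <masterport>
# While a freshly started replica is still syncing from its master (say the master holds 4 GB and the
# transfer takes a while), should the replica's old data stay visible and answer queries?
replica-serve-stale-data yes
# Is the replica read-only? Set to no to allow writes on the replica.
replica-read-only yes
# Two ways to sync: the master writes an RDB to disk and then sends it over the network, or it streams the
# RDB straight over the network. The default is no, i.e. go through disk. See the figure below.
repl-diskless-sync no
# Size of the backlog queue used for incremental replication.
repl-backlog-size 1mb
min-replicas-to-write 3
min-replicas-max-lag 10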

image-20210331200831197
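The last two settings in the listing above work as a pair: the master refuses writes unless at least min-replicas-to-write replicas are connected with a lag of at most min-replicas-max-lag seconds. Here is a minimal sketch of that rule as I understand it; master_accepts_writes is a hypothetical helper, not a Redis API.

```python
def master_accepts_writes(replica_lags, min_replicas_to_write, min_replicas_max_lag):
    """replica_lags: seconds since each connected replica last acknowledged.

    Returns True if the master would still accept a write under this model.
    """
    # Setting either option to 0 switches the check off.
    if min_replicas_to_write == 0 or min_replicas_max_lag == 0:
        return True
    good = sum(1 for lag in replica_lags if lag <= min_replicas_max_lag)
    return good >= min_replicas_to_write


# Two healthy replicas, but three are required: writes are refused.
print(master_accepts_writes([1, 2], 3, 10))
# Three replicas within the allowed lag: writes are accepted.
print(master_accepts_writes([1, 2, 3], 3, 10))
```

With the values shown (3 and 10) and only two replicas, as in this article's three-instance setup, the master would reject every write, so in a test like this one you would either leave these two options disabled or set the count to 1 or 2.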

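b. [Redis clustering approach 2] High availability for master-replica replication: Sentinel

Redis Sentinel manages multiple Redis servers. The monitoring program described earlier is exactly what Sentinel is.

Its three tasks are monitoring, notification, and automatic failover.

To start a sentinel you need at least the following configuration:

sentinel monitor mymaster 127.0.0.1 6379 2
# sentinel monitor <master-name> <master-ip> <master-port> <quorum: how many sentinels must agree that the master is down>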
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
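The trailing 2 in the sentinel monitor line is the quorum. As I read the Sentinel documentation, the quorum only decides when the master is considered objectively down; actually carrying out a failover additionally needs the votes of a majority of all sentinels. A small sketch of the two checks, with hypothetical function names:

```python
def can_mark_objectively_down(sentinels_agreeing, quorum):
    """The master is flagged as objectively down once `quorum` sentinels agree."""
    return sentinels_agreeing >= quorum


def can_authorize_failover(total_sentinels, sentinels_reachable, quorum):
    """A failover needs a leader elected by a majority of ALL sentinels
    (or by `quorum` of them, if the quorum is larger than the majority)."""
    majority = total_sentinels // 2 + 1
    return sentinels_reachable >= max(majority, quorum)


# Three sentinels, quorum 2, as in this article's setup.
print(can_mark_objectively_down(2, 2))     # two sentinels agree
print(can_authorize_failover(3, 2, 2))     # two of three can vote
print(can_authorize_failover(3, 1, 2))     # an isolated sentinel cannot fail over
```

This is why a lone sentinel on the minority side of a network partition cannot promote a replica on its own.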

First stop all the redis-server processes, then prepare the config files:

[root@hadoop01 ~]# cd /test/
[root@hadoop01 test]# ll
总用量 240
-rw-r--r-- 1 root root 81553 3月 30 17:27 6379.conf
-rw-r--r-- 1 root root 81553 3月 30 17:27 6380.conf
-rw-r--r-- 1 root root 81553 3月 30 17:28 6381.conf
[root@hadoop01 test]# vi sentinel_6379.conf
port 26379
sentinel monitor mymaster 127.0.0.1 6379 2
:wq
[root@hadoop01 test]# cp sentinel_6379.conf sentinel_6380.conf
[root@hadoop01 test]# cp sentinel_6379.conf sentinel_6381.conf

Likewise, change port to 26380 in sentinel_6380.conf and to 26381 in sentinel_6381.conf:

[root@hadoop01 test]# ll
总用量 252
-rw-r--r-- 1 root root 81553 3月 30 17:27 6379.conf
-rw-r--r-- 1 root root 81553 3月 30 17:27 6380.conf
-rw-r--r-- 1 root root 81553 3月 30 17:28 6381.conf
-rw-r--r-- 1 root root 56 4月 2 16:39 sentinel_6379.conf
-rw-r--r-- 1 root root 56 4月 2 16:41 sentinel_6380.conf
-rw-r--r-- 1 root root 56 4月 2 16:41 sentinel_6381.conf
[root@hadoop01 test]#

OK, with the config files written we can start the services. Start each instance, with 6380 and 6381 following 6379:

redis-server /test/6379.conf

redis-server /test/6380.conf --replicaof 127.0.0.1 6379

redis-server /test/6381.conf --replicaof 127.0.0.1 6379

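The master and the replicas are now running. Next we start the sentinels.

According to the official documentation there are two ways.

With the redis-sentinel executable, you can start the Sentinel system with:

redis-sentinel /path/to/sentinel.conf

With the redis-server executable, you can start a Redis server running in Sentinel mode with:
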
redis-server /path/to/sentinel.conf --sentinel

Both ways start a Sentinel instance.

A config file is mandatory when starting a Sentinel instance: the system uses it to save the current Sentinel state and reloads it to restore that state when Sentinel restarts.

If no config file is given, or the given config file is not writable, Sentinel refuses to start.

Start the sentinel with redis-server /test/sentinel_6379.conf --sentinel:

[root@hadoop01 ~]# redis-server /test/sentinel_6379.conf --sentinel
3016:X 02 Apr 2021 17:17:51.170 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
3016:X 02 Apr 2021 17:17:51.170 # Redis version=6.0.0, bits=64, commit=00000000, modified=0, pid=3016, just started
3016:X 02 Apr 2021 17:17:51.170 # Configuration loaded
3016:X 02 Apr 2021 17:17:51.170 * Increased maximum number of open files to 10032 (it was originally set to 1024).
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 6.0.0 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in sentinel mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 26379
| `-._ `._ / _.-' | PID: 3016
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'

3016:X 02 Apr 2021 17:17:51.171 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
3016:X 02 Apr 2021 17:17:51.173 # Sentinel ID is 73d67c4a5679af45c129a34d5b0761bbd430cf6e
3016:X 02 Apr 2021 17:17:51.173 # +monitor master mymaster 127.0.0.1 6379 quorum 2
3016:X 02 Apr 2021 17:17:51.174 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
3016:X 02 Apr 2021 17:17:51.175 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

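In the last three lines of the output above (the +monitor event and the two +slave events) we see that although our config file names only the master 6379, the sentinel found two replicas. That is because:

The master knows which replicas are connected to it, so a sentinel that monitors the master learns about the replicas too, by asking the master. (Discovering the other sentinels is what goes through Redis pub/sub.)

Let's start another sentinel:
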
[root@hadoop01 ~]#  redis-server /test/sentinel_6380.conf --sentinel
3084:X 02 Apr 2021 17:27:53.638 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
3084:X 02 Apr 2021 17:27:53.638 # Redis version=6.0.0, bits=64, commit=00000000, modified=0, pid=3084, just started
3084:X 02 Apr 2021 17:27:53.638 # Configuration loaded
3084:X 02 Apr 2021 17:27:53.639 * Increased maximum number of open files to 10032 (it was originally set to 1024).
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 6.0.0 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in sentinel mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 26380
| `-._ `._ / _.-' | PID: 3084
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'

3084:X 02 Apr 2021 17:27:53.640 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
3084:X 02 Apr 2021 17:27:53.644 # Sentinel ID is 036b70b06fa4111367c69e978031c9262d25db52
3084:X 02 Apr 2021 17:27:53.644 # +monitor master mymaster 127.0.0.1 6379 quorum 2
3084:X 02 Apr 2021 17:27:53.646 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
3084:X 02 Apr 2021 17:27:53.651 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
3084:X 02 Apr 2021 17:27:54.267 * +sentinel sentinel 73d67c4a5679af45c129a34d5b0761bbd430cf6e 127.0.0.1 26379 @ mymaster 127.0.0.1 6379

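This sentinel reports the same information as the previous one, one master and two replicas, and in addition it has discovered the earlier sentinel. This is what we described before: the monitors have to find each other and form a group. Likewise, let's start one more:
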
[root@hadoop01 ~]# redis-server /test/sentinel_6381.conf --sentinel
3150:X 02 Apr 2021 17:32:57.175 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
3150:X 02 Apr 2021 17:32:57.175 # Redis version=6.0.0, bits=64, commit=00000000, modified=0, pid=3150, just started
3150:X 02 Apr 2021 17:32:57.175 # Configuration loaded
3150:X 02 Apr 2021 17:32:57.176 * Increased maximum number of open files to 10032 (it was originally set to 1024).
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 6.0.0 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in sentinel mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 26381
| `-._ `._ / _.-' | PID: 3150
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'

3150:X 02 Apr 2021 17:32:57.177 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
3150:X 02 Apr 2021 17:32:57.181 # Sentinel ID is 75fd04ec24a12fc2f563523b1d0ef3b45247ff51
3150:X 02 Apr 2021 17:32:57.181 # +monitor master mymaster 127.0.0.1 6379 quorum 2
3150:X 02 Apr 2021 17:32:57.184 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:32:57.185 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:32:57.815 * +sentinel sentinel 036b70b06fa4111367c69e978031c9262d25db52 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:32:58.175 * +sentinel sentinel 73d67c4a5679af45c129a34d5b0761bbd430cf6e 127.0.0.1 26379 @ mymaster 127.0.0.1 6379

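It now knows the master, the replicas and the other two sentinels.

Failover used to be a manual operation. What about now that we have sentinels?

We kill the 6379 (master) process to simulate a master failure and watch what happens (there is usually a short delay).

In the sentinel console we can see that, because the master went down, the sentinels voted and 6381 was promoted to be the new master:
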
3150:X 02 Apr 2021 17:44:41.794 # +sdown master mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:44:41.862 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
3150:X 02 Apr 2021 17:44:41.862 # +new-epoch 1
3150:X 02 Apr 2021 17:44:41.862 # +try-failover master mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:44:41.864 # +vote-for-leader 75fd04ec24a12fc2f563523b1d0ef3b45247ff51 1
3150:X 02 Apr 2021 17:44:41.875 # 73d67c4a5679af45c129a34d5b0761bbd430cf6e voted for 75fd04ec24a12fc2f563523b1d0ef3b45247ff51 1
3150:X 02 Apr 2021 17:44:41.876 # 036b70b06fa4111367c69e978031c9262d25db52 voted for 75fd04ec24a12fc2f563523b1d0ef3b45247ff51 1
3150:X 02 Apr 2021 17:44:41.920 # +elected-leader master mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:44:41.920 # +failover-state-select-slave master mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:44:41.987 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:44:41.987 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:44:42.039 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:44:42.749 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:44:42.749 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:44:42.831 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:44:42.996 # -odown master mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:44:43.783 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:44:43.783 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:44:43.854 # +failover-end master mymaster 127.0.0.1 6379
3150:X 02 Apr 2021 17:44:43.854 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
3150:X 02 Apr 2021 17:44:43.855 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
3150:X 02 Apr 2021 17:44:43.855 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
3150:X 02 Apr 2021 17:45:13.922 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
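The line that matters most in this log is the +switch-master event, which carries the master name followed by the old and new address. If you want to pull it out of a sentinel log programmatically, a small parser like the following will do. It is a sketch based only on the line format visible in the log above; last_switch_master is a hypothetical helper name.

```python
import re

# +switch-master <master-name> <old-ip> <old-port> <new-ip> <new-port>
SWITCH_MASTER = re.compile(
    r"\+switch-master (?P<name>\S+) "
    r"(?P<old_ip>\S+) (?P<old_port>\d+) "
    r"(?P<new_ip>\S+) (?P<new_port>\d+)"
)


def last_switch_master(log_lines):
    """Return the most recent +switch-master event as a dict, or None."""
    event = None
    for line in log_lines:
        match = SWITCH_MASTER.search(line)
        if match:
            event = {
                "master_name": match.group("name"),
                "old": (match.group("old_ip"), int(match.group("old_port"))),
                "new": (match.group("new_ip"), int(match.group("new_port"))),
            }
    return event


sample = [
    "3150:X 02 Apr 2021 17:44:43.854 # +failover-end master mymaster 127.0.0.1 6379",
    "3150:X 02 Apr 2021 17:44:43.854 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381",
]
print(last_switch_master(sample))
```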

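Try opening a client on 6381, SET a value, and then check on 6380 that the data was replicated. That completes the automatic failover.

Note:

Sentinel rewrites its config file. Open any one of them and you will find many added lines.

Earlier we raised the question of how a sentinel learns about the master, the replicas and the other sentinels. The answer is pub/sub: each sentinel monitors the master Redis and discovers the other sentinels through pub/sub messages on the master. You can open a client on the master Redis and use PSUBSCRIBE to watch the pub/sub messages.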

image-20210402175917508

