I’ve successfully used MySQL statement-based replication for several years across data centers and understand its quirks.
While at the MySQL Conference, I tried to see how DRBD could help the installations I manage, but I just can’t drink the DRBD Kool-Aid.
MySQL Replication Pluses
- Free
- Easy to set up if you already have a backup and master position
- No shared storage to manage or corrupt
- Light network load
- Can use the master for reads and writes and the slaves for reads
- Can do maintenance on a slave (ALTER TABLE, etc.) and fail over to it afterwards
- Works well across the Internet, even with high latency
- Many replication problems are simple and can be fixed by hand
MySQL Replication Minuses
- Slaves can/will get out of sync with the master, typically noticed after a few weeks or with Maatkit
- Changing masters requires rebuilding slaves
- A busy master always causes some replication lag
- No checksums or two-phase commit
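To make the "out of sync" point concrete, here is a minimal sketch of the chunked-checksum idea that tools like Maatkit use to spot drift between a master and a slave. It is not Maatkit's actual algorithm: the function names, the in-memory row lists, and the SHA-256 digest are all assumptions made for illustration.

```python
import hashlib

def chunk_checksums(rows, chunk_size=2):
    """Return one digest per chunk of rows, ordered by primary key (first column)."""
    ordered = sorted(rows, key=lambda r: r[0])
    digests = []
    for start in range(0, len(ordered), chunk_size):
        h = hashlib.sha256()
        for row in ordered[start:start + chunk_size]:
            h.update(repr(row).encode("utf-8"))
        digests.append(h.hexdigest())
    return digests

def differing_chunks(master_rows, slave_rows, chunk_size=2):
    """Return the indexes of chunks whose digests differ between the two copies."""
    m = chunk_checksums(master_rows, chunk_size)
    s = chunk_checksums(slave_rows, chunk_size)
    # A chunk missing on either side also counts as a difference.
    n = max(len(m), len(s))
    return [i for i in range(n) if i >= len(m) or i >= len(s) or m[i] != s[i]]

# Hypothetical table contents: the slave has silently drifted on row 4.
master = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")]
slave = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dav")]
print(differing_chunks(master, slave))
```

Comparing per-chunk digests rather than whole tables narrows down where the copies diverge, so only the affected rows need to be re-synced.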
DRBD is a low-level driver that copies a disk partition in near real-time from a master to a failover node (a cold standby).
MySQL with DRBD Pluses
- Free
- No fsck or transaction log replay needed after a manual failover
- Slaves don’t need CHANGE MASTER TO rerun unless DRBD fails
MySQL with DRBD Minuses
- DRBD partition corruption means failover node would be unusable (disadvantage of shared storage) and failback could destroy original master too.
- If the master panics, then after failover both fsck and transaction log replay must be performed
- More work to set up initially than statement-based replication
- NIC and network corruption is also propagated.
- Failover node is a cold standby, cannot accept database traffic if that would change the DRBD partition
- Could generate a lot of network traffic.
- cannot do maintenance on cold standby database
- 2 heartbeats needed on a reliable, local network
I can see how MySQL/DRBD would be appealing for those who operate on a reliable network and don’t need Master-Master for load or maintenance, or who have many slaves that cannot easily be rebuilt.
Disclosure: I work for LINBIT, we’re the guys that develop DRBD. Allow me to address these alleged minuses here.
“DRBD partition corruption means failover node would be unusable (disadvantage of shared storage) and failback could destroy original master too.”
If the filesystem that sits on DRBD gets corrupted, it is correct that that corruption wouldn’t magically disappear on failover. However failback can’t “destroy” the master (i.e. cause more corruption than already exists).
“If the master panics, then after failover both fsck and transaction logs replay must be performed.”
fsck amounts to replaying a journal and usually completes in under a second, unless you use a non-journaling filesystem, which is a really bad idea in the first place. And while obsolete versions of DRBD would deliberately panic the host in some error conditions, it doesn’t do so anymore (since DRBD 8, which was released over a year ago).
“NIC and network corruption is also propagated.”
Wrong. End-to-end replication integrity checking was introduced to prevent exactly that. This has been around for half a year. See http://www.drbd.org/users-guide/s-integrity-check.html
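As a rough illustration of what end-to-end integrity checking means in general (this is not DRBD's code; the function names and the choice of CRC32 are assumptions for the sketch), the sender attaches a checksum to each block and the receiver recomputes it before accepting the data:

```python
import zlib

def send_block(data: bytes):
    """Sender side: pair the payload with a CRC32 computed before it hits the wire."""
    return data, zlib.crc32(data)

def receive_block(data: bytes, checksum: int) -> bytes:
    """Receiver side: recompute the CRC32 and refuse a block that does not match."""
    if zlib.crc32(data) != checksum:
        raise ValueError("checksum mismatch: block corrupted in transit")
    return data

payload, crc = send_block(b"InnoDB page contents")

# An intact block is accepted unchanged.
intact = receive_block(payload, crc)

# A block damaged between sender and receiver (one byte flipped) is rejected.
corrupted = b"InnoDB page c0ntents"
try:
    receive_block(corrupted, crc)
    detected = False
except ValueError:
    detected = True
print(detected)
```

Because the checksum is computed before the data reaches the NIC and verified after it leaves the peer's NIC, corruption introduced anywhere in between is caught rather than written to the standby's disk.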
“Failover node is a cold standby, cannot accept database traffic if that would change the DRBD partition”
Correct, but nothing stops you from running two DRBD-backed database instances on two hosts in a “criss-cross” fashion, converging on one node on node failure.
“Could generate a lot of network traffic.”
Which is why it’s always recommended to use a dedicated crossover replication link.
“2 heartbeats needed on a reliable, local network.”
2 network connections are always available if you follow the recommendation above. Adding another heartbeat communication path amounts to adding one line in a config file.
And allow me to add two more pluses here for DRBD, just because a customer mentioned them in one of the DRBD sessions at the conference:
1. No matter on which host a DRBD-backed MySQL server runs, it listens on a virtual, Heartbeat-managed IP address. Thus if you run 1:n replication off that master, failing over doesn’t affect replication to your slaves at all. It simply continues right where it left off after failover and manual switchover.
2. DRBD is synchronous. Can’t do that with MySQL replication.
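The practical difference between synchronous and asynchronous replication can be shown with a toy model. Everything here (the `Primary` and `Replica` classes, the `lost_on_crash` helper) is hypothetical and only mimics the acknowledgement order, not how DRBD or MySQL actually ship data:

```python
class Replica:
    def __init__(self):
        self.log = []

class Primary:
    def __init__(self, replica, synchronous):
        self.log = []
        self.replica = replica
        self.synchronous = synchronous
        self.pending = []  # writes not yet shipped (asynchronous mode only)

    def write(self, item):
        self.log.append(item)
        if self.synchronous:
            # Synchronous: the peer holds the write before the client gets its ack.
            self.replica.log.append(item)
        else:
            # Asynchronous: ack now, ship later.
            self.pending.append(item)
        return "ack"

    def ship_pending(self):
        self.replica.log.extend(self.pending)
        self.pending.clear()

def lost_on_crash(synchronous):
    """Return acknowledged writes missing on the replica if the primary dies now."""
    replica = Replica()
    primary = Primary(replica, synchronous)
    acked = [w for w in ("a", "b", "c") if primary.write(w) == "ack"]
    # The primary crashes here, before ship_pending() ever runs.
    return [w for w in acked if w not in replica.log]

print(lost_on_crash(True))
print(lost_on_crash(False))
```

In the synchronous case no acknowledged write can be missing from the peer after a crash; in the asynchronous case anything still in flight is lost, which is the replication lag exposure listed among the minuses of MySQL replication.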
[...] and MySQL: Just Say Yes I recently came across this blog post with the catchy title of “DRBD and MySQL: Just Say No”. Now while I have absolutely no issue with people not liking DRBD or finding that it doesn’t [...]
MySQL and DRBD, Just say NO !…
Florian is replying to James on the subject of using DRBD for MySQL HA. Florian is refuting most of the arguments that James has against using MySQL and DRBD together.
I’m also saying NO to MySQL and DRBD in most cases, but not for any of…
I guess by fsck you mean replay of the file system’s transaction log, which still takes some time, however.
I think DRBD mainly appeals to people who are used to SAN-based clustering and the active-passive clustering offered by many databases. It is generic, and this is what some groups of people love.