I'm investigating ways to detect and fix replication errors on a daily basis without reloading the slave. The database I am managing is large, but fortunately it is partitioned into lots of smaller, independent tables.
The most common error this year has been malformed packets, since the master and slave are in different data centers. Skipping the offending statement with SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1 is often OK.
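For reference, here is a minimal sketch of how that skip could be scripted. The helper name skip_one_event and the RecordingCursor stand-in are my own inventions for illustration. The sketch assumes a DB-API style cursor from whatever MySQL driver you use, and it stops the slave first because the counter can only be set while the slave threads are not running.

```python
from typing import List

# Statements in the order the slave expects them: the skip counter can
# only be set while the slave threads are stopped.
SKIP_ONE_EVENT: List[str] = [
    "STOP SLAVE",
    "SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1",
    "START SLAVE",
]


def skip_one_event(cursor) -> None:
    """Hypothetical helper: skip the next event from the master, then resume.

    `cursor` is assumed to be any object with an execute(sql) method,
    such as a DB-API cursor connected to the slave.
    """
    for statement in SKIP_ONE_EVENT:
        cursor.execute(statement)


class RecordingCursor:
    """Stand-in cursor that records statements instead of sending them."""

    def __init__(self) -> None:
        self.statements: List[str] = []

    def execute(self, sql: str) -> None:
        self.statements.append(sql)


if __name__ == "__main__":
    demo = RecordingCursor()
    skip_one_event(demo)
    print("\n".join(demo.statements))
```

Running it as-is only prints the three statements. Pointed at a real slave, it would be worth checking SHOW SLAVE STATUS afterwards, since a skipped statement can leave the slave's data different from the master's.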
I saw two talks by Baron Schwartz this year, at MySQLCamp and the MySQL Conference, so I thought I’d look into his work.
I’m able to run his mysql-table-checksum-1.1.5 in ACCUM and BIT_XOR modes, but not in CHECKSUM mode. I had to edit the script to get that far, so it looks like it needs a little more testing.
Update: He’s fixed the ACCUM and BIT_XOR bind bugs.
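To get a feel for what a BIT_XOR style checksum buys you, here is a rough sketch of the general idea, not the tool's actual implementation: take a CRC32 of every row, XOR them all together, and compare the result (plus a row count) between master and slave. The function names, the "#" separator and the sample rows are all assumptions of mine.

```python
import zlib
from typing import Iterable, Optional, Sequence, Tuple

Row = Sequence[Optional[object]]


def row_crc(row: Row) -> int:
    """CRC32 of one row, with columns joined by a separator.

    Sketch only: a real tool has to treat NULLs and separators more
    carefully so that different rows cannot produce the same string.
    """
    joined = "#".join("NULL" if col is None else str(col) for col in row)
    return zlib.crc32(joined.encode("utf-8"))


def xor_checksum(rows: Iterable[Row]) -> Tuple[int, int]:
    """Return (row_count, xor_of_row_crcs) for a set of rows.

    XOR is order-independent, so the rows do not need to be sorted.
    The count is kept because two identical rows cancel out under XOR.
    """
    count, crc = 0, 0
    for row in rows:
        count += 1
        crc ^= row_crc(row)
    return count, crc


if __name__ == "__main__":
    master = [(1, "a"), (2, "b"), (3, None)]
    slave = [(3, None), (1, "a"), (2, "B")]  # one value differs
    print("in sync:", xor_checksum(master) == xor_checksum(slave))
```

Because XOR does not care about order, the same rows returned in a different order still compare equal, which is handy when master and slave read the table differently. The row count matters because a pair of duplicate rows contributes nothing to the XOR.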
Baron Schwartz’s work:
xaprb: Introducing MySQL Table Checksum
SourceForge: MySQL Toolkit
Innotop
MySQL manual:
Replication Startup Options
SQL Statements for Controlling Slave Servers
SET GLOBAL SQL_SLAVE_SKIP_COUNTER
SHOW SLAVE HOSTS Syntax
CHECKSUM TABLE Syntax