Last weekend we went live with a change where we split one of our central user databases off into a master-master replication pair, with HA and a virtual IP/hostname to point our apps to. It had been previously hosted on one of the shards, so it was time to give it a life of its own as well as some redundancy.
On Friday we did a test of the change in our staging environment.
After setting up the master-master pair, the first thing we verified was writes/table creates, etc. were flowing in both directions for the database we planned to replicate between the two. Lets just call it "db1".
To enable this I had used the option: replicate-do-db=db1 in the my.cnf on both DB machines.
After we verified this, I was told that we should probably also replicate "db2". I edited the config on both machines and changed the option to be replicate-db-db=db1,db2
We then proceeded to take a snapshot of our staging environment for these dbs so that we could test failover, app performance, etc.
We restored the snapshot, brought up the app and everything looked fine. We tested failovers. All looked good.
The change was pushed into production over the weekend and on Sunday night a network glitch caused the HA to failover to make the second DB in the pair active. Then late Monday evening someone noticed that changes didn't actually seem to be replicating between the two, although SHOW SLAVE STATUS reported IO and SQL threads running and everything was caught up.
Enter the culprit - my replicate-do-db setting
The correct way to tell MySQL to enable replication for select DBs is to issue the parameter multiple times:
replicate-do-db=db1
replicate-do-db=db2
The configuration as I had it was actually telling MySQL to filter and only replicate a database called "db1,db2" which of course did not exist.
Despite the fact we thought that we'd been good at testing, this managed to slip through.
I guess even someone who has been using MySQL for almost 10 years can make silly mistakes. Whoops!
Here's hoping that someone else out there can learn from my mistake!
Lachlan
Those options cause a lot of problems for many people. I find it is far simpler to leave them out and always replicate everything.
ReplyDeleteHindsight is always 20/20 ;-)
ReplyDeleteI concur. using any of those options at all causes lots of grief. goes for replicate-* and binlog-*. nasty side-effects all round.
ReplyDeleteI can't believe this isn't fixed yet!
ReplyDeletehttp://bugs.mysql.com/bug.php?id=34332
(replicate-rewrite-db doesn't work with single character database names)
Also note that replicate-do-db only works for commands whose default db is specified. So, if you do this:
ReplyDeleteuse mysql;
update db1.foo set x=y WHERE....
That won't be replicated. The safer way to replicate is to use
replicate-wild-do-table=db1.%
replicate-wild-do-table=db2.%
And that way everything you want gets replicated, even if you run the query from a non-replicated database.
@Sheeri - Yeah.. That was a "gotcha" I was well aware of, but definitely one to keep in mind when using those options.
ReplyDeleteAlso replicate-do-db will replicate statements that may apply changes to other databases provided that the specified db was the default. This can happen with cross-database queries.
The list of caveats with these replicate-* and binlog-* options is a long one! :)