Friday, July 12, 2013

MySQL Database Provisioning Automation @ Facebook

An article I wrote about the automation system I worked on at Facebook for MySQL database provisioning has been posted to the Facebook Engineering blog.

It covers, in fairly intimate detail, a system called "Windex" that we use to provision and re-provision our MySQL databases at Facebook. This system provisioned the new Facebook datacenter in Luleå, Sweden, with very little human effort, saving us loads of time.

So, if you're curious about what has been taking up all my time for more than a year, or if you're just curious about how Facebook does things, go check it out.

Tuesday, April 17, 2012

Support for XtraBackup 2.0 in XtraBackup Manager coming soon...

Hi Folks,

Just a quick note to let you know that I am planning to add support for the XtraBackup 2.0 series releases to XtraBackup Manager fairly soon.

Using the XtraBackup 2.0 series will mean that XtraBackup Manager will no longer need to stage the incremental backups to a location on the remote host before copying them back to the XtraBackup Manager storage.

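This can be a significant efficiency saving for systems with a lot of page changes between backups, since the deltas no longer have to be written to disk on the database host before being copied back.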

I will also be trying to address some of the feedback/requests that I have received in the Google Code Issues section.

Please check out the project on Google Code, if you have not already. Feedback and contributions are welcome!

http://code.google.com/p/xtrabackup-manager/

Cheers,
Lachlan

Monday, April 9, 2012

Talking At MySQL Conference

Hi Everyone!

Just a reminder to all of those who are attending the MySQL Conference in Santa Clara this week that I'll be presenting a session all about XtraBackup Manager.

My session is entitled "Introducing XtraBackup Manager" and takes place on 11 April, 15:30-16:20, in Ballroom D.

If you are interested in learning more about XtraBackup Manager, or would just like to come support me - I look forward to seeing you there!

Cheers,
Lachlan

Tuesday, February 7, 2012

XtraBackup Manager - Job Control, Better Debugging and some little fixes...

Hi Everyone,

Just a quick note to let you know that I've just finished up adding some new features to XtraBackup Manager.

You can now get better visibility into what is going on inside XtraBackup Manager with the "xbm status" command.

It shows which backup jobs are running, as well as any that are waiting to start because the maximum number of concurrent backup tasks are already running.

It looks/works as follows:

[xbm@localhost ~]$ xbm status

XtraBackup Manager v0.8 - Copyright 2011-2012 Marin Software

Currently Running Backups:

+--------+-----------+-------------+---------------------+-------------------+------+
| Job ID |   Host    | Backup Name |     Start Time      |      Status       | PID  |
+--------+-----------+-------------+---------------------+-------------------+------+
| 14     | localhost | test-backup | 2012-02-07 14:19:19 | Performing Backup | 2525 |
+--------+-----------+-------------+---------------------+-------------------+------+

Note: I have to thank a tiny BSD-licensed project I found on Google Code called PHP text table for saving me from reinventing the wheel to produce this mysql command-line client style table output.
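For anyone curious how that style of output is put together, here is a minimal Python sketch of a mysql-client-style table renderer. It is not the PHP text table library's API, just an illustration; `render_table` is a hypothetical name, and unlike the output above it left-aligns the header cells rather than centring them.

```python
def render_table(headers, rows):
    """Render rows as a mysql command-line client style ASCII table."""
    cells = [[str(value) for value in row] for row in rows]
    # Each column is as wide as its longest header or cell.
    widths = [len(h) for h in headers]
    for row in cells:
        for i, value in enumerate(row):
            widths[i] = max(widths[i], len(value))

    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(values):
        # One space of padding either side of each left-aligned cell.
        return "| " + " | ".join(v.ljust(w) for v, w in zip(values, widths)) + " |"

    return "\n".join([border, line(headers), border, *(line(r) for r in cells), border])


if __name__ == "__main__":
    print(render_table(
        ["Job ID", "Host", "Backup Name", "Start Time", "Status", "PID"],
        [[14, "localhost", "test-backup", "2012-02-07 14:19:19", "Performing Backup", 2525]],
    ))
```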


In addition to seeing which jobs are running or queued, if there is a backup job you would like to abort for some reason, you can now simply use the "xbm kill" command with a Job ID taken from the "xbm status" output:

[xbm@localhost ~]$ xbm kill 14

XtraBackup Manager v0.8 - Copyright 2011-2012 Marin Software

Action: Backup Job ID 14 was killed.

The backup job itself will log an event at the ERROR level, like:

2012-02-07 14:19:30 -0800 [ERROR] : [ The backup job was killed by an administrator. Aborting... ]
2012-02-07 14:19:30 -0800 [INFO] : [ Cleaning up files... ]
2012-02-07 14:19:30 -0800 [INFO] : [ Released lock on port 10000. ]
2012-02-07 14:19:31 -0800 [ERROR] : [ Exiting after the backup job was killed... ]

I'm still not 100% sure whether an aborted backup should be logged as an "Error" level event or an "Info" level event. My thinking is that I'd prefer to know if a job was aborted, so putting it at the ERROR level ensures it is always logged.
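The reasoning above relies on a simple severity threshold. Here is a minimal Python sketch of that idea, assuming a three-level ordering of DEBUG < INFO < ERROR (an assumption; the post only names these three levels) and reproducing the line layout shown in the log excerpt. The function names are hypothetical, not XtraBackup Manager's.

```python
from datetime import datetime, timedelta, timezone

# Assumed severity ordering: a higher number is more severe.
LEVELS = {"DEBUG": 10, "INFO": 20, "ERROR": 40}


def should_log(event_level, configured_level):
    """Write an event only if it is at least as severe as the configured level."""
    return LEVELS[event_level] >= LEVELS[configured_level]


def format_line(level, message, when):
    """Format a line like: 2012-02-07 14:19:30 -0800 [ERROR] : [ message ]"""
    return f"{when.strftime('%Y-%m-%d %H:%M:%S %z')} [{level}] : [ {message} ]"


if __name__ == "__main__":
    pst = timezone(timedelta(hours=-8))
    when = datetime(2012, 2, 7, 14, 19, 30, tzinfo=pst)
    for configured in ("DEBUG", "INFO", "ERROR"):
        # ERROR passes every threshold, so a killed job is always recorded.
        print(configured, should_log("ERROR", configured))
    print(format_line("ERROR", "The backup job was killed by an administrator. Aborting...", when))
```

Under this ordering an ERROR event is written at every configured level, which is the property wanted here; an INFO event, by contrast, would be dropped if the threshold were ever raised to ERROR.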

Speaking of log levels, it is now worth raising the logging level in config.php from INFO to DEBUG.

You will then see the exact commands XtraBackup Manager uses to run backups, which can be useful when troubleshooting XBM-related issues.

This enables logging like the example below. Note: the password is _actually_ masked when the command is written to the log. You're welcome ;-)

2012-02-07 14:19:19 -0800 [INFO] : [ Staging an INCREMENTAL xtrabackup snapshot of /var/lib/mysql via ssh: mysql@localhost to /tmp/xbm-3592510/deltas... ]
2012-02-07 14:19:19 -0800 [DEBUG] : [ Attempting to run the incremental backup with command:
ssh -o StrictHostKeyChecking=no -p 22 mysql@localhost 'cd /tmp/xbm-3592510 ; innobackupex --ibbackup=xtrabackup --slave-info --incremental-lsn=2325647 /tmp/xbm-3592510/deltas --user=root --safe-slave-backup  --password=XXXXXXX --no-timestamp --incremental --throttle=0 1>&2 ' 
 ]
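Here is a minimal Python sketch of masking a password before a command line is logged. The regular expression and the fixed `XXXXXXX` mask are assumptions modelled on the log line above, not XtraBackup Manager's actual (PHP) code; a fixed-length mask has the side benefit of not leaking the password's length.

```python
import re

# Matches "--password=" followed by everything up to whitespace or a single quote.
_PASSWORD_OPT = re.compile(r"(--password=)[^\s']+")


def mask_password(command, mask="XXXXXXX"):
    """Return command with the value of any --password= option replaced by mask."""
    # A function replacement stops backslashes in mask being read as escapes.
    return _PASSWORD_OPT.sub(lambda m: m.group(1) + mask, command)


if __name__ == "__main__":
    cmd = "ssh -p 22 mysql@localhost 'innobackupex --user=root --password=s3cr3t --no-timestamp'"
    print(mask_password(cmd))
```

Passwords that themselves contain spaces or quotes would need proper shell-aware parsing; this sketch only handles the simple case.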

Aside from the above, some other small fixes were made, including ensuring that all temporary files created on the database host being backed up are placed in the defined "staging_tmpdir", a parameter set at the host level in XtraBackup Manager.

For example:

shell> xbm host edit hostname staging_tmpdir /path/to/use
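The idea behind staging_tmpdir can be sketched locally in Python: every scratch directory for a run is created under the configured base directory rather than wherever the system default points. This is an illustration only; XtraBackup Manager does this on the remote database host over ssh, and `make_staging_dir` plus the `xbm-` prefix (borrowed from the /tmp/xbm-3592510 path in the log above) are assumptions.

```python
import os
import tempfile


def make_staging_dir(staging_tmpdir):
    """Create a private, uniquely named scratch directory inside staging_tmpdir."""
    if not os.path.isdir(staging_tmpdir):
        raise FileNotFoundError(f"staging_tmpdir does not exist: {staging_tmpdir}")
    # mkdtemp creates the directory atomically, accessible only by the current user.
    return tempfile.mkdtemp(prefix="xbm-", dir=staging_tmpdir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(make_staging_dir(base))
```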

I encourage you to check out the XtraBackup Manager Project and open issues with any problems you encounter or other feedback.

Cheers,
Lachlan

Tuesday, January 24, 2012

I'm speaking at the MySQL Conference And Expo 2012!

Percona Live MySQL User's Conference, San Francisco, April 10-12th, 2012

Hello Everyone,

I'm very pleased to announce that my submission to talk at the MySQL Conference And Expo 2012 has been accepted! I'll be giving a talk entitled "Introducing XtraBackup Manager", which, as the title suggests, will serve as an introduction to XtraBackup Manager.

I'll be covering what it is, how it works and its features as well as reserving some time for Q+A.

If you are interested in learning more about this tool and plan to attend the conference, this will be a great way to get started!

I hope to see some of you there in April!

For more info on the conference, click here.

Cheers,
Lachlan

Thursday, January 5, 2012

XtraBackup Manager Pre-Release v0.8 - Try it out today!

Aloha Everybody!

I'm happy to announce XtraBackup Manager Pre-Release v0.8!

Now that XtraBackup 1.6.4 is released and I have completed some of my final show-stopper bug fixes, I feel that XtraBackup Manager is ready for more general use.

I have yet to package up tarballs, but the Quick Start Guide in the Project Wiki contains all the steps you should need to get up and running from the svn trunk.

There is also some great detailed documentation, including diagrams of all of the different Backup Strategies here.

So please, check out the Project and take it for a spin -- if you have problems or questions, join the discussion on the XtraBackup Manager Google Group!

Thanks and Happy 2012!!

Note: Release notes for XtraBackup Manager v0.8 can be found here.

Lachlan

Friday, December 2, 2011

XtraBackup Manager - XtraBackup Throttling

Hello again!

This week I have been spending some time adding support for throttling to XtraBackup Manager, as it is considered a prerequisite for using the tool against our production databases.

To add throttling support, the first thing I did was look into which throttling methods are available.

It seems there are two methods, both of which are mentioned in Percona's docs or blogs.

#1. Use the --throttle=N parameter. You can give this to innobackupex or to xtrabackup directly. According to the documentation, this limits xtrabackup to N IOPS when running in --backup mode.

For local machine backups this means N total read/write IOPS, and for incrementals it simply means N read IOPS. When using streaming mode, --throttle does not take effect (see #2).

#2. Use a nifty tool called "pv" (Pipe Viewer). It has a few features, but most notably it can be used as a simple rate limiter in your pipeline. An example:

shell> cat myFile | pv -q -L10m > myFileCopy

The above limits the rate at which myFile is copied into myFileCopy to 10 megabytes per second, assuming of course that the IO subsystem can reach at least that speed.
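To make the idea of a rate limiter concrete, here is a minimal Python sketch of one simple way to cap average throughput: after each chunk, sleep until the bytes sent no longer exceed the elapsed time multiplied by the rate. This is not how pv is implemented, just an illustration; `rate_limited_copy` is a hypothetical helper, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import io
import time


def rate_limited_copy(src, dst, rate_bytes_per_sec, chunk_size=64 * 1024,
                      clock=time.monotonic, sleep=time.sleep):
    """Copy src to dst, keeping the average rate at or below rate_bytes_per_sec."""
    start = clock()
    sent = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return sent
        dst.write(chunk)
        sent += len(chunk)
        # How far ahead of schedule we are: the time these bytes *should* have
        # taken at the target rate, minus the time that has actually elapsed.
        ahead = sent / rate_bytes_per_sec - (clock() - start)
        if ahead > 0:
            sleep(ahead)


if __name__ == "__main__":
    # Roughly the role of "pv -q -L10m" (pv's m suffix is 1024-based): 10 MiB/sec.
    data = io.BytesIO(b"\0" * (1024 * 1024))
    out = io.BytesIO()
    print(rate_limited_copy(data, out, 10 * 1024 * 1024))
```

Because the limiter tracks the cumulative schedule rather than each chunk in isolation, short stalls (slow disk, busy network) can be made up afterwards without the long-run average exceeding the limit.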

The best application for pv is to place it somewhere in the pipeline of your streaming backups to limit the rate at which things can flow through the pipeline.

E.g.:

shell> innobackupex --stream | pv -q -L10m | nc targetHost 10000

The above will stream through pv and limit the maximum throughput to 10 megabytes/second.

So, now that I understood which rate-limiting methods were available, I needed to consider how XtraBackup Manager uses XtraBackup and the best way to implement the throttling.

I know that:

a) XtraBackup Manager always uses streaming mode when it takes a full backup, so the only option to use there is #2, pv.

b) When performing an incremental backup, XtraBackup Manager will always have xtrabackup stage the deltas locally, before using netcat (nc) to shuttle the data back over the network to the backup host for storage. In this case, rate limiting with pv is not really useful, because xtrabackup is going to chew up as much IO as it can while calculating the deltas, so we need to opt for the --throttle option on xtrabackup.
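Points a) and b) boil down to choosing where the throttle is applied based on the backup type. Here is a minimal Python sketch using the simplified command shapes from the examples above (real invocations need more arguments, such as a target directory and --incremental-lsn); `build_backup_command` is a hypothetical helper, not XtraBackup Manager's code.

```python
def build_backup_command(backup_type, throttle, target_host, port):
    """Return a simplified backup command with the throttle applied in the right place.

    throttle is passed through unchanged here: pv reads it as MB/sec,
    while xtrabackup's --throttle reads it as IOPS.
    """
    if backup_type == "full":
        # Full backups stream, and --throttle has no effect when streaming,
        # so a pv stage limits the pipe instead.
        return f"innobackupex --stream | pv -q -L{throttle}m | nc {target_host} {port}"
    if backup_type == "incremental":
        # Deltas are staged locally first, so xtrabackup itself must be throttled
        # while it scans; the later nc copy of the staged deltas is omitted here.
        return f"innobackupex --incremental --throttle={throttle}"
    raise ValueError(f"unknown backup type: {backup_type!r}")


if __name__ == "__main__":
    print(build_backup_command("full", 10, "targetHost", 10000))
    print(build_backup_command("incremental", 10, "targetHost", 10000))
```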

So once I understood that I would actually need to implement throttling in two ways in XtraBackup Manager, I thought about how to present it to the user for configuration.

I personally find it a bit annoying and confusing that I have to think in two units of measurement for different situations, so I wanted to see if I could insulate the user from that.

My aim was to see if I could present the user with a single configurable for the throttle on a backup task. After all, you don't care what type of backup is going on, you just want to say "Don't use more IO than this much…".

So in order to achieve this, I needed to understand the relationship between the two options as well as the characteristics of IO in both cases.

From my understanding, if you are taking a full backup, you are simply streaming each file sequentially - so we are talking about sequential reads here.

If we are talking about incrementals, we basically give xtrabackup a log sequence number and say "check all the pages and copy ones with a log sequence number above the one we gave" -- so we're finding the pages that have been changed since the given log sequence number.

In this case, it should also be a sequential read, as we're scanning pages end to end, and just checking the log sequence number.
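Here is a toy Python model of that incremental scan: one pass over the pages in order, keeping only those whose LSN is above the one we supplied. The `Page` type and in-memory page list are simplifications for illustration; real InnoDB pages are fixed-size blocks read from the data files, and this is not xtrabackup's code.

```python
from typing import Iterable, List, NamedTuple


class Page(NamedTuple):
    page_no: int
    lsn: int  # log sequence number of the page's most recent change


def changed_pages(pages: Iterable[Page], since_lsn: int) -> List[Page]:
    """Scan pages front to back and return those changed after since_lsn."""
    # Every page is visited once, in file order, so the IO pattern is a
    # sequential read regardless of how many pages actually changed.
    return [page for page in pages if page.lsn > since_lsn]


if __name__ == "__main__":
    data_file = [Page(0, 2325000), Page(1, 2325700), Page(2, 2325647), Page(3, 2400000)]
    print([p.page_no for p in changed_pages(data_file, since_lsn=2325647)])
```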

So in both cases it seems we're talking about sequential reads.

When using pv, we're already dealing in a unit that is easy to understand and fairly objective. A rate limit in megabytes/sec of sequential read is straightforward.

Now when we're dealing with the --throttle option and thinking in IOPS, there is more to consider. Firstly, how big is an IOP?

Since I'm no good at reading C source code, I opted for the black box method of investigation and simply took an idle database server and started running xtrabackup against it with various --throttle values, while watching iostat on the data mount.

Here are some results:

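Throttle value vs. observed disk throughput:

+----------------+------------------------------+
| Throttle value | Observed throughput (MB/sec) |
+----------------+------------------------------+
| 1              | 3                            |
| 2              | 4                            |
| 3              | 5                            |
| 4              | 6                            |
| 5              | 7                            |
+----------------+------------------------------+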

Interestingly, the pattern I observe is: throughput (MB/sec) = N + 2, where N is the --throttle value.
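As a quick check of that pattern, here is a Python least-squares fit of the five measurements in the table. It only restates what the table shows (a slope of 1 MB/sec per throttle unit and an intercept of 2 MB/sec) on an idle server; it says nothing about how the intercept behaves under load.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Measurements from the table above: --throttle value vs observed MB/sec.
THROTTLE = [1, 2, 3, 4, 5]
OBSERVED_MB_PER_SEC = [3, 4, 5, 6, 7]

slope, intercept = fit_line(THROTTLE, OBSERVED_MB_PER_SEC)
# slope: MB/sec added per throttle unit; intercept: MB/sec used regardless of N.
print(f"throughput = {slope:g} * N + {intercept:g}")
```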

My best interpretation, after attempting a little digging into xtrabackup.c, is that on this idle system we are limiting xtrabackup to N x 1MB IOPs per second to scan the InnoDB data files, plus we burn about 2MB per second scanning/copying the InnoDB log so that it can be applied later.

Now the catch-22 in this whole thing is that I'm observing this on an idle system, so this 2MB per second of log IO would increase if there were more log activity; surely on a busy system you would need to read more than 2MB of logs every second to keep up.

The catch part? If I actually make the system busy, I can no longer determine where all the different IO in iostat is coming from, so I can't determine how much IO xtrabackup is now using. I'm sure there is a better way to instrument that per process, but unfortunately it extends beyond my personal skill set right now.
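One possible approach, offered as a hedged sketch rather than something tested against xtrabackup: on Linux kernels with per-task IO accounting enabled, /proc/<pid>/io exposes cumulative counters for a single process, including read_bytes (bytes the process caused to be fetched from the storage layer). Sampling it twice gives that one process's read rate even on a busy server. The helper names are hypothetical; note that innobackupex runs the xtrabackup binary as a child process, so the xtrabackup PID is the one to watch, and reading another user's counters may require root.

```python
import sys
import time


def parse_proc_io(text):
    """Parse the "name: value" lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            counters[name.strip()] = int(value)
    return counters


def read_proc_io(pid):
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


def read_mb_per_sec(before, after, seconds):
    """Storage-layer read throughput between two samples, in MB/sec."""
    return (after["read_bytes"] - before["read_bytes"]) / seconds / (1024 * 1024)


def sample_read_rate(pid, interval=5.0):
    start = time.monotonic()
    before = read_proc_io(pid)
    time.sleep(interval)
    after = read_proc_io(pid)
    return read_mb_per_sec(before, after, time.monotonic() - start)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(f"{sample_read_rate(int(sys.argv[1])):.1f} MB/sec")
```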

In blogging this, I'm hoping someone reading this can help with ideas or clarification...

So coming back to how I should implement the throttling: I'm fairly sure that each IOP is 1MB in xtrabackup, and pv also allows me to throttle in MB/sec, so I should be able to give the XtraBackup Manager user one simple "throttle" configurable and tell them it limits in MB/sec.

The question then becomes, should I adjust the value I pass to --throttle for XtraBackup to account for this "at least 2MB used for log scanning"?

I decided to try to be clever and go ahead and adjust it, so the value passed to XtraBackup for --throttle is now reduced by 2. If the adjustment gives a throttle value less than 1, 1 is used instead.
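The adjustment rule just described fits in a couple of lines. Here is a minimal Python sketch; the function and constant names are hypothetical, and the 2MB/sec figure is the idle-server estimate from the measurements above, not a constant of xtrabackup.

```python
# Estimated MB/sec xtrabackup spends on log scanning, measured on an idle server.
LOG_SCAN_OVERHEAD_MB = 2


def xtrabackup_throttle_value(throttle_mb_per_sec):
    """Translate the user's MB/sec throttle into a value for xtrabackup --throttle.

    Subtract the estimated log-scanning overhead, but never go below 1.
    """
    return max(1, throttle_mb_per_sec - LOG_SCAN_OVERHEAD_MB)


if __name__ == "__main__":
    for limit in (1, 2, 3, 10):
        print(limit, "->", xtrabackup_throttle_value(limit))
```

One consequence worth noting: any configured limit of 3 MB/sec or less maps to --throttle=1, which the table above suggests still produces about 3 MB/sec on an idle system, so very small limits cannot be honoured exactly.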

None of this is set in stone -- I'm still testing and experimenting, but I'm curious to know your thoughts.

Can anyone shed light on what xtrabackup is doing?

Should I bother adjusting this value or not?

Cheers,
Lachlan