This week I have been adding throttling support to XtraBackup Manager. Throttling is considered a prerequisite before we can use the tool against our production databases.
To add throttling, I first looked into which throttling mechanisms are available.
There seem to be two, and both are mentioned in Percona's documentation or blogs.
#1. Use the --throttle=N parameter, which you can pass to innobackupex or directly to xtrabackup. According to the documentation, this limits xtrabackup to N I/O operations per second (IOPS) when running in --backup mode.
For local backups this means N total read/write IOPS, and for incrementals it means N read IOPS. In streaming mode --throttle has no effect (see #2).
#2. Use a nifty tool called "pv" (Pipe Viewer). It has several features, but most notably it can be used as a simple rate limiter in a pipeline. For example:
shell> cat myFile | pv -q -L10m > myFileCopy
The above limits the rate at which myFile is copied into myFileCopy to 10 megabytes per second, assuming of course that the IO subsystem can reach at least that speed.
The best use of pv is to place it somewhere in the pipeline of a streaming backup, to limit the rate at which data can flow through it.
shell> innobackupex --stream
The above will stream through pv and limit the maximum throughput to 10 megabytes/second.
Now that I understood which rate-limiting methods are available, I needed to consider how XtraBackup Manager uses XtraBackup and the best way to implement throttling in each case.
I know that:
a) XtraBackup Manager always uses streaming mode for full backups, so the only option there is #2, pv.
b) For incremental backups, XtraBackup Manager always has xtrabackup stage the deltas locally before using netcat (nc) to shuttle the data over the network to the backup host for storage. Here pv is not really useful, because xtrabackup will consume as much IO as it can while calculating the deltas, so we need the --throttle option on xtrabackup instead.
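The decision in a) and b) can be restated as a small shell sketch. The function name throttle_mechanism is hypothetical and is not taken from XtraBackup Manager; it only records which of the two mechanisms applies to each backup type.

```shell
# Hypothetical helper: choose the throttling mechanism for a backup type.
throttle_mechanism() {
    case "$1" in
        full)
            # Full backups are streamed, where --throttle has no effect,
            # so the stream itself has to be rate-limited with pv.
            echo "pv" ;;
        incremental)
            # Deltas are computed locally by xtrabackup, so its own IO
            # must be limited with --throttle.
            echo "--throttle" ;;
        *)
            echo "unknown backup type: $1" >&2
            return 1 ;;
    esac
}
```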
Once I understood that I would need to implement throttling in two different ways in XtraBackup Manager, I thought about how to present it to the user for configuration.
I personally find it a bit annoying and confusing to have to think in two units of measurement for different situations, so I wanted to see if I could insulate the user from that.
My aim was to give the user a single throttle setting on a backup task. After all, you don't care what type of backup is running; you just want to say "Don't use more IO than this much…".
To achieve this, I needed to understand the relationship between the two options, as well as the characteristics of the IO in each case.
As I understand it, a full backup simply streams each file sequentially, so we are talking about sequential reads.
For incrementals, we basically give xtrabackup a log sequence number (LSN) and say "check all the pages and copy the ones with an LSN above the one we gave", which finds the pages that have changed since that LSN.
This should also be a sequential read, since we're scanning the pages end to end and just checking each page's LSN.
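As a toy illustration of that scan, the sketch below is not xtrabackup's actual code. It assumes a hypothetical text input with one "page_number page_lsn" pair per line. Every line is read in order, which is what makes the IO sequential, but only pages whose LSN is strictly greater than the given one are selected.

```shell
# Toy model of the incremental scan (hypothetical input format, not
# xtrabackup's real page layout): read every "page_no page_lsn" line in
# order and print the page numbers changed after the given LSN.
pages_changed_since() {
    # "+ 0" forces a numeric comparison rather than a string comparison.
    awk -v since="$1" '$2 + 0 > since + 0 { print $1 }'
}
```

For example, `printf '0 90\n1 150\n2 100\n3 210\n' | pages_changed_since 100` selects pages 1 and 3. Page 2 is skipped because its LSN is equal to, not above, the given one.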
So in both cases it seems we're talking about sequential reads.
With pv, we're already dealing in a unit that is easy to understand and fairly objective. A rate limit in megabytes/sec of sequential reads is straightforward.
With the --throttle option, thinking in IOPS, there is more to consider. First of all, how big is an IOP?
Since I'm no good at reading C source code, I opted for the black-box method of investigation: I took an idle database server and ran xtrabackup against it with various --throttle values, while watching iostat on the data mount.
Here are some results:
Throttle value vs Observed disk throughput MB/sec
Interestingly, the pattern I observe is: throughput (MB/sec) = N + 2, where N is the --throttle value.
My best interpretation, after a little digging into xtrabackup.c, is that on this idle system each unit of --throttle allows one 1MB IOP per second to scan the InnoDB data files. On top of that, about 2MB per second is spent scanning and copying the InnoDB log so that it can be applied later.
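Written out, this interpretation amounts to the model below. It is an empirical guess from an idle server, not a documented formula. The 1MB operation size and the 2MB/sec of log IO are the assumptions described above, and expected_throughput_mb is a hypothetical helper.

```shell
# Empirical model from the idle-server test (an assumption, not documented
# xtrabackup behaviour).
DATA_IO_MB_PER_OP=1   # assumed size of one throttled data-file read
LOG_IO_MB_PER_SEC=2   # observed log scanning on an IDLE server only

# Predict the disk throughput (MB/sec) seen in iostat for --throttle=N.
expected_throughput_mb() {
    echo $(( $1 * DATA_IO_MB_PER_OP + LOG_IO_MB_PER_SEC ))
}
```

For --throttle=10 it predicts 12 MB/sec, matching the N + 2 pattern. On a busy server the log term would be larger, so the model would underestimate.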
The catch-22 in all this is that I'm observing it on an idle system, so the 2MB per second of log IO would increase with more log activity. Surely on a busy system you would need to read more than 2MB of log every second to keep up.
The catch? If I make the system busy, I can no longer tell where all the IO in iostat is coming from, so I can't determine how much of it xtrabackup is using. I'm sure there is a better way to instrument IO per process, but unfortunately that is beyond my skill set right now.
In writing this post, I'm hoping someone reading can help with ideas or clarification.
So, back to implementing the throttling: I'm fairly sure that IOPs are 1MB in xtrabackup, and pv also lets me throttle in MB/sec. That means I should be able to give the XtraBackup Manager user a single "throttle" setting and tell them it limits in MB/sec.
The question then becomes: should I adjust the value I pass to xtrabackup's --throttle to account for the "at least 2MB/sec used for log scanning"?
I decided to try to be clever and adjust it, so the value passed to xtrabackup as --throttle is now the user's setting minus 2. If the result is less than 1, 1 is used instead.
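The following is a minimal sketch of that adjustment. It assumes the user's setting is an integer in MB/sec and uses the 2MB/sec log allowance observed on the idle server. The name adjusted_throttle is hypothetical, not the actual XtraBackup Manager code.

```shell
# Allowance for log IO observed on an idle server (an assumption).
LOG_IO_ALLOWANCE_MB=2

# Convert the user's MB/sec limit into the value passed to --throttle:
# subtract the log allowance, but never go below 1.
adjusted_throttle() {
    _throttle=$(( $1 - LOG_IO_ALLOWANCE_MB ))
    if [ "$_throttle" -lt 1 ]; then
        _throttle=1
    fi
    echo "$_throttle"
}
```

A setting of 10 MB/sec becomes --throttle=8, and any setting of 3 MB/sec or less becomes --throttle=1. This means very low limits can be exceeded once the log IO is added back on.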
None of this is set in stone. I'm still testing and experimenting, but I'm curious to hear your thoughts.
Can anyone shed light on what xtrabackup is doing?
Should I bother adjusting this value or not?