The question:
I’m tasked with starting a brand new replica of a client’s MySQL 5.7.35 database server. While I’ve observed others doing this in the past in various circumstances, I haven’t been in charge of this operation before, so I’d just like to confirm that my plan is correct and that I’m not missing anything crucial.
The data lives in an XFS-formatted LVM logical volume, inside a volume group that has enough free space to enable taking an LVM snapshot.
The database is a legacy mixture of MyISAM and InnoDB tables, over 200 GB in size at the moment. The replica server is set up with an identical MySQL version, and all of the prerequisites for row-based binlog replication (replica user, server ID, `log_bin`, etc.) are in order.
My process is this:
1. On the master, run `FLUSH TABLES WITH READ LOCK`. Leave the client session open. According to Percona, it’s important to try to prevent long-running `SELECT` queries from being in the middle of execution when this is run, so we’ll make an effort to ensure this.
2. In another client session (don’t know why, but the docs say so), run `SHOW MASTER STATUS` to get the binlog file name and position.
3. Create the LVM snapshot from the MySQL data volume. (It does not contain binlogs or relay logs.)
4. In the original MySQL client session, run `UNLOCK TABLES` to restore normal operation.
5. Mount the snapshot (`-t xfs -o nouuid`) and `rsync` the data directory from it onto the (shut down) replica server, excluding `auto.cnf` and the autogenerated `.pem` files (we’re not using SSL to connect to MySQL, so they shouldn’t particularly matter).
6. Unmount and `lvremove` the snapshot. (A concrete sketch of steps 3–6 follows the list.)
7. Start up MySQL on the replica and do the normal binlog replication initialization steps as outlined in the MySQL documentation, providing the `MASTER_LOG_FILE` and `MASTER_LOG_POS` options to `CHANGE MASTER` with the values from step 2, and then just wait for the replica to catch up.
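Concretely, steps 3–6 on our setup would look roughly like the following; the volume group, snapshot size, mount point, and paths here are placeholders, not our real names:

# 3. Take the snapshot (size only needs to absorb writes made while it exists)
lvcreate --snapshot --size 20G --name mysql-snap /dev/vg0/mysql-data

# 5. Mount it read-only and copy the data directory to the stopped replica
mount -t xfs -o nouuid,ro /dev/vg0/mysql-snap /mnt/mysql-snap
rsync -av --exclude=auto.cnf --exclude='*.pem' \
    /mnt/mysql-snap/ replica:/var/lib/mysql/

# 6. Clean up
umount /mnt/mysql-snap
lvremove /dev/vg0/mysql-snap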
I’ve tested this process to work on a trafficless staging server, but my main concern is `FLUSH TABLES WITH READ LOCK` on the actual production instance. Is it sufficient to prevent long-running `SELECT`s, or is there some other danger I need to be aware of? Any clients attempting writes will naturally fail while the lock is in place, but the goal is to get steps 1–4 finished in under 30 seconds, which is an acceptable partial outage for us.
Also, since we’re running mixed MyISAM and InnoDB, is `FLUSH TABLES WITH READ LOCK` all we need to do, or does one of the engines require something more to ensure a consistent LVM snapshot?
The Solutions:
Method 1
FLUSH TABLES WITH READ LOCK is sufficient. It locks the tables, preventing new writes, and it also ensures that pending writes to MyISAM tables are flushed to the disk volume. For InnoDB tables, some changes may still be in the buffer pool, but those changes are also accounted for by the InnoDB redo log, so when you start up the replica they will be restored automatically (this is the same mechanism as InnoDB crash recovery).
Using SHOW MASTER STATUS during the read lock is necessary because you need those coordinates for your CHANGE MASTER command on the replica. It’s basically analogous to a bookmark, so the replica knows where to start reading in the binary log.
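On the replica, those coordinates then plug into CHANGE MASTER like so (the host, user, password, and file/position values below are placeholders; use whatever SHOW MASTER STATUS reported):

mysql> CHANGE MASTER TO
    ->   MASTER_HOST='master.example.com',
    ->   MASTER_USER='repl',
    ->   MASTER_PASSWORD='********',
    ->   MASTER_LOG_FILE='mysql-bin.000123',
    ->   MASTER_LOG_POS=45678;
mysql> START SLAVE;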
It is still important to heed the advice about long-running queries blocking FLUSH TABLES WITH READ LOCK. Any query, even a read-only SELECT, holds a metadata lock on the tables it touches, and FTWRL requires that, at least briefly, no metadata locks be held on any table. So a long-running SELECT will block FTWRL. You can do an experiment to demonstrate this on a test instance of MySQL:
In one window, create a MyISAM table:
mysql> create table test.m (i int) engine=myisam;
mysql> insert into test.m values (42);
Query it in a way that will run for a while:
mysql> select sleep(120) from test.m;
(hangs)
In a second window, try FTWRL:
mysql> flush tables with read lock;
(hangs)
The FTWRL is waiting for that query to finish so it can get its own turn at the metadata lock. If that query takes a long time to finish, FTWRL will keep waiting, and in the meantime any other new queries will queue up as well, waiting behind the pending FTWRL.
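From a third session you can watch the pile-up while the experiment runs; the thread states below are what MySQL typically reports in this situation:

mysql> SHOW PROCESSLIST;
(the blocked FLUSH thread shows State: Waiting for table flush, and
statements queued behind it show Waiting for table flush or
Waiting for global read lock)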
Percona added a feature to their version of MySQL to help in these situations. It’s a lock used during backups which does not suffer this blocking behavior. Read https://www.percona.com/blog/2014/03/11/introducing-backup-locks-percona-server-2/ for details if you’re interested, but if you use mainstream MySQL, you don’t have this feature.
The LVM snapshot method is a pretty good solution given that you have a mix of InnoDB and MyISAM tables, because the lock can be held quite briefly, just long enough to create the LVM snapshot and read the MASTER STATUS.
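One safeguard worth considering on mainstream MySQL: cap how long FTWRL itself will wait, so a forgotten long-running query turns into a failed FLUSH rather than a stalled server. The 10 seconds here is an arbitrary example:

mysql> SET SESSION lock_wait_timeout = 10;
mysql> FLUSH TABLES WITH READ LOCK;
(fails with a lock wait timeout error if the lock cannot be acquired
within 10 seconds, instead of queuing indefinitely)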
I’m accustomed to using Percona XtraBackup, which does not require locking as long as the database contains only InnoDB tables (apart from the mysql system tables, which are very small, so backing them up during a brief read lock is not a problem). My preference is to insist that all tables be stored in InnoDB; MyISAM has poor performance and is susceptible to data corruption.
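If you do want to migrate toward all-InnoDB, the remaining MyISAM tables are easy to enumerate and convert one at a time (mydb.some_table below is a hypothetical example; test each conversion on a copy first):

mysql> SELECT table_schema, table_name
    ->   FROM information_schema.tables
    ->  WHERE engine = 'MyISAM'
    ->    AND table_schema NOT IN
    ->        ('mysql', 'information_schema', 'performance_schema', 'sys');
mysql> ALTER TABLE mydb.some_table ENGINE=InnoDB;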
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, or CC BY-SA 4.0.