Co-author: Robert Boros
The idea behind the zero-downtime migration is to setup a replication between the 5.7 and 8.0. Initially load the necessary data with xtrabackup tool and start the slave with the correct binlog information.
This post will use the binlog-based replication. You can find more details about how this works under the link. Briefly the source sends all of the binlog event the replication node which subscribed for that.
- Prepare the clusters (remove the SSL encryption from the 8.0).
- Create xtrabackup on the source cluster (this will be the 5.7), and zip it.
- Move the zipped backup to the replication cluster’s node (this will be the 8.0 node 1) and apply the changes.
- Setup the replication on the replication cluster’s node.
- Once all of the data is in place or replicated properly change to the 8.0.
1 — Prepare, create the replication user
The first step is to create a suitable user for replication. I didn’t deal with the security of that so the host specification of the user is %. It should also have REPLICATION SLAVE permissions.
CREATE USER 'repl'@'%' IDENTIFIED BY 's3cure_password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
Note about the SSL on the slave cluster
Our first thought was that the SSL encryption which is ON by default on the MySQL 8.0 will cause some errors. So we disabled it. All of the changes were made without the SSL encryption, but I didn’t want to say that it’s a must. So I added SSL encryption, then I restarted the whole slave cluster and checked the slave status. It didn’t show me any error, so I created some data, and the replication worked just fine.
The server_id also has an important part, because in the replication topology all of the participants should have a unique server_id. Note, in this case the cluster nodes are not participants, only the master and slave nodes are.
The server_id should be 1 or higher, otherwise the source will refuse all of the connections from the replication nodes.
2 — xtrabackup on the master
Since the replication is not copying all of the data we should create an xtrabackup on the master database.
xtrabackup --backup --user=root --password=toor --target-dir=~/new
This will output lots of status reports that the program doing. I highlighted the most important parts.
...201118 15:40:37 >> log scanned up to (3537995502)
xtrabackup: Generating a list of tablespaces InnoDB: Allocated tablespace ID 2 for mysql/plugin, old maximum was 0 201118 15:40:38  Copying ./ibdata1 to /root/new/ibdata1 201118 15:40:38 >> log scanned up to (3537995502) 201118 15:40:38  ...done 201118 15:40:38  Copying ./mysql/plugin.ibd to /root/new/mysql/plugin.ibd201118 15:40:38  ...done 201118 15:40:38  Copying ./mysql/servers.ibd to /root/new/mysql/servers.ibd...201118 15:41:19 Finished backing up non-InnoDB tables and files
201118 15:41:19 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
xtrabackup: The latest check point (for incremental): '3537996875'
xtrabackup: Stopping log copying thread.
.201118 15:41:19 >> log scanned up to (3537996884)201118 15:41:20 Executing UNLOCK TABLES
201118 15:41:20 All tables unlocked
201118 15:41:20  Copying ib_buffer_pool to /root/new/ib_buffer_pool
201118 15:41:20  ...done
201118 15:41:20 Backup created in directory '/root/new/'
201118 15:41:20  Writing /root/new/backup-my.cnf
201118 15:41:20  ...done
201118 15:41:20  Writing /root/new/xtrabackup_info
201118 15:41:20  ...done
xtrabackup: Transaction log of lsn (3537995493) to (3537996884) was copied.
201118 15:41:20 completed OK!
It’s creating copies about the tablespaces and having a kind of file system copy of the databases.
The “UNLOCK TABLES” can be misleading, because it means that the tables were locked. But I tried it and I could wrote new rows into the table. There is a table with more than 3 million rows. So it takes some time while the xtrabackup creates the copy of it and in the middle of the copy I could insert without any problems.
The next step is to prepare the backup. This will make the backup consistent in time. So there won’t be any corruption, or crash because of data inconsistency. But this is important for a totally different reason. When we set up the replication, we have to provide binary log information. The preparation will obtain the backup binlog information, so we can set the slave up properly.
xtrabackup --prepare --user=root --password=toor --target-dir=~/newcat xtrabackup_binlog_info
// out: percona-57-node1-bin.000009 3888
The last part of this section is creating a zip of the xtrabackup’s folder.
zip -r data.zip ~/new
Once I have this, I can move it to the replication node (8.0 node1).
3 — Apply the backup
Note: If you have a Percona 8.0 installation on that node it is possible, that the version of the xtrabackup will be higher then the 5.7 installation. So you have to install the xtrabackup 2.4 instead of the newer one. It’s important, becase the backup created with the 2.4 version.
sudo yum install percona-xtrabackup-24
I usually use scp to move files, since the nodes are all in a private network I can do that easily. Unzip it and use it.
Before doing anything I move all of the files under the /var/lib/mysql directory to a /var/lib/mysql-save backup. mv /var/lib/mysql /var/lib/mysql-save and then I can apply the changes.
systemctl stop mysql
xtrabackup --move-back --target-dir=/home/user/location/of/unzipped
chown mysql:mysql -R /var/lib/mysql
Before we can restart the mysql we have to move back the socket file under the /var/lib/mysql because otherwise we can’t connect to the mysql.
mv /var/lib/mysql-save/mysql.sock /var/lib/mysql/
systemctl start mysql
Troubleshoot: In the best case scenario it will start without any error. If any error occurs or the other server stops unexpectedly we have to bootstrap it again. But for this we have to do some changes. If there is a file called /var/lib/mysql/grastate.dat should be modified. safe_to_bootstrap should be 1 instead of 0. After that we can bootstrap the cluster again.
4 — Setup replication
The MySQL started so we can setup the replication.
CHANGE MASTER TO
We know everything expect the MASTER_LOG_FILE and MASTER_LOG_POS. Fortunately we prepared the backup, so we can find this information in the /var/lib/mysql/xtrabackup_binlog_info file. So if we cat it out, we can add them and the slave will start properly.
Once the slave has started properly it will ask the remaining data from the master what should be replicated since the xtrabackup created and copied it. Since it’s comparing it with the replication’s relay log, there won’t be row duplications.
5 — Move to 8.0
This assumes that all of the data was properly replicated. It’s really easy to validate: you have to SELECT COUNT(*) on a table which is getting data frequently, and then do the same on the replication side. The numbers should match (or get something similar) and the replication should increase properly in sync with the master.
Once it’s done, we only have to point the backend logic to the new cluster (8.0) and remove the replication logic.
RESET SLAVE ALL;
This will disable the replication information what we set in the CHANGE MASTER TO… section, so the new cluster is the master cluster now.