
Data Migration #

As a business continues to grow and its data volume and concurrency reach a certain scale, a traditional single database may run into problems with performance, scalability, and availability.

Although NoSQL solutions can address these problems through data sharding and horizontal scale-out, NoSQL databases generally do not support transactions or SQL.

SphereEx-DBPlusEngine supports data sharding for traditional standalone databases while also supporting distributed transactions and SQL, addressing these problems in a distributed cluster.

Overview #

SphereEx-DBPlusEngine’s data migration plugin implements the split migration of homogeneous databases from monolithic to distributed systems. This migration can be done online and supports data comparison between the source and target sides. SphereEx-DBPlusEngine’s database migration reduces the impact on business operations during data migration, providing a one-stop, generic data migration solution.

Currently, the SphereEx-DBPlusEngine data migration plugin supports the MySQL, Aurora MySQL, PostgreSQL, openGauss, and MariaDB database products, with the versions, environment requirements, and permission requirements listed below.

  1. MySQL

     Supported version: 5.1.15 ~ 8.x

     Environment requirement (my.cnf configuration):

     log-bin=mysql-bin
     binlog-format=row
     binlog-row-image=full

     -- Confirm with the following commands:
     show variables like '%log_bin%';
     show variables like '%binlog%';

     Permission requirement:

     GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO ${username}@${host}

     -- Confirm with the following command:
     SHOW GRANTS FOR 'user';

  2. Aurora MySQL

     Supported version: 3.02.2

     Environment requirement (my.cnf configuration):

     log-bin=mysql-bin
     binlog-format=row
     binlog-row-image=full

     -- Confirm with the following commands:
     show variables like '%log_bin%';
     show variables like '%binlog%';

     Permission requirement:

     GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO ${username}@${host}

     -- Confirm with the following command:
     SHOW GRANTS FOR 'user';

  3. PostgreSQL

     Supported version: 9.4 or higher

     Environment requirement (postgresql.conf configuration):

     wal_level = logical
     max_wal_senders = 10
     max_replication_slots = 10
     max_connections = 600

     Permission requirement (pg_hba.conf configuration):

     host replication repl_acct 0.0.0.0/0 md5

  4. openGauss

     Supported version: 2.0.1 ~ 3.0.0

     Environment requirement (postgresql.conf configuration):

     wal_level = logical
     max_wal_senders = 10
     max_replication_slots = 10
     max_connections = 600

     Permission requirement (pg_hba.conf configuration):

     host replication repl_acct 0.0.0.0/0 md5

  5. MariaDB

     Supported version: 5.1 and higher

     Environment requirement (my.cnf configuration):

     log-bin=mysql-bin
     binlog-format=row
     binlog-row-image=full

     -- Confirm with the following commands:
     show variables like '%log_bin%';
     show variables like '%binlog%';

     Permission requirement:

     GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO ${username}@${host}

     -- Confirm with the following command:
     SHOW GRANTS FOR 'user';

During job execution, SphereEx-DBPlusEngine can schedule jobs according to the cluster's resource situation, improving overall resource utilization and avoiding resource contention, thus ensuring job execution efficiency. It also supports automatic failover of tasks, providing the necessary protection for long-running tasks.

Basic Concept #

  • Nodes

    Instances that run compute-tier or storage-tier component processes. These can be physical machines, virtual machines, containers, etc.

  • Cluster

    Multiple nodes are assembled together to provide a specified service.

  • Source

The storage cluster where the original data resides.

  • Target

The target storage cluster to which the original data is to be migrated.

  • Data Migration Job

The entire process of replicating data from one storage cluster to another.

  • Stock Data

Data that already existed in the data nodes before the data migration job started.

  • Incremental Data

Data generated by the business systems while the data migration job is running.

Application Scenario #

This applies when a monolithic database's storage capacity or connection count is insufficient, and a homogeneous database needs to be migrated and split from a monolithic to a sharded environment.

For example, let’s consider a traditional monolithic database in an application system that has over 100 million records in a single table and is still growing rapidly. The load on the monolithic database is constantly at a high level and is becoming a bottleneck in the system, so the database needs to be migrated and split.

The supported migration paths are as follows.

| # | Source -> Target         | Full volume migration | Incremental migration | Data verification |
|---|--------------------------|-----------------------|-----------------------|-------------------|
| 1 | MySQL -> MySQL           | Supported             | Supported             | Supported         |
| 2 | PostgreSQL -> PostgreSQL | Supported             | Supported             | Supported         |
| 3 | openGauss -> openGauss   | Supported             | Supported             | Supported         |
| 4 | Oracle -> MySQL          | Supported             | Unsupported           | Supported         |
| 5 | Oracle -> PostgreSQL     | Supported             | Unsupported           | Supported         |
| 6 | Aurora -> Aurora         | Supported             | Supported             | Supported         |

Prerequisites for Use #

  • Tables with any index structure are supported.
  • Accurate breakpoint-based resumption is supported for tables with multi-column primary keys or unique keys.
  • Emoji data can be migrated: MySQL needs to be configured with the utf8mb4 character set, and PostgreSQL with the UTF-8 character set (see the sketch after this list).
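The character set can be confirmed before migrating. Below is a minimal sketch, assuming a MySQL source; the expected values are assumptions that depend on your deployment.

-- Minimal sketch (MySQL): verify that utf8mb4 is in effect before migrating emoji data.
SHOW VARIABLES LIKE 'character_set_server';   -- expect utf8mb4
SHOW VARIABLES LIKE 'collation_server';       -- expect a utf8mb4_* collation
-- For PostgreSQL, the database encoding can be checked with:
-- SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = current_database();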

Usage Restrictions #

Supported items #

  • Migration of data from external (peripheral) databases into databases managed by Apache ShardingSphere.
  • A target proxy with no rules configured, or with any rules configured.
  • Migration of tables with a single-column primary key or unique key; the first column may be of an integer type, a string type, or some binary types (e.g. MySQL VARBINARY).
  • Migration of tables with multi-column primary keys or unique keys.
  • Resumption of transmission from breakpoints.

Unsupported items #

  • Migration on top of the current storage nodes; a completely new database cluster must be prepared as the migration target.
  • Target proxy table rules that contain a HINT strategy.
  • A target table schema that differs from the source table schema.
  • Changes to the source table structure during the migration process.

Notes #

In the stock and incremental data migration phases, the transfer can be done online with no impact on the business. The traffic switching phase, however, requires a certain window of time to ensure that the data on the source and target sides has caught up, and it requires a brief cessation of operations.
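Of the options available during that window, setting the source database to read-only is often the simplest. The following is a minimal sketch, assuming a MySQL source; it is one possible way to keep the source static, not a required step of the product.

-- Minimal sketch (MySQL): keep the source static during the switching window.
SET GLOBAL read_only = ON;        -- block writes from regular users
SET GLOBAL super_read_only = ON;  -- also block users with the SUPER privilege
-- Once traffic has been switched to the target, the source can be decommissioned.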

Principle Description #

SphereEx-DBPlusEngine data migration is completed in 4 phases: preparation phase, stock data migration phase, incremental data synchronization phase, and traffic switching phase.

  1. Preparation phase

In the preparation phase, the data migration module checks data source connectivity and permissions, counts the stock data and records the log position, and finally splits the tasks according to the data volume and the user-defined parallelism.

  2. Stock data migration phase

Execute the stock data migration tasks split out in the preparation phase. The stock migration phase uses JDBC queries to read data directly from the source and write it to the target according to the configured sharding and other rules.
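Conceptually, the stock reads behave like primary-key-ordered range queries. The following is a hypothetical sketch of one such batch (the table, columns, and batch size are assumptions, not the engine's actual implementation):

-- Hypothetical sketch of one chunked stock-read batch, keyed on the primary key:
SELECT user_id, order_id, status
FROM t_user
WHERE user_id > 1000      -- last key seen in the previous batch (assumed)
ORDER BY user_id
LIMIT 1000;               -- batch size (assumed)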

  3. Incremental data synchronization phase

Since the time taken by stock data migration depends on factors such as data volume and parallelism, the data written by the business during this period must also be synchronized. The technical details vary from database to database, but in general they are all based on replication protocols or WAL logs for change data capture.

  • MySQL: subscribes to and parses binlogs.

  • PostgreSQL: uses the official logical replication plugin test_decoding.

The captured incremental data is likewise written to the new data nodes by the data migration module. When the incremental data is essentially caught up (it is never fully static, since the business system is not stopped), the traffic switching phase is entered.
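For illustration, test_decoding output can be observed manually on PostgreSQL as sketched below; the slot name is a placeholder, and in practice the migration module manages its own replication slots.

-- Minimal sketch (PostgreSQL): observe test_decoding output manually.
SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');
-- ... perform some DML on the source ...
SELECT * FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);
SELECT pg_drop_replication_slot('demo_slot');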

  4. Traffic switching phase

During this phase there may be a read-only window for the business, in which the data in the source data nodes is kept static for a short period to ensure that incremental synchronization fully completes. This can be achieved through the stop-write function of SphereEx-DBPlusEngine, by setting the database to read-only, or by throttling the source write traffic.

The length of this read-only window depends on the volume of data and on whether the user needs to run consistency checks on it. The consistency check is a stand-alone task that supports separate start/stop and breakpoint resumption.

Once validation is complete, the data migration is complete. The user can then switch the read traffic or write traffic to SphereEx-DBPlusEngine.

Usage Guidelines #

  1. Develop a sharding strategy

Before data migration, you need to develop a reasonable sharding strategy in conjunction with your business. Please refer to the sharding plugins for details.

  2. Optimize the data migration configuration (optional)

SphereEx-DBPlusEngine provides tuning options for data migration that control the threads used for data extraction and data writing. When resources are abundant, they can be used to the fullest so that the migration completes efficiently; migration traffic can also be rate-limited to minimize the impact on business systems.
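As a sketch, the read and write sides of a migration job can be tuned with DistSQL in the Apache ShardingSphere style; the statement below is illustrative, the exact syntax may vary by version, and all values are assumptions.

mysql> ALTER MIGRATION RULE (
     READ(WORKER_THREAD=40, BATCH_SIZE=1000,
          RATE_LIMITER(TYPE(NAME='QPS', PROPERTIES('qps'='500')))),
     WRITE(WORKER_THREAD=40, BATCH_SIZE=1000,
           RATE_LIMITER(TYPE(NAME='TPS', PROPERTIES('tps'='2000')))),
     STREAM_CHANNEL(TYPE(NAME='MEMORY', PROPERTIES('block-queue-size'='10000')))
);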

  3. Data validation (optional)

To ensure data consistency, the data validation function can be used to confirm the results, a process usually carried out during a window period. Only one consistency check job is allowed to run at a time, and the user can query, interrupt, restart, and delete consistency checks.
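For reference, a consistency check has its own lifecycle commands in the Apache ShardingSphere DistSQL style; the sketch below is illustrative ('jobId' is a placeholder), and the exact statements may vary by version.

mysql> CHECK MIGRATION 'jobId' BY TYPE (NAME='DATA_MATCH');
mysql> SHOW MIGRATION CHECK STATUS 'jobId';
mysql> STOP MIGRATION CHECK 'jobId';
mysql> START MIGRATION CHECK 'jobId';
mysql> DROP MIGRATION CHECK 'jobId';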

  4. Switch metadata

After confirming that the target environment is ready to go live, the cluster traffic switch can be triggered to complete the data migration.

Operation Guide #

The key operational steps for the data migration split are shown below. To ensure data consistency at both ends, a certain window of time is required before the switching operations. Please refer to the Data Migration DistSQL documentation for the specific commands.

  1. Plan the migration, and confirm the database architecture and the number of shards on the target side.

  2. Build the cluster in the Proxy and register the target database.

  3. Configure the sharding rules in the Proxy.

  4. Optimize the migration configuration (optional).

  5. Register the source data source.

  6. Start data migration.

  7. Confirm migration status.

  8. Stop the service.

  9. Perform data validation.

  10. Modify the business URL.

  11. Resume operations after switching data sources.

Configuration Example #

This example demonstrates migrating a MySQL monolithic database to a 2-database, 4-shard environment via SphereEx-DBPlusEngine. Please refer to Data Migration DistSQL for the specific commands.

Environment Description #

| # | Instance           | IP Address     | Service Port | Host Name    | Remark |
|---|--------------------|----------------|--------------|--------------|--------|
| 1 | DBPlusEngine 1.3.0 | 192.168.xx.102 | 3307         | dbplusengine |        |
| 2 | Zookeeper 3.6.3    | 192.168.xx.102 | 2181         | dbplusengine |        |
| 3 | MySQL 5.7.26       | 192.168.xx.103 | 3306         | mysql_0      |        |
| 4 | MySQL 5.7.26       | 192.168.xx.104 | 3306         | mysql_1      |        |
| 5 | MySQL 5.7.26       | 192.168.xx.105 | 3306         | mysql_2      |        |

Topology Diagram #

Configuration Process #

  1. Prepare the source environment

Create the source database, create a table, and insert some test data.

mysql -utest -p -h192.168.xx.103
mysql> CREATE DATABASE product;
mysql> SHOW DATABASES;
mysql> USE product;

mysql> CREATE TABLE `t_user` (
 `user_id` int NOT NULL,
 `order_id` int NOT NULL,
 `status` varchar(45) DEFAULT NULL,
 PRIMARY KEY (`user_id`)
);

mysql> INSERT INTO t_user VALUES
(1,1,'active'),
(2,2,'active'),
(3,3,'active'),
(4,4,'active');

mysql> SELECT * FROM t_user ORDER BY user_id;
+---------+----------+--------+
| user_id | order_id | status |
+---------+----------+--------+
|       1 |        1 | active |
|       2 |        2 | active |
|       3 |        3 | active |
|       4 |        4 | active |
+---------+----------+--------+
4 rows in set (0.00 sec)
  2. Prepare the target environment

Create the product_1 and product_2 databases on the target side.

mysql -utest -p -h192.168.xx.104
mysql> CREATE DATABASE product_1;
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| product_1          |
| sys                |
+--------------------+
5 rows in set (0.00 sec)

mysql -utest -p -h192.168.xx.105
mysql> CREATE DATABASE product_2;
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| product_2          |
| sys                |
+--------------------+
5 rows in set (0.00 sec)
  3. Create a logical database

Create the logical database sharding_db.

# mysql -uroot -p -P3307
mysql> CREATE DATABASE sharding_db;
mysql> SHOW DATABASES;

After registering the data sources on the target side, create the sharding rules.

mysql> USE sharding_db;

mysql> REGISTER STORAGE UNIT ds_1 (
     URL="jdbc:mysql://192.168.xx.104:3306/product_1?serverTimezone=UTC&useSSL=false",
     USER="test",
     PASSWORD="Test@123",
     PROPERTIES("minPoolSize"="1","maxPoolSize"="20","idleTimeout"="60000")
), ds_2 (
     URL="jdbc:mysql://192.168.xx.105:3306/product_2?serverTimezone=UTC&useSSL=false",
     USER="test",
     PASSWORD="Test@123",
     PROPERTIES("minPoolSize"="1","maxPoolSize"="20","idleTimeout"="60000")
);
Query OK, 0 rows affected (0.79 sec)

mysql> SHOW STORAGE UNITS\G

mysql> CREATE SHARDING TABLE RULE t_user(
 STORAGE_UNITS(ds_1,ds_2),
 SHARDING_COLUMN=user_id,
 TYPE(NAME="hash_mod",PROPERTIES("sharding-count"="4")),
 KEY_GENERATE_STRATEGY(COLUMN=user_id,TYPE(NAME="snowflake"))
);
Query OK, 0 rows affected (18.00 sec)

mysql> SHOW SHARDING TABLE RULE t_user;
  4. Register the source data source

Register the source data source with the REGISTER MIGRATION SOURCE STORAGE UNIT command.

mysql> REGISTER MIGRATION SOURCE STORAGE UNIT ds_0 (
     URL="jdbc:mysql://192.168.xx.103:3306/product?serverTimezone=UTC&useSSL=false",
     USER="test",
     PASSWORD="Test@123",
     PROPERTIES("minPoolSize"="1","maxPoolSize"="20","idleTimeout"="60000")
);
Query OK, 0 rows affected (0.15 sec)
  5. Start the data migration

Once the source and target environments are prepared, you can start the data migration.

Before starting the migration, you can adjust the migration threads based on resource availability. Please refer to the migration job definition for guidance.
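For example, the current migration rule can be inspected first; this is a hedged sketch in the Apache ShardingSphere DistSQL style, and the statement may vary by version.

mysql> SHOW MIGRATION RULE;

Then start the migration: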

mysql> MIGRATE TABLE ds_0.t_user INTO t_user;
Query OK, 0 rows affected (1.01 sec)

If you need to migrate multiple tables simultaneously, you can use commas to separate the names of multiple tables in the following format.

MIGRATE TABLE ds_0.t_user INTO sharding_db.t_user,ds_0.t_user1 INTO sharding_db.t_user1,ds_0.t_user2 INTO sharding_db.t_user2;

Once the migration has started, the migration list can be viewed via the SHOW command.

mysql> SHOW MIGRATION LIST;
+---------------------------------------+--------+----------------------+--------+---------------------+-----------+
| id                                    | tables | sharding_total_count | active | create_time         | stop_time |
+---------------------------------------+--------+----------------------+--------+---------------------+-----------+
| j010171e761478594bdab280c55014650e5f6 | t_user | 1                    | true   | 2022-09-16 15:11:34 | NULL      |
+---------------------------------------+--------+----------------------+--------+---------------------+-----------+
1 row in set (0.75 sec)

You can then check the status of a migration job using its migration ID from the list.

mysql> SHOW MIGRATION STATUS 'j010171e761478594bdab280c55014650e5f6';
+------+-------------+-----------+--------+-------------------------------+--------------------------+
| item | data_source | status    | active | inventory_finished_percentage | incremental_idle_seconds |
+------+-------------+-----------+--------+-------------------------------+--------------------------+
| 0    |             | PREPARING | true   | 0                             | 0                        |
+------+-------------+-----------+--------+-------------------------------+--------------------------+
1 row in set (0.16 sec)

mysql> SHOW MIGRATION STATUS 'j010171e761478594bdab280c55014650e5f6';
+------+-------------+--------------------------+--------+-------------------------------+--------------------------+
| item | data_source | status                   | active | inventory_finished_percentage | incremental_idle_seconds |
+------+-------------+--------------------------+--------+-------------------------------+--------------------------+
| 0    | ds_0        | EXECUTE_INCREMENTAL_TASK | true   | 100                           | 0                        |
+------+-------------+--------------------------+--------+-------------------------------+--------------------------+
1 row in set (0.06 sec)

When the status is EXECUTE_INCREMENTAL_TASK, the service can be stopped to perform a consistency check on the data.

  6. Stop the service to prevent new data from being written to the source

  7. Check data consistency

SphereEx-DBPlusEngine can verify the data consistency between the source and the target, including the comparison of the number of records on both sides.

mysql> CHECK MIGRATION 'j010171e761478594bdab280c55014650e5f6' BY TYPE (NAME='CRC32_MATCH');
Query OK, 0 rows affected (0.08 sec)

DATA_MATCH is required if data encryption is enabled on the target side or the migration is heterogeneous. The available data validation algorithms can be checked as follows.

mysql> SHOW MIGRATION CHECK ALGORITHMS;

Algorithm descriptions can be found in the consistency check algorithm list.

After the check passes, the migrated data can be inspected on the target shards.

  • product_1
mysql> USE product_1;
mysql> SHOW TABLES;
+---------------------+
| Tables_in_product_1 |
+---------------------+
| t_user_0            |
| t_user_2            |
+---------------------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM t_user_0;
+---------+----------+--------+
| user_id | order_id | status |
+---------+----------+--------+
|       4 |        4 | active |
+---------+----------+--------+
1 row in set (0.00 sec)

mysql> SELECT * FROM t_user_2;
+---------+----------+--------+
| user_id | order_id | status |
+---------+----------+--------+
|       2 |        2 | active |
+---------+----------+--------+
1 row in set (0.00 sec)
  • product_2
mysql> USE product_2;
mysql> SHOW TABLES;
+---------------------+
| Tables_in_product_2 |
+---------------------+
| t_user_1            |
| t_user_3            |
+---------------------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM t_user_1;
+---------+----------+--------+
| user_id | order_id | status |
+---------+----------+--------+
|       1 |        1 | active |
+---------+----------+--------+
1 row in set (0.00 sec)

mysql> SELECT * FROM t_user_3;
+---------+----------+--------+
| user_id | order_id | status |
+---------+----------+--------+
|       3 |        3 | active |
+---------+----------+--------+
1 row in set (0.00 sec)
  8. Modify the business URL

Point the application at the target environment by changing the business URL to the address of the Proxy.

  9. Switch the data source

Complete the migration operation.

mysql> COMMIT MIGRATION 'j010171e761478594bdab280c55014650e5f6';
Query OK, 0 rows affected (0.57 sec)

Refresh the metadata to bring the new data sources online.

mysql> REFRESH TABLE METADATA;
Query OK, 0 rows affected (0.21 sec)
  10. Resume operations

FAQ #

  1. When migrating from a single table to sharded tables, is it possible for sharding key values to be duplicated?

    No, it is not possible.

  2. Will indexes be migrated along with the data? Is there a problem with auto-increment fields?

    Indexes are migrated together with the table structure, and auto-increment fields do not cause problems: all data, including auto-increment columns, is migrated as-is.