Read/write splitting #

Background #

Database throughput is facing bottlenecks with increasing TPS. For applications with massive concurrence read but less write at the same time, we can divide the database into a primary database and a replica database. The primary database is responsible for the insert, delete and update of transactions, while the replica database is responsible for queries. It can significantly improve the query performance of the whole system by effectively avoiding row locks.

One primary database with multiple replica databases can further enhance processing capacity by distributing queries evenly into multiple data replicas. Multiple primary databases with multiple replica databases can enhance not only throughput but also availability. Therefore, the system can still run normally, even though any database is down or physical disk destroyed.

Different from the sharding that separates data to all nodes according to sharding keys, read/write splitting routes read and write separately to primary database and replica databases according SQL analysis.

Background

Data in read/write splitting nodes are consistent, whereas that in shards is not. The combined use of sharding and read/write splitting will effectively enhance the system performance.

Challenges #

Though read/write splitting can enhance system throughput and availability, it also brings inconsistent data, including that among multiple primary databases and among primary databases and replica databases. What’s more, it also brings the same problem as data sharding, complicating developer and operator’s maintenance and operation. The following diagram shows the complex topological relationship between applications and database groups when sharding used together with read/write splitting.

Challenges

Goal #

The main design goal of read/write splitting of DBPlusEngine is to try to reduce the influence of read/write splitting, in order to let users use primary-replica database group like one database.

Core Concept #

Primary Database #

It refers to the database used in data insertion, update and deletion. It only supports single primary database for now.

Replica Database #

It refers to the database used in data query. It supports multiple replica databases.

Primary Replica Replication #

It refers to the operation to asynchronously replicate data from the primary database to the replica database. Because of the asynchrony of primary-replica synchronization, there may be short-time data inconsistency between them.

Load Balance Strategy #

Through this strategy, queries separated to different replica databases.

Use Norms #

Supported #

Provide the read/write splitting configuration of one primary database with multiple replica databases, which can be used alone or with sharding table and database;
Primary nodes need to be used for both reading and writing in the transaction;
Forcible primary database route based on SQL Hint;
The primary database provides read traffic when all secondary databases are down.

Unsupported #

Data replication between the primary and the replica databases;
Data inconsistency caused by replication delay between databases;
Double or multiple primary databases to provide write operation;
The data for transaction across primary and replica nodes are inconsistent; In the read/write splitting model, primary nodes need to be used for both reading and writing in the transaction.