HA #

Background #

High availability is a basic requirement of modern systems. As a cornerstone of those systems, the database must be highly available as well.

In a distributed database system that separates storage from compute, storage nodes and compute nodes need different high availability solutions. Stateful storage nodes must handle data consistency, health detection, primary node election, and so on. Stateless compute nodes must detect changes in the storage nodes; they also need an independent load balancer in front of them, with service discovery and request distribution capabilities.

DBPlusEngine provides the compute nodes and reuses existing databases as storage nodes. It therefore relies on the database's own high availability solution for the storage nodes, and automatically detects the changes that solution makes.

Challenges #

The main implementation challenge is that DBPlusEngine must automatically detect the high availability solutions of diverse storage nodes, and must also integrate dynamically with readwrite-splitting.

Goal #

The main goal of DBPlusEngine’s high availability module is to keep the database service available around the clock (24/7), with as little interruption as possible.

Core Concept #

High Availability Type #

DBPlusEngine does not provide a database high availability solution of its own. It reuses third-party high availability solutions and automatically detects switches between primary and replica databases.

Specifically, DBPlusEngine provides database discovery: it automatically detects the primary and replica databases and updates the compute nodes' connections to them.
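The discovery step can be pictured with a small sketch. This is only an illustration of the idea, not DBPlusEngine's implementation: the `Topology` class, the `discover` function, and the `probe_role` callback are hypothetical names, and a real probe would query each database's own high availability mechanism rather than a dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Topology:
    """Hypothetical view of the storage nodes as seen by a compute node."""
    primary: Optional[str] = None
    replicas: List[str] = field(default_factory=list)


def discover(nodes: List[str],
             probe_role: Callable[[str], Optional[str]]) -> Topology:
    """Ask every storage node for its role and rebuild the topology.

    probe_role is an assumed callback returning "primary", "replica",
    or None when the node is unreachable.
    """
    topology = Topology()
    for node in nodes:
        role = probe_role(node)
        if role == "primary":
            topology.primary = node
        elif role == "replica":
            topology.replicas.append(node)
        # Unreachable nodes (None) are simply left out of the topology.
    return topology


# Simulated roles before and after a failover handled by the database itself.
roles = {"ds_0": "primary", "ds_1": "replica", "ds_2": "replica"}
before = discover(list(roles), roles.get)

roles.update({"ds_0": None, "ds_1": "primary"})
after = discover(list(roles), roles.get)
```

Running `discover` periodically and comparing the result with the previous topology is one simple way a compute node could notice that the primary has moved.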

Dynamic Readwrite-Splitting #

When high availability and readwrite-splitting are used together, there is no need to configure specific primary and replica databases for readwrite-splitting.

When high availability and readwrite-splitting are used together, DBPlusEngine can also detect the replication delay of replica databases under semi-synchronous and asynchronous replication, and dynamically routes read traffic to the replicas with low delay.
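As a rough illustration of delay-aware selection, the sketch below filters replicas by a delay threshold. The function name `pick_low_delay_replicas`, the millisecond unit, and the threshold parameter are assumptions made for this example; the text above only states that replicas with low delay are preferred.

```python
from typing import Dict, List


def pick_low_delay_replicas(delays_ms: Dict[str, float],
                            max_delay_ms: float) -> List[str]:
    """Return the replicas whose measured delay is within the threshold.

    delays_ms maps a replica name to its last measured replication delay.
    Both the unit and the threshold are assumptions for illustration.
    """
    return sorted(name for name, delay in delays_ms.items()
                  if delay <= max_delay_ms)


# Hypothetical measurements: ds_2 is lagging behind and is skipped for reads.
measured = {"ds_1": 120.0, "ds_2": 4500.0, "ds_3": 80.0}
eligible = pick_low_delay_replicas(measured, max_delay_ms=1000.0)
```

With these sample values, `eligible` contains `ds_1` and `ds_3`; once `ds_2` catches up, the next measurement would make it eligible again.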

Highly available data sources dynamically update the primary and replica databases used by readwrite-splitting, so that query and update SQL statements are routed correctly.

High availability also covers the case where all replica databases are down: read traffic is then automatically routed to the primary database, to keep the business system available.
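The routing behaviour described in this section can be sketched as follows. The `Router` class is hypothetical, and treating any statement that starts with `SELECT` as a read is a deliberate simplification; a real router would parse the SQL. Round-robin is also just one possible way to spread reads across replicas.

```python
import itertools
from typing import Iterable, List


class Router:
    """Hypothetical readwrite-splitting router with primary fallback."""

    def __init__(self, primary: str, replicas: Iterable[str]) -> None:
        self.primary = primary
        self.replicas: List[str] = list(replicas)
        self._counter = itertools.count()

    def update(self, primary: str, replicas: Iterable[str]) -> None:
        """Apply a new topology, for example after a failover is detected."""
        self.primary = primary
        self.replicas = list(replicas)

    def route(self, sql: str) -> str:
        """Return the name of the data source that should run the statement."""
        # Simplification: only statements starting with SELECT count as reads.
        is_read = sql.lstrip().lower().startswith("select")
        if not is_read or not self.replicas:
            # Writes always go to the primary; reads fall back to it
            # when no replica is available.
            return self.primary
        # Round-robin over the currently available replicas.
        return self.replicas[next(self._counter) % len(self.replicas)]


router = Router("ds_0", ["ds_1", "ds_2"])
first_read = router.route("SELECT * FROM t_order")
write = router.route("UPDATE t_order SET status = 'OK' WHERE id = 1")
second_read = router.route("select * from t_order")

# All replicas are reported down: reads now go to the primary.
router.update("ds_0", [])
fallback_read = router.route("SELECT * FROM t_order")
```

In this sketch the two reads go to `ds_1` and `ds_2` in turn, the update goes to `ds_0`, and after the replicas disappear the read is served by `ds_0` as well.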