Availability Management #

Cluster Self-High Availability #

Multiple computing nodes with high availability: as computing nodes are stateless, in case of a fault, there is no need to switch. As long as one node is alive, the cluster is available.

Cluster Business High Availability #

High-availability software needs to be added between the cluster and the business to ensure high availability. SphereEx can work with users to implement solutions to achieve this.

Same-City Multi-Center Architecture Diagram #

Deployment Architecture Diagram

Deployment Architecture

Switching Description

Fault	RPO	PTO
Single AZ Fault	0	0
Multi-AZ Fault	0	Depends on the time it takes to recover any AZ

Same-room Architecture Diagram #

Deployment Architecture Diagram

Deployment Architecture

Switching Description

Fault	RPO	PTO
Single AZ Fault	0	0
Multi-AZ Fault	0	Depends on the time it takes to recover any AZ

Cluster Component Availability Description #

Name	Architecture	Backend	Security Description
Monitoring Center	stand-alone	Prometheus	Manual handling in case of failure
Monitoring Plugin	stand-alone		Manual handling in case of failure
Governance Center	Cluster	ZooKepper	Automatic switch in case of failure
Log Center	Cluster	Elasticsearch+logstash	Automatic switch in case of failure
Filebeat	stand-alone		Manual handling in case of failure
Computing Node	stand-alone	Engine-Proxy	Automatic offline in case of failure, no need to switch
Storage Node	User deployment environment	User deployment environment	User deployment environment

ZooKeeper Cluster Unavailability (More than half of the nodes are unavailable) #

SQL	SQL Language	Engine Cluster Availability Description
User SQL	DQL	Available
	DML	Available
	DDL	Unavailable
	DCL	Unsupported
	TCL	Unsupported
DistSQL	RQL	Available(reads data from the node’s memory, unavailable after restart)
	RDL	Unavailable
	RAL	Unavailable
	RUL	Unavailable

Classification of SQL Language #

SQL (Structure Query Language) is divided into four main categories: Data Query Language (DQL), Data Manipulation Language (DML), Data Definition Language (DDL), and Data Control Language (DCL). Additionally, Transaction Control Language (TCL) is used to control transactions within a database.

Overview of DQL

DQL(Data Query Language)数据查询语言，数据检索语句，用于从表中获取数据。关键字有：select。

DQL (Data Query Language) is a data retrieval language used to retrieve data from tables. The keyword is select.
select: used to query data from a table. 2. Overview of DML
DML (Data Manipulation Language) is used to manipulate data (records) contained within a database object. The main keywords are update, delete, and insert. DML can manually control the opening, committing, and rolling back of transactions.
insert: used to insert a record into a table.
delete: used to delete one or more records from a table, or all records, but the object of the operation is still a record.
update: used to modify the contents of an existing record in a table. 3. Overview of DDL
DDL (Data Definition Language) is used to define and manage all objects within a database. It manages certain objects within the database (e.g., databases, tables). The main keywords are create, alter, drop, and truncate. DDL are implicitly committed and cannot be rolled back.
create: used to create a database and objects within it.
drop: used to delete tables, indexes, triggers, conditional constraints, and table permissions.
alter: used to modify the definition and properties of a table.
truncate: used to clear the data from a table. 4. Overview of DCL
DCL (Data Control Language) is used to control the permissions of a database and to control when and how transactions occur within a database. These operations make data more secure. The main keywords are grant and revoke.
grant: used to allow the creator of an object to grant specific permissions to a user, group, or all users (PUBLIC).
revoke: used to revoke the access permissions of a user, group, or all users. 5. Overview of TCL
TCL (Transaction Control Language) is used to control transactions within a database. The main keywords are commit, rollback, savepoint, and set transaction.
rollback: used to roll back a transaction.
savepoint: used to create a point for a transaction to roll back to, similar to a snapshot in a virtual machine. There is no limit to the number of savepoints in a transaction. It is used to cancel a portion of a transaction, and when the transaction ends, all savepoints defined in the transaction are automatically deleted.
commit: used to commit a transaction.
set transaction: used to set the various statuses of a transaction, such as read-only, read/write, and isolation level.

Classification of DistSQL Language #

RDL (Resource & Rule Definition Language): responsible for creating, modifying, and deleting resources and rules.
RQL (Resource & Rule Query Language): responsible for querying and displaying resources and rules.
RAL (Resource & Rule Administration Language): responsible for managing hints, transaction type switching, shard execution plans, and other management functions.
RUL (Resource Utility Language): SphereEx-DBPlusEngine’s tool language, responsible for SQL parsing, SQL formatting, preview execution plan, etc.

High Availability Awareness of Storage Nodes #

Definition #

High availability is the most basic requirement of modern systems. As the cornerstone of the system, the database is also essential for high availability. In the distributed database system with storage-compute splitting, the high availability solution of storage node and compute node are different.

The stateful storage nodes need to pay attention to data consistency, health detection, primary node election and so on.

The stateless compute nodes need to detect the changes of storage nodes, they also need to set up an independent load balancer and have the ability of service discovery and request distribution.

The main goal of SphereEx-DBPlusEngine high availability module which is ensuring 7 * 24-hour uninterrupted database service as much as possible.

SphereEx-DBPlusEngine does not provide database high availability capability. It senses the change of databases’ primary-secondary relationship through a third-party provided high availability solution.

Specifically, SphereEx-DBPlusEngine is capable of finding databases, automatically sensing the primary/secondary database relationship, and correcting compute nodes’ connections to databases.

Dynamic Read/write Splitting #

When high availability and read/write splitting are adopted together, it is not necessary to configure specific primary and secondary databases for read/write splitting.

Highly available data sources dynamically correct the primary/secondary relationship of read/write splitting and properly channel read/write traffic.

Supported
- MySQL MGR single-primary mode
- MySQL Primary/secondary replication mode
- openGauss Primary/secondary replication mode
Not supported
- MySQL MGR Multi-primary mode

Principle Introduction #

The high availability solution provided by SphereEx-DBPlusEngine allows users to customize and implement extensions in four steps: pre-checks, dynamic discovery of primary databases, dynamic discovery of secondary databases, and synchronization configuration.

存储节点状态收集 #

Users can configure the storage node state collection interval in milliseconds by configuring the storage_node_heartbeat_interval_milliseconds parameter. If the configuration is equal to 0, the storage node state collection function will be turned off. From the operation log of the computing node, you can view the logs collected by the storage node status

Config by DistSQL

copyset dist variable storage_node_heartbeat_interval_milliseconds = 10000;

Config by yaml

copyprops:
  storage-node-heartbeat-interval-milliseconds: 10000

Scenario

Unicast routing to filter out unavailable data sources. Unicast routing usage scenarios: such as broadcast table query.
Static read-write splitting read routing, filtering out unavailable data sources.

Cluster ecosystem tool availability Description #

Tool	Deployment Architecture	Security Description
SphereEx-Boot	Standalone	Manual intervention is required in case of failure
SphereEx-Console	Standalone	Manual intervention is required in case of failure. The backend database is a primary-secondary architecture and cannot switch automatically and requires human intervention.