Availability Management #
Cluster Self-High Availability #
Multiple computing nodes with high availability: as computing nodes are stateless, in case of a fault, there is no need to switch. As long as one node is alive, the cluster is available.
Cluster Business High Availability #
High-availability software needs to be added between the cluster and the business to ensure high availability. SphereEx can work with users to implement solutions to achieve this.
Same-City Multi-Center Architecture Diagram #
- Deployment Architecture Diagram
- Switching Description
Fault | RPO | PTO |
---|---|---|
Single AZ Fault | 0 | 0 |
Multi-AZ Fault | 0 | Depends on the time it takes to recover any AZ |
Same-room Architecture Diagram #
- Deployment Architecture Diagram
- Switching Description
Fault | RPO | PTO |
---|---|---|
Single AZ Fault | 0 | 0 |
Multi-AZ Fault | 0 | Depends on the time it takes to recover any AZ |
Cluster Component Availability Description #
Name | Architecture | Backend | Security Description |
---|---|---|---|
Monitoring Center | stand-alone | Prometheus | Manual handling in case of failure |
Monitoring Plugin | stand-alone | Manual handling in case of failure | |
Governance Center | Cluster | ZooKepper | Automatic switch in case of failure |
Log Center | Cluster | Elasticsearch+logstash | Automatic switch in case of failure |
Filebeat | stand-alone | Manual handling in case of failure | |
Computing Node | stand-alone | Engine-Proxy | Automatic offline in case of failure, no need to switch |
Storage Node | User deployment environment | User deployment environment | User deployment environment |
ZooKeeper Cluster Unavailability (More than half of the nodes are unavailable) #
SQL | SQL Language | Engine Cluster Availability Description |
---|---|---|
User SQL | DQL | Available |
DML | Available | |
DDL | Unavailable | |
DCL | Unsupported | |
TCL | Unsupported | |
DistSQL | RQL | Available(reads data from the node’s memory, unavailable after restart) |
RDL | Unavailable | |
RAL | Unavailable | |
RUL | Unavailable |
Classification of SQL Language #
SQL (Structure Query Language) is divided into four main categories: Data Query Language (DQL), Data Manipulation Language (DML), Data Definition Language (DDL), and Data Control Language (DCL). Additionally, Transaction Control Language (TCL) is used to control transactions within a database.
- Overview of DQL
DQL(Data Query Language)数据查询语言,数据检索语句,用于从表中获取数据。关键字有:select。
- DQL (Data Query Language) is a data retrieval language used to retrieve data from tables. The keyword is select.
- select: used to query data from a table. 2. Overview of DML
- DML (Data Manipulation Language) is used to manipulate data (records) contained within a database object. The main keywords are update, delete, and insert. DML can manually control the opening, committing, and rolling back of transactions.
- insert: used to insert a record into a table.
- delete: used to delete one or more records from a table, or all records, but the object of the operation is still a record.
- update: used to modify the contents of an existing record in a table. 3. Overview of DDL
- DDL (Data Definition Language) is used to define and manage all objects within a database. It manages certain objects within the database (e.g., databases, tables). The main keywords are create, alter, drop, and truncate. DDL are implicitly committed and cannot be rolled back.
- create: used to create a database and objects within it.
- drop: used to delete tables, indexes, triggers, conditional constraints, and table permissions.
- alter: used to modify the definition and properties of a table.
- truncate: used to clear the data from a table. 4. Overview of DCL
- DCL (Data Control Language) is used to control the permissions of a database and to control when and how transactions occur within a database. These operations make data more secure. The main keywords are grant and revoke.
- grant: used to allow the creator of an object to grant specific permissions to a user, group, or all users (PUBLIC).
- revoke: used to revoke the access permissions of a user, group, or all users. 5. Overview of TCL
- TCL (Transaction Control Language) is used to control transactions within a database. The main keywords are commit, rollback, savepoint, and set transaction.
- rollback: used to roll back a transaction.
- savepoint: used to create a point for a transaction to roll back to, similar to a snapshot in a virtual machine. There is no limit to the number of savepoints in a transaction. It is used to cancel a portion of a transaction, and when the transaction ends, all savepoints defined in the transaction are automatically deleted.
- commit: used to commit a transaction.
- set transaction: used to set the various statuses of a transaction, such as read-only, read/write, and isolation level.
Classification of DistSQL Language #
- RDL (Resource & Rule Definition Language): responsible for creating, modifying, and deleting resources and rules.
- RQL (Resource & Rule Query Language): responsible for querying and displaying resources and rules.
- RAL (Resource & Rule Administration Language): responsible for managing hints, transaction type switching, shard execution plans, and other management functions.
- RUL (Resource Utility Language): SphereEx-DBPlusEngine’s tool language, responsible for SQL parsing, SQL formatting, preview execution plan, etc.
High Availability Awareness of Storage Nodes #
Definition #
High availability is the most basic requirement of modern systems. As the cornerstone of the system, the database is also essential for high availability. In the distributed database system with storage-compute splitting, the high availability solution of storage node and compute node are different.
The stateful storage nodes need to pay attention to data consistency, health detection, primary node election and so on.
The stateless compute nodes need to detect the changes of storage nodes, they also need to set up an independent load balancer and have the ability of service discovery and request distribution.
The main goal of SphereEx-DBPlusEngine high availability module which is ensuring 7 * 24-hour uninterrupted database service as much as possible.
SphereEx-DBPlusEngine does not provide database high availability capability. It senses the change of databases’ primary-secondary relationship through a third-party provided high availability solution.
Specifically, SphereEx-DBPlusEngine is capable of finding databases, automatically sensing the primary/secondary database relationship, and correcting compute nodes’ connections to databases.
Dynamic Read/write Splitting #
When high availability and read/write splitting are adopted together, it is not necessary to configure specific primary and secondary databases for read/write splitting.
Highly available data sources dynamically correct the primary/secondary relationship of read/write splitting and properly channel read/write traffic.
- Supported
- MySQL MGR single-primary mode
- MySQL Primary/secondary replication mode
- openGauss Primary/secondary replication mode
- Not supported
- MySQL MGR Multi-primary mode
Principle Introduction #
The high availability solution provided by SphereEx-DBPlusEngine allows users to customize and implement extensions in four steps: pre-checks, dynamic discovery of primary databases, dynamic discovery of secondary databases, and synchronization configuration.
存储节点状态收集 #
Users can configure the storage node state collection interval in milliseconds by configuring the storage_node_heartbeat_interval_milliseconds parameter. If the configuration is equal to 0, the storage node state collection function will be turned off. From the operation log of the computing node, you can view the logs collected by the storage node status
Config by DistSQL
set dist variable storage_node_heartbeat_interval_milliseconds = 10000;
Config by yaml
props:
storage-node-heartbeat-interval-milliseconds: 10000
Scenario
- Unicast routing to filter out unavailable data sources. Unicast routing usage scenarios: such as broadcast table query.
- Static read-write splitting read routing, filtering out unavailable data sources.
Cluster ecosystem tool availability Description #
Tool | Deployment Architecture | Security Description |
---|---|---|
SphereEx-Boot | Standalone | Manual intervention is required in case of failure |
SphereEx-Console | Standalone | Manual intervention is required in case of failure. The backend database is a primary-secondary architecture and cannot switch automatically and requires human intervention. |