Monitor and Alarm #
Description:
- Monitoring is available only through SphereEx-Console.
- Currently, there is no alarm function.
Cluster ecosystem tool monitoring #
Currently, monitoring of SphereEx-Console and SphereEx-Boot themselves is not supported.
Monitoring of cluster components #
Host monitoring #
- Monitoring indicators
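Category | Subcategory | Indicator |
---|---|---|
Resource Overview | Resource Overview | System uptime |
Resource Overview | Resource Overview | CPU Cores |
Resource Overview | Resource Overview | Total Memory |
Resource Overview | Resource Overview | Total CPU Usage |
Resource Overview | Resource Overview | Total Memory Usage |
Resource Overview | Resource Overview | Space Usage for each |
Performance Data | Performance Data | CPU usage |
Performance Data | Performance Data | Memory Statistics |
Performance Data | Performance Data | Network bandwidth (per s) |
Performance Data | Performance Data | System load (1, 5, 15 min) |
Performance Data | Performance Data | Disk read/write throughput (per s) |
Performance Data | Performance Data | Disk read/write rate IOPS |
Performance Data | Performance Data | IO operation ratio (per s) |
Performance Data | Performance Data | IO read/write time (per time) |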
View Monitoring
Applicable Scenarios
View host monitoring.
Notes
- The monitoring center is functioning normally.
- The host has been configured for monitoring.
Steps
- Log in to SphereEx-Console.
- Click Monitoring -> Hosts to enter the monitoring list.
- Click the Monitoring button in the action column to view the monitoring indicators.
Configure Monitoring
Applicable Scenarios
Configure host monitoring.
Notes
- The monitoring center is functioning normally.
- The Governance Center host has been registered and has read-write permissions for the monitoring plugin installation directory.
Steps
- Log in to SphereEx-Console.
- Click Resources -> Hosts to enter the host list.
- Click Configure Monitoring in the operation column to enter the monitoring configuration page.
- Configure the monitoring information as follows.
Field | Data Source | Required/Optional | Description |
---|---|---|---|
Host IP | Previous page | | |
Monitoring Center | User selection, monitoring center list | Required | Installed monitoring center |
Monitoring plugin port | Default filled, user editable | Required | Defaults to the default port of the monitoring plugin; editable |
Monitoring plugin installation directory | Default filled, user editable | | Defaults to the default installation directory of the monitoring plugin; editable |
- Click Install to complete the installation of the monitoring plugin and the configuration of the monitoring center.
Database Monitoring #
- Monitoring Metrics
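Category | Subcategory | Metrics |
---|---|---|
Overview | Overview | Instance availability |
Overview | Overview | File open count |
Overview | Overview | Read-only secondary |
Overview | Overview | Master-slave delay |
Overview | Overview | Secondary SQL thread |
Overview | Overview | Secondary IO thread |
Overview | Overview | Slow query enabled |
Overview | Overview | Slow query threshold |
Performance metrics | Performance metrics | QPS (change in the Queries value / elapsed time, over a specified interval) |
Performance metrics | Performance metrics | TPS ((com_insert + com_delete + com_update + com_select) count / time) |
Performance metrics | Performance metrics | Inbound traffic |
Performance metrics | Performance metrics | Outbound traffic |
Performance metrics | Performance metrics | Number of slow queries |
Performance metrics | Performance metrics | Current number of connections |
Performance metrics | Performance metrics | Buffer pool utilization rate |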
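The QPS and TPS definitions in the table above are both rates derived from counters sampled at two points in time. The following is a minimal sketch of that arithmetic, assuming the counters have already been read into plain dictionaries; the function names and sample values are illustrative and are not part of SphereEx-Console.

```python
# Illustrative only: counter names mirror the formulas in the table above;
# how the counters are sampled, and the sample values, are assumptions.

TPS_COUNTERS = ("com_insert", "com_delete", "com_update", "com_select")


def qps(prev_queries: int, cur_queries: int, interval_seconds: float) -> float:
    """Change in the Queries counter divided by the elapsed time."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (cur_queries - prev_queries) / interval_seconds


def tps(prev: dict, cur: dict, interval_seconds: float) -> float:
    """Sum of the com_* counter deltas divided by the elapsed time."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = sum(cur[name] - prev[name] for name in TPS_COUNTERS)
    return delta / interval_seconds


if __name__ == "__main__":
    before = {"com_insert": 100, "com_delete": 10, "com_update": 50, "com_select": 840}
    after = {"com_insert": 160, "com_delete": 10, "com_update": 80, "com_select": 1050}
    print(qps(5000, 5600, 60))     # 600 queries over 60 seconds
    print(tps(before, after, 60))  # 300 statements over 60 seconds
```

Working from deltas rather than raw counter values matters because the underlying counters are cumulative since server start.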
Viewing Monitoring
Applicable Scenarios
View database instance monitoring.
Notes
- The monitoring center is functioning normally.
- The database has been configured for monitoring.
Operation Steps
- Log in to SphereEx-Console.
- Click Monitoring->Database to enter the monitoring list.
- Click the Monitoring button in the operation column to view the monitoring indicators.
Configuring Monitoring
Applicable Scenarios
Configure database monitoring
Notes
- The monitoring center is functioning normally.
- The Governance Center host has been registered and has read-write permissions for the monitoring plugin installation directory.
Operation Steps
- Log in to SphereEx-Console.
- Click Resources -> Database to enter the database list.
- Click Configure Monitoring in the operation column to enter the monitoring configuration page.
- Configure the monitoring information as follows.
Field Name | Data Source | Optional/Required | Description |
---|---|---|---|
Monitoring Center | User-selected, list of monitoring centers | Required | Pre-installed monitoring centers |
Monitoring Plugin Port | Default filled, user-editable | Required | Defaults to the default port of the monitoring plugin; editable |
Monitoring Plugin Installation Directory | Default filled, user-editable | | Defaults to the default installation directory of the monitoring plugin; editable |
Database Monitoring User | User input | Required | |
Database Monitoring Password | User input | Required | Password protected |
- Click Install to complete the installation of the monitoring plugin and configuration of the monitoring center.
Governance Center Monitoring #
- Monitoring Metrics
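Category | Subcategory | Metrics |
---|---|---|
Overview | Overview | ZK cluster node status |
Overview | Overview | Node Roles |
Overview | Overview | Follower number |
Performance | Performance | Average response latency |
Performance | Performance | Maximum response latency |
Performance | Performance | Minimum response latency |
Performance | Performance | Packets received |
Performance | Performance | Packets sent |
Performance | Performance | Active connections |
Performance | Performance | Pending requests |
Performance | Performance | Primary-Secondary Status |
Performance | Performance | Znode number |
Performance | Performance | Watch number |
Performance | Performance | Temporary node count |
Performance | Performance | Approximate total data size |
Performance | Performance | Open file descriptor count |
Performance | Performance | Maximum file descriptor count |
Performance | Performance | Sync operations blocked |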
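Most of the indicators above have close counterparts in the output of ZooKeeper's `mntr` four-letter command (for example `zk_avg_latency`, `zk_znode_count`, `zk_followers`). Whether the monitoring plugin actually relies on `mntr` is an assumption; the sketch below only shows how such tab-separated output could be parsed, and the sample text is made up.

```python
def parse_mntr(text: str) -> dict:
    """Parse `key<TAB>value` lines into a dict.

    Numeric values become int or float; anything else stays a string.
    """
    metrics = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition("\t")
        value = value.strip()
        try:
            metrics[key.strip()] = int(value)
        except ValueError:
            try:
                metrics[key.strip()] = float(value)
            except ValueError:
                metrics[key.strip()] = value
    return metrics


# Hypothetical sample in the style of `mntr` output.
SAMPLE = "\n".join([
    "zk_avg_latency\t2",
    "zk_max_latency\t41",
    "zk_min_latency\t0",
    "zk_outstanding_requests\t0",
    "zk_server_state\tleader",
    "zk_znode_count\t1250",
    "zk_followers\t2",
])

if __name__ == "__main__":
    for name, value in parse_mntr(SAMPLE).items():
        print(f"{name} = {value!r}")
```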
Viewing Monitoring Metrics
Applicable Scenarios
View monitoring metrics for Governance Center instances.
Note
- The monitoring center is functioning normally.
- Governance Center monitoring is configured.
Procedure
- Log in to SphereEx-Console.
- Click Monitoring -> Governance Center to enter the Governance Center list.
- Click Monitoring Node in the Operations column to enter the monitoring list.
- Click the Monitor button in the Operations column to view the monitoring metrics.
Configuring Monitoring
Applicable Scenarios
Configure monitoring for Governance Center.
Prerequisites
- The monitoring center is functioning normally.
- The Governance Center host has been registered and has read-write permissions for the monitoring plugin installation directory.
Procedure
- Log in to SphereEx-Console.
- Click Resources > Governance Center to enter the Governance Center list.
- Click Configure Monitoring in the Operations column to enter the monitoring configuration page.
- Configure the monitoring information as follows.
Field | Data Source | Required/Optional | Description |
---|---|---|---|
Monitoring Center | User selection | Required | Only one monitoring center is allowed per cluster |
Governance Center Node IP | Auto-filled from the previous page | Required | Not editable |
Governance Center Node Port | Auto-filled from the previous page | Required | Not editable |
Monitoring Plugin Port | Default value, user-editable | Required | Default port for the monitoring plugin, editable |
Monitoring Plugin Installation Directory | Default value, user-editable | | Default installation directory for the monitoring plugin, editable |
- Click Install to complete the installation of the monitoring plugin and the configuration of the monitoring center.
Monitoring Center Monitoring #
- Monitoring Metrics
Category | Subcategory | Metrics |
---|---|---|
Overview | Overview | Version |
Overview | Overview | Number of monitored instances |
Overview | Overview | Number of threads |
Overview | Overview | Last successful configuration reload time |
Overview | Overview | Was the last configuration reload successful |
Overview | Overview | Total number of chunks |
Overview | Overview | Number of created chunks |
Overview | Overview | Number of removed chunks |
Overview | Overview | Total number of samples |
View Monitoring
Applicable Scenarios
View monitoring data of the monitoring center itself.
Note
- The monitoring center is running properly.
Steps
- Log in to SphereEx-Console.
- Click Monitoring > Monitoring Center to enter the monitoring list.
- Click the Monitoring button in the operation column to view the monitoring indicators.
Configure Monitoring
The monitoring of the monitoring center is enabled by default when it is installed, and no additional configuration is required.
Log Center Monitoring #
Todo
This feature will be implemented in Console version 1.2.
Cluster Monitoring #
- Monitoring indicators
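Category | Subcategory | Metrics |
---|---|---|
Overview | Cluster Overview | Number of component nodes |
Overview | Cluster Overview | Number of storage nodes |
Overview | Cluster Overview | Number of compute nodes |
Overview | Cluster Overview | Number of governance center nodes |
Metadata | Metadata | Number of logical databases |
Metadata | Metadata | Number of users |
Metadata | Metadata | Number of tables (logical tables + single tables) |
Metadata | Metadata | Number of sharded tables |
Metadata | Metadata | Number of broadcast tables |
Metadata | Metadata | Number of table groups |
Metadata | Metadata | Number of single tables |
Metadata | Metadata | Number of encrypted tables |
Metadata | Metadata | Number of plugins (excluding single table plugins) |
Performance Data | Connection details | Number of routes (instant value, change value, change rate) |
Performance Data | Connection details | Number of executions (instant value, change value, change rate) |
Performance Data | Connection details | Number of parses (instant value, change value, change rate) |
Performance Data | Connection details | Number of requests (instant value, change value, change rate) |
Performance Data | Connection details | Number of connections (instant value) |
Performance Data | Performance Details | QPS |
Performance Data | Performance Details | TPS |
Performance Data | Performance Details | Number of request bytes |
Performance Data | Performance Details | Number of response bytes |
Performance Data | Performance Details | Response time |
Performance Data | Performance Details | Total number of transactions |
Performance Data | Performance Details | Number of committed transactions |
Performance Data | Performance Details | Number of rolled-back transactions |
Performance Data | Performance Details | Transaction rollback rate |
Performance Data | Performance Details | Connection duration (analysis) |
Performance Data | Performance Details | Request duration (analysis) |
Parsing Engine | DML SQL | Total number of inserts |
Parsing Engine | DML SQL | Total number of deletes |
Parsing Engine | DML SQL | Total number of updates |
Parsing Engine | DML SQL | Total number of selects |
Parsing Engine | DML SQL | SQL statistics (Insert/Delete/Update/Select) |
Parsing Engine | DDL SQL | Total number of DDLs |
Parsing Engine | DDL SQL | Total number of DCLs |
Parsing Engine | DDL SQL | Total number of DALs |
Parsing Engine | DDL SQL | Total number of TCLs |
Parsing Engine | DDL SQL | SQL statistics (DDL/DCL/DAL/TCL) |
Parsing Engine | DistSQL | Total number of RQLs |
Parsing Engine | DistSQL | Total number of RDLs |
Parsing Engine | DistSQL | Total number of RALs |
Parsing Engine | DistSQL | DistSQL statistics (RQL/RDL/RAL) |
Parsing Engine | DistSQL | Parsing duration |
Routing Engine | Routing Engine | Data source routing |
Routing Engine | Routing Engine | Top 10 table routing analysis |
Thread Status | Thread Status | Current number of threads |
Thread Status | Thread Status | Number of daemon threads |
Thread Status | Thread Status | Peak number of threads |
Thread Status | Thread Status | Total number of thread starts |
Thread Status | Thread Status | Number of thread deadlocks |
Thread Status | Thread Status | JVM thread state data |
Errors | Errors | Number of errors |
Errors | Errors | Numbers of errors |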
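Several indicators above are reported as an instant value, a change value and a change rate, and the transaction rollback rate is a ratio of two counters. The exact formulas used by SphereEx-Console are not stated here, so the sketch below assumes the usual definitions: the change value is the difference between two samples, the change rate is that difference per second, and the rollback rate is rolled-back transactions divided by total transactions. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds since some fixed origin
    value: float      # cumulative counter value at that moment (the instant value)


def change_value(prev: Sample, cur: Sample) -> float:
    """Difference between two consecutive samples (assumed definition)."""
    return cur.value - prev.value


def change_rate(prev: Sample, cur: Sample) -> float:
    """Change per second between two samples (assumed definition)."""
    elapsed = cur.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return change_value(prev, cur) / elapsed


def rollback_rate(rolled_back: int, total: int) -> float:
    """Rolled-back transactions as a fraction of all transactions."""
    return 0.0 if total == 0 else rolled_back / total


if __name__ == "__main__":
    earlier, later = Sample(0.0, 100.0), Sample(30.0, 250.0)
    print(change_value(earlier, later))  # 150 more requests
    print(change_rate(earlier, later))   # 5 requests per second
    print(rollback_rate(5, 200))
```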
View monitoring
Applicable scenarios
View cluster monitoring.
Notes
- Monitoring center is functioning normally.
- Governance Center has already been configured for monitoring.
Operation steps:
- Log in to SphereEx-Console.
- Click Monitoring->Cluster to enter the monitoring list.
- Click the Monitoring button in the operation column to view the monitoring indicators.
Configure monitoring
Applicable scenarios
Configure cluster monitoring, which actually configures the monitoring of the computing nodes.
Notes
- Monitoring center is functioning normally.
- The host for the Governance Center has been registered and has read and write permissions for the monitoring plugin installation directory.
Operation steps
- Log in to SphereEx-Console.
- Click Cluster Management -> Cluster to enter the cluster list.
- Click Configure Monitoring in the operation column to enter the monitoring configuration page.
- Configure the monitoring information as follows.
Field Name | Data Source | Optional/Required | Description |
---|---|---|---|
Compute Node IP | Input on the previous page | Required | Not editable |
Monitoring Plugin Port | User input | Required | For self-built clusters, the monitoring plugin port is automatically filled and not allowed to be modified. For registered clusters, the user can fill in the port. |
- Click on Add Monitoring Configuration to complete the monitoring center configuration.
Compute node monitoring #
- Monitoring indicators
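Category | Subcategory | Metrics |
---|---|---|
Overview | Overview | JDK version info |
Overview | Overview | Start time |
Overview | Overview | Compute node version |
Overview | Overview | Running status |
Overview | Overview | Running time |
Overview | Overview | Used memory size |
Overview | Overview | Used cache pool size |
Performance | Performance | Total requests |
Performance | Performance | Total routes |
Performance | Performance | Total executions |
Performance | Performance | Total parsing |
Performance | Performance | Number of requests |
Performance | Performance | Number of parses |
Performance | Performance | Number of routes |
Performance | Performance | Number of executions |
Performance | Performance | QPS |
Performance | Performance | TPS |
Performance | Performance | Transaction rollback rate |
Metadata | Metadata | Number of logical databases |
Metadata | Metadata | Number of users |
Metadata | Metadata | Number of tables (logical + single) |
Metadata | Metadata | Number of sharded tables |
Parsing engine | DML SQL | Total Insert |
Parsing engine | DML SQL | Total Delete |
Parsing engine | DML SQL | Total Update |
Parsing engine | DML SQL | Total Select |
Parsing engine | DML SQL | SQL statistics (Insert/Delete/Update/Select) |
Parsing engine | DDL SQL | Total DDL |
Parsing engine | DDL SQL | Total DCL |
Parsing engine | DDL SQL | Total DAL |
Parsing engine | DDL SQL | Total TCL |
Parsing engine | DDL SQL | SQL statistics (DDL/DCL/DAL/TCL) |
Parsing engine | DistSQL | Total RQL |
Parsing engine | DistSQL | Total RDL |
Parsing engine | DistSQL | Total RAL |
Parsing engine | DistSQL | DistSQL statistics (RQL/RDL/RAL) |
Parsing engine | DistSQL | Parsing time |
Routing engine | Routing engine | Data source routing |
Routing engine | Routing engine | Top 10 table routing analysis |
Thread status | Thread status | Current thread count |
Thread status | Thread status | Number of daemon threads |
Thread status | Thread status | Peak thread count |
Thread status | Thread status | Total thread startup |
Thread status | Thread status | Thread deadlock count |
Thread status | Thread status | JVM thread status data |
Error statistics | Error statistics | Numbers of errors |
Error statistics | Error statistics | Numbers of errors |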
View monitoring
Application Scenario
View compute node monitoring.
Notes
- Monitoring center is functioning normally.
- Governance Center has already been configured for monitoring.
Steps
- Log in to SphereEx-Console.
- Click Monitoring -> Cluster to enter the cluster list.
- Click Node Monitoring in the operation column to enter the compute node list.
- Click Monitoring in the operation column to view monitoring indicators.
Configure monitoring
Compute node monitoring is configured by configuring cluster monitoring.
Storage node monitoring #
Currently, monitoring of storage nodes is not supported.
Agent management #
Introduction to Agent #
Background
Observing the running state of a cluster is a new challenge when trying to understand the status of a distributed system. The point-to-point approach of logging in to a specific server does not suit a large number of distributed servers; telemetry based on observable data is the recommended operation and maintenance approach for them. Tracing, metrics and logging are the main ways to obtain observable data about system status.
APM (application performance monitoring) monitors and diagnoses system performance by collecting, storing and analyzing the observable data of the system. Its main functions include performance indicator monitoring, call stack analysis and service topology. DBPlusEngine is not responsible for gathering, storing or displaying APM data; it only generates valuable data and submits it to the relevant systems through standard protocols or plugins.
Tracing obtains tracking information about SQL parsing and SQL execution. DBPlusEngine supports SkyWalking, Zipkin, Jaeger and OpenTelemetry by default, and users can also develop customized components through plugins.
- Use OpenTelemetry
OpenTelemetry was formed by the merger of OpenTracing and OpenCensus in 2019. To use it, fill in the appropriate settings in the agent configuration file according to the OpenTelemetry SDK Autoconfigure Guide; data can then be exported to Jaeger or Zipkin.
- Use SkyWalking
In cooperation with the Apache SkyWalking team, the DBPlusEngine team has implemented a ShardingSphere automatic monitoring probe that sends performance data to SkyWalking automatically. Note that this automatic probe cannot be used together with the DBPlusEngine plugin probe.
Metrics are used to collect and display statistical indicators of the cluster. DBPlusEngine supports Prometheus by default.
Challenges
Tracing and metrics collect system information through event tracking points. A large number of tracking points makes the kernel code messy, hard to maintain and hard to customize or extend.
Goal
The goal of the Apache ShardingSphere observability module is to provide as many performance and statistical indicators as possible while isolating the kernel code from the embedded tracking code.
Core concepts
- Agent
Provides tracing, metrics and logging features, based on bytecode enhancement and a plugin design.
- APM
APM is an acronym for Application Performance Monitoring. It focuses on the performance diagnosis of distributed systems; its main functions include call chain display, application topology analysis, etc.
- Tracing
The agent collects tracing data between distributed services or internal processes and sends it to third-party APM systems.
- Metrics
System statistical indicators are collected through probes, written to a time-series database and displayed by third-party applications.
Agent Configuration #
Agent Configuration
- Directory Structure
Create an agent directory and extract the agent-bin package into the agent directory.
mkdir agent
tar -zxvf apache-shardingsphere-${latest.release.version}-shardingsphere-agent-bin.tar.gz -C agent
cd agent
tree
.
├── LICENSE
├── NOTICE
├── README.txt
├── conf
│   └── agent.yaml
├── plugins
│   ├── lib
│   │   ├── shardingsphere-agent-metrics-core-${latest.release.version}-SNAPSHOT.jar
│   │   └── shardingsphere-agent-plugin-core-${latest.release.version}-SNAPSHOT.jar
│   ├── logging
│   │   └── shardingsphere-agent-logging-file-${latest.release.version}-SNAPSHOT.jar
│   ├── metrics
│   │   └── shardingsphere-agent-metrics-prometheus-${latest.release.version}-SNAPSHOT.jar
│   └── tracing
│       ├── shardingsphere-agent-tracing-opentelemetry-${latest.release.version}-SNAPSHOT.jar
│       └── shardingsphere-agent-tracing-opentracing-${latest.release.version}-SNAPSHOT.jar
├── shardingsphere-agent-${latest.release.version}-SNAPSHOT.jar
└── template
    ├── dbplusengine-driver-grafana-template.json
    └── dbplusengine-proxy-grafana-template.json
- Configuration
The configuration file is agent.yaml. The available plugins include Jaeger, OpenTracing, Zipkin, OpenTelemetry, Logging and Prometheus. To enable a plugin, uncomment its section under plugins; in the sample below, only the file logging plugin is enabled.
plugins:
  logging:
    File:
      props:
        slow-query-log: true
        long-query-time: 5000
        general-query-log: true
#  metrics:
#    Prometheus:
#      host: "localhost"
#      port: 9090
#      props:
#        jvm-information-collector-enabled: "true"
#  tracing:
#    OpenTelemetry:
#      props:
#        otel.service.name: "shardingsphere"
#        otel.traces.exporter: "jaeger"
#        otel.exporter.otlp.traces.endpoint: "http://localhost:14250"
#        otel.traces.sampler: "always_on"
- Parameter Description
Name | Description | Range | Default Value |
---|---|---|---|
slow-query-log | Whether to enable the slow query log | true, false | true |
long-query-time | Slow query threshold (ms) | Positive integer | 5000 |
general-query-log | Whether to enable the full query log | true, false | true |
jvm-information-collector-enabled | Whether to enable the JVM collector | true, false | true |
otel.service.name | Service name for distributed tracing | String | shardingsphere |
otel.traces.exporter | Exporter | jaeger, zipkin | jaeger |
otel.exporter.otlp.traces.endpoint | Address to which data is sent | Actual receiving address | http://localhost:14250 |
otel.traces.sampler | Sampler | always_on | always_on |
For more information about OpenTelemetry, refer to the OpenTelemetry SDK Autoconfigure Guide.
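The ranges listed for the logging parameters can be checked before the agent is started. The following is a hypothetical pre-flight check, not something the agent ships with; it works on an already-parsed props dictionary so that it needs no YAML library, and the function name is an assumption.

```python
# Hypothetical helper: validates the logging props against the ranges in the
# parameter table (booleans for the two switches, a positive integer threshold).

BOOLEAN_PROPS = ("slow-query-log", "general-query-log")


def validate_logging_props(props: dict) -> list:
    """Return a list of problems; an empty list means the props look valid."""
    problems = []
    for name in BOOLEAN_PROPS:
        if name in props and not isinstance(props[name], bool):
            problems.append(f"{name} must be true or false")
    threshold = props.get("long-query-time", 5000)
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, int) or threshold <= 0:
        problems.append("long-query-time must be a positive integer (milliseconds)")
    return problems


if __name__ == "__main__":
    sample = {"slow-query-log": True, "long-query-time": 5000, "general-query-log": True}
    print(validate_logging_props(sample))
    print(validate_logging_props({"slow-query-log": "yes", "long-query-time": 0}))
```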
Using Agent in DBPlusEngine-Proxy #
Start the Agent at the same time as the Proxy
bin/start.sh -g
After a normal startup, the startup logs can be viewed in the corresponding DBPlusEngine-Proxy logs. Once the Proxy has been accessed, metrics and tracing data can be viewed at the configured addresses.
Agent Metrics
Metric | Description | Metric Type |
---|---|---|
build_info | Build version information | GAUGE |
proxy_info | Running information, distinguished by the type tag: boot-time is the startup timestamp, boot-duration is the time taken to start (in milliseconds), and uptime is the elapsed running time (in milliseconds) | GAUGE |
proxy_meta_data_info | Metadata information | GAUGE |
proxy_state | Status information: 0 normal, 1 circuit break (fuse), 2 lock | GAUGE |
proxy_current_connections | Current client connections | GAUGE |
parsed_sql_total | Number of SQL statements parsed, distinguished by INSERT, DELETE, UPDATE, SELECT, DDL, DCL, DAL, TCL, RQL, RDL, RAL, RUL types | COUNTER |
routed_sql_total | Number of SQL routes, distinguished by INSERT, DELETE, UPDATE, SELECT types | COUNTER |
routed_result_total | Number of routing results, counting the storage nodes and tables routed to | COUNTER |
proxy_transactions_total | Total number of transactions, categorized by commit, rollback, autocommit | COUNTER |
proxy_execute_errors_total | Number of execution exceptions, by exception type | COUNTER |
proxy_requests_total | Number of requests received | COUNTER |
proxy_execute_total | Total number of executions (counts the SQL statements executed on storage nodes after routing) | COUNTER |
proxy_execute_error_total | Number of execution exceptions | COUNTER |
proxy_request_bytes_total | Request bytes | COUNTER |
proxy_response_bytes_total | Response bytes | COUNTER |
parse_sql_latency_millis | SQL parsing latency (ms) | HISTOGRAM |
route_sql_latency_millis | SQL routing latency (ms) | HISTOGRAM |
proxy_execute_latency_millis | Execution latency (ms) | HISTOGRAM |
commit_sql_count_histogram | Distribution of the number of SQL statements in each committed transaction | HISTOGRAM |
rollback_sql_count_histogram | Distribution of the number of SQL statements in each rolled-back transaction | HISTOGRAM |
proxy_connection_usage_seconds | Distribution of connection durations to the Proxy | HISTOGRAM |
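When the Prometheus plugin is enabled, these metrics can be read in the Prometheus text exposition format. The sketch below parses a small sample of that format and derives a rollback rate from proxy_transactions_total. The label name `type`, the sample values and the definition rollback / (commit + rollback) are assumptions made for illustration; check the actual scrape output before relying on them.

```python
import re

# Simplified parser: handles `name value` and `name{label="v",...} value`.
# Lines with trailing timestamps or escaped quotes in labels are skipped.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> list:
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def rollback_rate(samples: list, metric: str = "proxy_transactions_total",
                  label: str = "type") -> float:
    """rollback / (commit + rollback); the label name is an assumption."""
    by_type = {labels.get(label): value
               for name, labels, value in samples if name == metric}
    rollbacks = by_type.get("rollback", 0.0)
    finished = by_type.get("commit", 0.0) + rollbacks
    return 0.0 if finished == 0 else rollbacks / finished


# Hypothetical scrape output.
SAMPLE = """\
# TYPE proxy_transactions_total counter
proxy_transactions_total{type="commit"} 90.0
proxy_transactions_total{type="rollback"} 10.0
proxy_requests_total 1234.0
"""

if __name__ == "__main__":
    parsed = parse_metrics(SAMPLE)
    print(parsed)
    print(rollback_rate(parsed))
```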
Using Agent in DBPlusEngine-Driver #
Start Agent
Prepare a project that integrates DBPlusEngine-Driver, such as a Spring Boot project.
Add the javaagent option at startup:
java -javaagent:/xxx/shardingsphere-agent-${latest.release.version}.jar -jar spring-boot-dbplusengine-driver-test.jar
Agent Metrics
Metric | Description | Metric Type |
---|---|---|
build_info | Build version information | GAUGE |
jdbc_state | Status information: 0 normal, 1 circuit break (fuse), 2 lock | GAUGE |
jdbc_meta_data_info | Metadata information | GAUGE |
parsed_sql_total | Number of SQL statements parsed, distinguished by INSERT, DELETE, UPDATE, SELECT, DDL, DCL, DAL, TCL, RQL, RDL, RAL, RUL types | COUNTER |
routed_sql_total | Number of SQL routes, distinguished by INSERT, DELETE, UPDATE, SELECT types | COUNTER |
routed_result_total | Number of routing results, counting the storage nodes and tables routed to | COUNTER |
jdbc_statement_execute_total | Number of SQL statements executed | COUNTER |
jdbc_statement_execute_errors_total | Number of execution exceptions | COUNTER |
jdbc_transactions_total | Number of transactions, categorized by commit, rollback, autocommit | COUNTER |
jdbc_statement_execute_latency_millis | SQL execution latency (ms) | HISTOGRAM |
parse_sql_latency_millis | SQL parsing latency (ms) | HISTOGRAM |
route_sql_latency_millis | SQL routing latency (ms) | HISTOGRAM |
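Prometheus exposes a HISTOGRAM metric as cumulative `_bucket` series plus `_sum` and `_count`, so latency figures have to be derived from those. The sketch below computes a mean from sum and count and estimates a quantile by linear interpolation inside the matching bucket, which is a simplified version of the idea behind PromQL's `histogram_quantile`. The bucket boundaries and counts are made up, and the lower bound of the first bucket is assumed to be 0.

```python
import math


def mean_latency(total_sum: float, total_count: float) -> float:
    """Average observation, from the histogram's _sum and _count series."""
    return math.nan if total_count == 0 else total_sum / total_count


def estimate_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with the +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        raise ValueError("buckets must not be empty")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # cannot interpolate inside the open-ended bucket
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical buckets for jdbc_statement_execute_latency_millis.
    sample_buckets = [(10.0, 50), (50.0, 90), (100.0, 100), (math.inf, 100)]
    print(mean_latency(1200.0, 100))
    print(estimate_quantile(0.5, sample_buckets))
    print(estimate_quantile(0.75, sample_buckets))
```

The estimate is only as fine-grained as the bucket boundaries, so quantiles that fall in a wide bucket should be read as approximations.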