Observability #

Background #

Understanding the state of a distributed system requires observing the running state of the whole cluster, which is a new challenge. Logging in to individual servers one at a time does not scale to a large number of distributed servers.

Telemetry based on observable data is the recommended approach to operations and maintenance. Tracing, metrics, and logging are the main ways to obtain data about the system's status.

Challenges #

Tracing and metrics collect system information through event tracking. A large number of tracking points scattered through the kernel code makes it cluttered, hard to maintain, and hard to customize or extend.

Core Concepts #

Agent #

The Agent provides tracing, metrics, and logging features, based on bytecode enhancement and a plugin design.

Monitoring data can be output to a third-party APM for display only after the corresponding Agent plugin is enabled.

APM #

APM is an acronym for Application Performance Monitoring.

It focuses on the performance diagnosis of distributed systems. Its main functions include call chain display and application topology analysis.

Tracing #

The agent collects tracing data between distributed services or within internal processes, and then sends it to third-party APM systems.
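To make the shape of tracing data more concrete, the following is a minimal sketch of a trace as a set of spans that share one trace ID and point to their parent span. The class names `SketchSpan` and `SketchTracer` and the span names are hypothetical and are not part of DBPlusEngine or of any APM client library; a real plugin would hand finished spans to an exporter instead of keeping them in a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Hypothetical span model: one timed operation inside a trace. */
final class SketchSpan {
    final String traceId;                                // shared by every span of one request
    final String spanId = UUID.randomUUID().toString();  // unique per span
    final String parentSpanId;                           // null for the root span
    final String name;
    final long startNanos = System.nanoTime();
    long endNanos;

    SketchSpan(String traceId, String parentSpanId, String name) {
        this.traceId = traceId;
        this.parentSpanId = parentSpanId;
        this.name = name;
    }
}

/** Hypothetical tracer that records finished spans in memory (not thread-safe). */
final class SketchTracer {
    final List<SketchSpan> finished = new ArrayList<>();

    /** Starts a new trace with a fresh trace ID. */
    SketchSpan startRoot(String name) {
        return new SketchSpan(UUID.randomUUID().toString(), null, name);
    }

    /** Starts a span in the same trace as its parent. */
    SketchSpan startChild(SketchSpan parent, String name) {
        return new SketchSpan(parent.traceId, parent.spanId, name);
    }

    /** Marks the span as ended; a real plugin would send it to the APM here. */
    void finish(SketchSpan span) {
        span.endNanos = System.nanoTime();
        finished.add(span);
    }
}
```

An APM system can rebuild the call chain from such records by grouping spans on `traceId` and following `parentSpanId` links.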

Metrics #

System statistical indicators are collected through probes and written to a time series database for display by third-party applications.
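As an illustration of what a probe might produce, the sketch below counts events in memory and renders them in the Prometheus text exposition format, which a Prometheus server reads when it scrapes a target. The class `SketchCounterRegistry` and the metric names are assumptions made for this example, not DBPlusEngine's actual metrics API, and the HTTP endpoint that would serve the text is omitted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-process counter registry; not DBPlusEngine's actual metrics API. */
final class SketchCounterRegistry {

    // A sorted map keeps the rendered output in a stable, alphabetical order.
    private final Map<String, AtomicLong> counters = new ConcurrentSkipListMap<>();

    /** Called by a probe each time the observed event happens. */
    void increment(String metricName) {
        counters.computeIfAbsent(metricName, key -> new AtomicLong()).incrementAndGet();
    }

    /** Renders all counters as Prometheus text: a TYPE line followed by a sample line. */
    String render() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, AtomicLong> entry : counters.entrySet()) {
            out.append("# TYPE ").append(entry.getKey()).append(" counter\n");
            out.append(entry.getKey()).append(' ').append(entry.getValue().get()).append('\n');
        }
        return out.toString();
    }
}
```

For example, after `increment("sql_executed_total")` has been called twice, `render()` contains the line `sql_executed_total 2`.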

How it Works #

The Agent module provides an observability framework for SphereEx-DBPlusEngine and is implemented on top of Java Agent. Metrics, tracing, and logging functions are integrated into the agent through plugins.
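For readers unfamiliar with the Java Agent mechanism, the following sketch shows its general shape: the JVM calls a `premain` method before the application starts, and the agent registers a `ClassFileTransformer` that sees every class as it is loaded. This is a simplified illustration and not the DBPlusEngine agent itself. The names `SketchAgent`, `MatchingTransformer`, and the package prefix `com/example/engine/` are hypothetical, and the transformer only records which classes it would enhance instead of rewriting bytecode.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Hypothetical agent entry point. In a real agent jar this class would be public
 * and named in the Premain-Class manifest attribute; the JVM then calls premain
 * when it is started with the -javaagent option.
 */
final class SketchAgent {

    /** Hypothetical prefix (internal form, '/' separated) of the classes to enhance. */
    static final String TARGET_PREFIX = "com/example/engine/";

    public static void premain(String agentArgs, Instrumentation instrumentation) {
        instrumentation.addTransformer(new MatchingTransformer(TARGET_PREFIX));
    }
}

/** Records the classes a plugin would enhance; performs no bytecode rewriting. */
final class MatchingTransformer implements ClassFileTransformer {

    private final String prefix;
    final List<String> matched = new CopyOnWriteArrayList<>();

    MatchingTransformer(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (className != null && className.startsWith(prefix)) {
            // A real plugin would return rewritten bytecode that wraps the target
            // methods with tracing or metrics logic.
            matched.add(className);
        }
        return null; // null tells the JVM to keep the class bytes unchanged
    }
}
```

Because the enhancement is attached at class-load time, the tracking logic lives in the agent and its plugins and does not have to be written into the kernel code.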

The metrics plugin is used to collect and display statistical indicators for the entire cluster. Apache ShardingSphere supports Prometheus by default.
The tracing plugin is used to collect trace information for SQL parsing and SQL execution. Apache ShardingSphere provides support for Jaeger, OpenTelemetry, OpenTracing (SkyWalking), and Zipkin by default. Users can also develop custom tracing components through plugins.
The default logging plugin shows how to record additional logs in ShardingSphere. In practice, users are expected to adapt it to their own needs.

APM #

APM (Application Performance Monitoring) monitors and diagnoses system performance by collecting, storing, and analyzing the system's observable data. Its main functions include performance metric monitoring, call stack analysis, and service topology.

Metrics #

DBPlusEngine is not responsible for gathering, storing, or displaying APM data; it provides the information the APM needs. In other words, DBPlusEngine is only responsible for generating valuable data and submitting it to the relevant systems through standard protocols or plugins.

Tracing #

The tracing plugin is used to collect trace information for SQL parsing and SQL execution. DBPlusEngine provides support for SkyWalking, Zipkin, Jaeger, and OpenTelemetry by default. Users can also develop custom tracing components through plugins.

Use Zipkin and Jaeger
Enable the Zipkin and Jaeger plugins in the configuration file and configure their server information.
Use OpenTelemetry
OpenTelemetry was formed in 2019 by merging OpenTracing and OpenCensus. To use it, fill in the appropriate settings in the agent configuration file according to the OpenTelemetry SDK Autoconfigure Guide.
Use SkyWalking
Enable the SkyWalking plugin in the configuration file and configure the SkyWalking apm-toolkit.
Use SkyWalking’s automatic monitor probe
In cooperation with the Apache SkyWalking team, the DBPlusEngine team has implemented a ShardingSphere automatic monitor probe that sends performance data to SkyWalking automatically. Note that this automatic probe cannot be used together with the Apache ShardingSphere plugin probe.
Metrics are used to collect and display statistical indicators of the cluster. DBPlusEngine supports Prometheus by default.