Observability #
Background #
To understand the status of a distributed system, observing the running state of the cluster is a new challenge. Logging in to individual servers one at a time does not scale to a large number of distributed servers. In such cases, telemetry based on observable data is the recommended way to operate and maintain the system. Tracing, metrics and logging are the main ways to obtain observable data about system status.
APM (application performance monitoring) monitors and diagnoses the performance of a system by collecting, storing and analyzing its observable data. Its main functions include performance metric monitoring, call stack analysis, service topology, etc.
DBPlusEngine is not responsible for gathering, storing or displaying APM data; it only provides the information the APM needs. In other words, DBPlusEngine only generates valuable data and submits it to the relevant systems through standard protocols or plug-ins. Tracing obtains trace information about SQL parsing and SQL execution. DBPlusEngine supports SkyWalking, Zipkin, Jaeger and OpenTelemetry by default. Users can also develop customized components through plug-ins.
- Use Zipkin or Jaeger
Just provide the correct Zipkin or Jaeger server information in the agent configuration file.
- Use OpenTelemetry
OpenTelemetry was formed by merging OpenTracing and OpenCensus in 2019. To use it, fill in the appropriate configuration in the agent configuration file according to the OpenTelemetry SDK Autoconfigure Guide.
- Use SkyWalking
Enable the SkyWalking plugin in the configuration file and configure the SkyWalking apm-toolkit.
- Use SkyWalking’s automatic monitor probe
In cooperation with the Apache SkyWalking team, the DBPlusEngine team has created the ShardingSphere automatic monitor probe to automatically send performance data to SkyWalking. Note that the automatic probe cannot be used together with the DBPlusEngine plugin probe.
Metrics are used to collect and display statistical indicators of the cluster. DBPlusEngine supports Prometheus by default.
Challenges #
Tracing and metrics need to collect system information through event tracking. A large amount of event-tracking code makes the kernel code messy, difficult to maintain, and difficult to customize and extend.
Goal #
The goal of the DBPlusEngine observability module is to provide as many performance and statistical indicators as possible while keeping the kernel code isolated from the embedded event-tracking code.
Core Concept #
Agent #
The agent is based on bytecode enhancement and a plugin design, and provides tracing, metrics and logging features. Enable a plugin in the agent to collect data and send it to the integrated third-party APM system.
APM #
APM stands for application performance monitoring. It is used for performance diagnosis of distributed systems, including call-chain display, service topology analysis and so on.
Tracing #
The agent collects tracing data between distributed services or within internal processes, and then sends it to the APM system.
Metrics #
Metrics are system statistical indicators collected by the agent. They are written to a time series database periodically, and a third-party UI can then display the metrics data.
Usage Norms #
Compile source code #
Download the DBPlusEngine source code from GitHub, then compile it.
git clone --depth 1 https://github.com/apache/shardingsphere.git
cd shardingsphere
mvn clean install -Dmaven.javadoc.skip=true -Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true -DskipITs -DskipTests -Prelease
Output package: shardingsphere-agent/shardingsphere-agent-distribution/target/apache-shardingsphere-${latest.release.version}-shardingsphere-agent-bin.tar.gz
Agent configuration #
Directory structure
Create an agent directory and extract the agent distribution package into it.
mkdir agent
tar -zxvf apache-shardingsphere-${latest.release.version}-shardingsphere-agent-bin.tar.gz -C agent
cd agent
tree
.
├── conf
│   ├── agent.yaml
│   └── logback.xml
├── plugins
│   ├── shardingsphere-agent-logging-base-${latest.release.version}.jar
│   ├── shardingsphere-agent-metrics-prometheus-${latest.release.version}.jar
│   ├── shardingsphere-agent-tracing-jaeger-${latest.release.version}.jar
│   ├── shardingsphere-agent-tracing-opentelemetry-${latest.release.version}.jar
│   ├── shardingsphere-agent-tracing-opentracing-${latest.release.version}.jar
│   └── shardingsphere-agent-tracing-zipkin-${latest.release.version}.jar
└── shardingsphere-agent.jar
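Before editing the configuration, it can help to confirm that the package was extracted completely. The following is a minimal sketch, not part of the distribution: `check_agent_layout` is a hypothetical helper name, and it only checks the three top-level entries shown in the directory tree above.

```shell
# Hypothetical helper: verify that an extracted agent directory contains
# the entries listed in the directory tree (conf/agent.yaml, plugins,
# shardingsphere-agent.jar). Prints each missing entry to stderr.
check_agent_layout() {
  agent_dir="$1"
  rc=0
  for entry in conf/agent.yaml plugins shardingsphere-agent.jar; do
    if [ ! -e "$agent_dir/$entry" ]; then
      echo "missing: $agent_dir/$entry" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example call, run from the parent of the agent directory:
#   check_agent_layout agent && echo "agent layout looks complete"
```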
- Configuration file
agent.yaml is the agent configuration file. The available plug-ins are Jaeger, OpenTracing, Zipkin, OpenTelemetry, Logging and Prometheus. To enable a plug-in, remove it from ignoredPluginNames.
applicationName: shardingsphere-agent
ignoredPluginNames:
  - Jaeger
  - OpenTracing
  - Zipkin
  - OpenTelemetry
  - Logging
  - Prometheus
plugins:
  Prometheus:
    host: "localhost"
    port: 9090
    props:
      JVM_INFORMATION_COLLECTOR_ENABLED: "true"
  Jaeger:
    host: "localhost"
    port: 5775
    props:
      SERVICE_NAME: "shardingsphere-agent"
      JAEGER_SAMPLER_TYPE: "const"
      JAEGER_SAMPLER_PARAM: "1"
  Zipkin:
    host: "localhost"
    port: 9411
    props:
      SERVICE_NAME: "shardingsphere-agent"
      URL_VERSION: "/api/v2/spans"
      SAMPLER_TYPE: "const"
      SAMPLER_PARAM: "1"
  OpenTracing:
    props:
      OPENTRACING_TRACER_CLASS_NAME: "org.apache.skywalking.apm.toolkit.opentracing.SkywalkingTracer"
  OpenTelemetry:
    props:
      otel.resource.attributes: "service.name=shardingsphere-agent"
      otel.traces.exporter: "zipkin"
  Logging:
    props:
      LEVEL: "INFO"
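Since a plug-in starts only once its name is removed from ignoredPluginNames, that edit can be scripted. The sketch below is one possible approach, not an official tool: `enable_plugin` is a hypothetical helper, and it assumes the list entries are unquoted and directly follow the `ignoredPluginNames:` line, as in the sample file above.

```shell
# Hypothetical helper: enable a plug-in by deleting its entry from the
# ignoredPluginNames list in agent.yaml.
#   $1 = plug-in name (for example Prometheus)
#   $2 = path to agent.yaml
enable_plugin() {
  plugin_name="$1"
  yaml_file="$2"
  awk -v name="$plugin_name" '
    /^ignoredPluginNames:/ { in_list = 1; print; next }
    in_list && /^[ \t]*-/ {
      item = $0
      sub(/^[ \t]*-[ \t]*/, "", item)   # strip the leading "- "
      sub(/[ \t]*$/, "", item)          # strip trailing whitespace
      if (item == name) next            # drop the entry, which enables the plug-in
      print
      next
    }
    { in_list = 0; print }              # any other line ends the list
  ' "$yaml_file" > "$yaml_file.tmp" && mv "$yaml_file.tmp" "$yaml_file"
}

# Example call:
#   enable_plugin Prometheus agent/conf/agent.yaml
```

Only the list entry is removed; the matching section under `plugins` is left untouched, so its host, port and props still need to be checked by hand.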
- Parameter description:
Name | Description | Value range | Default value |
---|---|---|---|
JVM_INFORMATION_COLLECTOR_ENABLED | Enable the JVM collector | true, false | true |
SERVICE_NAME | Tracing service name | Custom | shardingsphere-agent |
JAEGER_SAMPLER_TYPE | Jaeger sample rate type | const, probabilistic, ratelimiting, remote | const |
JAEGER_SAMPLER_PARAM | Jaeger sample rate parameter | const: 0 or 1; probabilistic: 0.0 - 1.0; ratelimiting: > 0 (custom number of samples per second); remote: requires the remote service address to be set in JAEGER_SAMPLER_MANAGER_HOST_PORT | 1 (const type) |
SAMPLER_TYPE | Zipkin sample rate type | const, counting, ratelimiting, boundary | const |
SAMPLER_PARAM | Zipkin sample rate parameter | const: 0 or 1; counting: 0.01 - 1.0; ratelimiting: > 0; boundary: 0.0001 - 1.0 | 1 (const type) |
otel.resource.attributes | OpenTelemetry resource attributes | Comma-separated key=value pairs | service.name=shardingsphere-agent |
otel.traces.exporter | Tracing exporter | zipkin, jaeger | zipkin |
otel.traces.sampler | OpenTelemetry sample rate type | always_on, always_off, traceidratio | always_on |
otel.traces.sampler.arg | OpenTelemetry sample rate parameter | traceidratio: 0.0 - 1.0 | 1.0 |
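The Jaeger value ranges in the table can be checked before the proxy is started. This is a rough sketch under stated assumptions: `valid_jaeger_sampler` is a hypothetical helper, the ranges are copied from the table rather than from the Jaeger client itself, and the remote type is accepted without checking the address.

```shell
# Hypothetical helper: check a JAEGER_SAMPLER_TYPE / JAEGER_SAMPLER_PARAM
# pair against the value ranges listed in the parameter table.
# Returns 0 when the pair is within range, non-zero otherwise.
valid_jaeger_sampler() {
  sampler_type="$1"
  sampler_param="$2"
  case "$sampler_type" in
    const)
      # const accepts exactly 0 (sample nothing) or 1 (sample everything)
      [ "$sampler_param" = "0" ] || [ "$sampler_param" = "1" ] ;;
    probabilistic)
      # probabilistic accepts a number from 0.0 to 1.0
      awk -v p="$sampler_param" \
        'BEGIN { exit !(p ~ /^[0-9]*\.?[0-9]+$/ && p + 0 >= 0 && p + 0 <= 1) }' ;;
    ratelimiting)
      # ratelimiting accepts a positive number of samples per second
      awk -v p="$sampler_param" \
        'BEGIN { exit !(p ~ /^[0-9]*\.?[0-9]+$/ && p + 0 > 0) }' ;;
    remote)
      # the remote address is configured separately, nothing to check here
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Example call:
#   valid_jaeger_sampler probabilistic 0.25 && echo "sampler settings are in range"
```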
Used in DBPlusEngine-Proxy #
- Startup script
Add the absolute path of shardingsphere-agent.jar to the proxy's start.sh startup script.
nohup java ${JAVA_OPTS} ${JAVA_MEM_OPTS} \
-javaagent:/xxxxx/agent/shardingsphere-agent.jar \
-classpath ${CLASS_PATH} ${MAIN_CLASS} >> ${STDOUT_FILE} 2>&1 &
- Launch plugin
bin/start.sh
After a successful startup, you can view the startup log of the plugin in the DBPlusEngine proxy log, and you can view the collected data at the configured address.
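When the Prometheus plug-in is enabled, one quick way to confirm that data is being exposed is to list the metric names returned by the agent. The sketch below only parses the standard Prometheus text exposition format; `list_metric_names` is a hypothetical helper, and the URL in the comment assumes the host and port from the sample agent.yaml and a `/metrics` path, which may differ in your deployment.

```shell
# Hypothetical helper: read Prometheus text exposition format on stdin and
# print the distinct metric names. Comment lines ("# HELP", "# TYPE") and
# blank lines are skipped, and any {label="..."} suffix is removed.
list_metric_names() {
  awk '!/^#/ && NF > 0 { name = $1; sub(/[{].*/, "", name); print name }' | sort -u
}

# Example call (assumed address, adjust to the configured host and port):
#   curl -s http://localhost:9090/metrics | list_metric_names
```

An empty result suggests that the plug-in is still listed in ignoredPluginNames or that the address does not match the configured host and port.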