Observability

Background

Observing the running state of a cluster is a new challenge when trying to understand the status of a distributed system. Logging into individual servers one at a time does not scale to a large number of distributed servers. In such cases, telemetry based on observable data is the recommended approach to operations and maintenance. Tracing, metrics and logging are the main ways to obtain observable data about system status.

APM (application performance monitoring) monitors and diagnoses system performance by collecting, storing and analyzing the system's observable data. Its main functions include performance metric monitoring, call stack analysis and service topology analysis.

DBPlusEngine does not collect, store or display APM data. Instead, it provides the information an APM system needs. In other words, DBPlusEngine only generates valuable data and submits it to the relevant systems through standard protocols or plugins. Tracing captures information about SQL parsing and SQL execution. By default, DBPlusEngine supports SkyWalking, Zipkin, Jaeger and OpenTelemetry, and users can also develop custom components as plugins.

  • Use Zipkin or Jaeger

Provide the correct Zipkin or Jaeger server information in the agent configuration file.

  • Use OpenTelemetry

OpenTelemetry was formed in 2019 by merging OpenTracing and OpenCensus. To use it, fill in the appropriate configuration in the agent configuration file according to the OpenTelemetry SDK Autoconfigure Guide.

  • Use SkyWalking

Enable the SkyWalking plugin in the agent configuration file and configure the SkyWalking apm-toolkit.

  • Use SkyWalking’s automatic monitor probe

In cooperation with the Apache SkyWalking team, the DBPlusEngine team has created a ShardingSphere automatic monitoring probe that automatically sends performance data to SkyWalking. Note that the automatic probe cannot be used together with the DBPlusEngine plugin probe.
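The OpenTelemetry SDK Autoconfigure Guide lists each option both as a Java system property (for example `otel.traces.exporter`) and as an environment variable (`OTEL_TRACES_EXPORTER`). The helper below is a small sketch for mapping a property name from the agent configuration to its environment-variable form when looking it up in the guide. It is not part of DBPlusEngine. The function name `otel_env_name` is hypothetical, and the code assumes the guide's naming convention: upper-case the name and replace `.` and `-` with `_`.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DBPlusEngine): convert an OpenTelemetry
# system-property name, as used under OpenTelemetry.props in agent.yaml, into the
# environment-variable spelling used in the OpenTelemetry SDK Autoconfigure Guide.
# Assumed convention: replace '.' and '-' with '_' and upper-case the result.
otel_env_name() {
  printf '%s\n' "$1" | tr '.-' '__' | tr '[:lower:]' '[:upper:]'
}

# Usage: print the environment-variable names for the keys used in this document.
for key in otel.resource.attributes otel.traces.exporter otel.traces.sampler.arg; do
  printf '%s -> %s\n' "$key" "$(otel_env_name "$key")"
done
```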

Metrics are used to collect and display statistical indicators of the cluster. DBPlusEngine supports Prometheus by default.

Challenges

Tracing and metrics collect system information through event tracking. A large number of tracking points makes the kernel code messy, hard to maintain, and hard to customize or extend.

Goal

The goal of the DBPlusEngine observability module is to provide as many performance and statistical indicators as possible while keeping the kernel code isolated from the embedded tracking code.

Core Concept

Agent

The agent is based on bytecode enhancement and a plugin design, and provides tracing, metrics and logging features. Enable a plugin in the agent to collect data and send it to an integrated third-party APM system.

APM

APM is short for application performance monitoring. It is used to diagnose the performance of distributed systems, with features such as call chain display and service topology analysis.

Tracing

The agent collects tracing data between distributed services or within internal processes and sends it to the APM system.

Metrics

The agent collects system statistical indicators and periodically writes them to a time series database. Third-party UIs can then display the metrics data.

Usage Norms

Compile source code

Download the DBPlusEngine source code from GitHub, then compile it.

git clone --depth 1 https://github.com/apache/shardingsphere.git
cd shardingsphere
mvn clean install -Dmaven.javadoc.skip=true -Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true -DskipITs -DskipTests -Prelease

Output package: shardingsphere-agent/shardingsphere-agent-distribution/target/apache-shardingsphere-${latest.release.version}-shardingsphere-agent-bin.tar.gz
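Because the package name embeds the release version, a glob is a convenient way to find it after the build. The function below is a sketch that assumes the output path shown above and prints the first matching package. The name `find_agent_package` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the agent distribution package produced by the Maven
# build, given the repository root (defaults to the current directory).
# Assumes the output path documented above; prints the first match in glob order.
find_agent_package() {
  repo_root="${1:-.}"
  target_dir="$repo_root/shardingsphere-agent/shardingsphere-agent-distribution/target"
  for pkg in "$target_dir"/apache-shardingsphere-*-shardingsphere-agent-bin.tar.gz; do
    # When nothing matches, the loop variable holds the literal pattern, so test it.
    if [ -f "$pkg" ]; then
      printf '%s\n' "$pkg"
      return 0
    fi
  done
  echo "agent package not found under $target_dir" >&2
  return 1
}

# Usage, from the root of the cloned repository:
#   find_agent_package .
```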

Agent configuration

  • Directory structure

    Create an agent directory and extract the agent distribution package into it.

mkdir agent
tar -zxvf apache-shardingsphere-${latest.release.version}-shardingsphere-agent-bin.tar.gz -C agent
cd agent
tree 
.
├── conf
│   ├── agent.yaml
│   └── logback.xml
├── plugins
│   ├── shardingsphere-agent-logging-base-${latest.release.version}.jar
│   ├── shardingsphere-agent-metrics-prometheus-${latest.release.version}.jar
│   ├── shardingsphere-agent-tracing-jaeger-${latest.release.version}.jar
│   ├── shardingsphere-agent-tracing-opentelemetry-${latest.release.version}.jar
│   ├── shardingsphere-agent-tracing-opentracing-${latest.release.version}.jar
│   └── shardingsphere-agent-tracing-zipkin-${latest.release.version}.jar
└── shardingsphere-agent.jar
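Before wiring the agent into a startup script, it can help to confirm that the extracted directory matches the layout above. The function below is a minimal sketch, not a DBPlusEngine tool. The name `check_agent_dir` is hypothetical. It only checks for `conf/agent.yaml`, `shardingsphere-agent.jar` and at least one jar under `plugins`.

```shell
#!/bin/sh
# Hypothetical helper: check that an extracted agent directory has the layout
# shown above (conf/agent.yaml, shardingsphere-agent.jar, jars under plugins/).
check_agent_dir() {
  dir="$1"
  for f in "$dir/conf/agent.yaml" "$dir/shardingsphere-agent.jar"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  # Count plugin jars; the glob stays literal when nothing matches, hence the -f test.
  count=0
  for jar in "$dir"/plugins/*.jar; do
    [ -f "$jar" ] && count=$((count + 1))
  done
  if [ "$count" -eq 0 ]; then
    echo "no plugin jars in $dir/plugins" >&2
    return 1
  fi
  echo "agent directory OK ($count plugin jars)"
}

# Usage, from the directory that contains agent/:
#   check_agent_dir agent
```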

  • Configuration file

agent.yaml is the agent configuration file. The available plugins are Jaeger, OpenTracing, Zipkin, OpenTelemetry, Logging and Prometheus. To enable a plugin, remove it from ignoredPluginNames.

applicationName: shardingsphere-agent
ignoredPluginNames:
  - Jaeger
  - OpenTracing
  - Zipkin
  - OpenTelemetry
  - Logging
  - Prometheus

plugins:
  Prometheus:
    host: "localhost"
    port: 9090
    props:
      JVM_INFORMATION_COLLECTOR_ENABLED: "true"
  Jaeger:
    host: "localhost"
    port: 5775
    props:
      SERVICE_NAME: "shardingsphere-agent"
      JAEGER_SAMPLER_TYPE: "const"
      JAEGER_SAMPLER_PARAM: "1"
  Zipkin:
    host: "localhost"
    port: 9411
    props:
      SERVICE_NAME: "shardingsphere-agent"
      URL_VERSION: "/api/v2/spans"
      SAMPLER_TYPE: "const"
      SAMPLER_PARAM: "1"
  OpenTracing:
    props:
      OPENTRACING_TRACER_CLASS_NAME: "org.apache.skywalking.apm.toolkit.opentracing.SkywalkingTracer"
  OpenTelemetry:
    props:
      otel.resource.attributes: "service.name=shardingsphere-agent"
      otel.traces.exporter: "zipkin"
  Logging:
    props:
      LEVEL: "INFO"

  • Parameter description:
| Name | Description | Value range | Default value |
| --- | --- | --- | --- |
| JVM_INFORMATION_COLLECTOR_ENABLED | Enable the JVM information collector | true, false | true |
| SERVICE_NAME | Tracing service name | Custom | shardingsphere-agent |
| JAEGER_SAMPLER_TYPE | Jaeger sampler type | const, probabilistic, ratelimiting, remote | const |
| JAEGER_SAMPLER_PARAM | Jaeger sampler parameter | const: 0 or 1; probabilistic: 0.0 to 1.0; ratelimiting: > 0, a custom number of samples per second; remote: requires a custom remote service address (JAEGER_SAMPLER_MANAGER_HOST_PORT) | 1 (const type) |
| SAMPLER_TYPE | Zipkin sampler type | const, counting, ratelimiting, boundary | const |
| SAMPLER_PARAM | Zipkin sampler parameter | const: 0 or 1; counting: 0.01 to 1.0; ratelimiting: > 0; boundary: 0.0001 to 1.0 | 1 (const type) |
| otel.resource.attributes | OpenTelemetry resource attributes | String key-value pairs separated by commas | service.name=shardingsphere-agent |
| otel.traces.exporter | Tracing exporter | zipkin, jaeger | zipkin |
| otel.traces.sampler | OpenTelemetry sampler type | always_on, always_off, traceidratio | always_on |
| otel.traces.sampler.arg | OpenTelemetry sampler parameter | traceidratio: 0.0 to 1.0 | 1.0 |
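The Jaeger sampler ranges in the table can be checked before starting the agent. The function below is a sketch that encodes only those ranges. `check_jaeger_sampler` and `is_number` are hypothetical names. The `remote` type is accepted without checks on the assumption that its settings come from the remote sampling service.

```shell
#!/bin/sh
# Hypothetical helper: accept plain decimal numbers such as 1, 0.5 or .25.
is_number() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]+(\.[0-9]*)?|\.[0-9]+)$'
}

# Hypothetical helper: check JAEGER_SAMPLER_TYPE / JAEGER_SAMPLER_PARAM against the
# ranges in the table above. Returns 0 when the pair is valid, non-zero otherwise.
check_jaeger_sampler() {
  type="$1"
  param="$2"
  case "$type" in
    const)
      [ "$param" = "0" ] || [ "$param" = "1" ] ;;
    probabilistic)
      # "p + 0" forces a numeric comparison in awk.
      is_number "$param" && awk -v p="$param" 'BEGIN { exit !(p + 0 >= 0 && p + 0 <= 1) }' ;;
    ratelimiting)
      is_number "$param" && awk -v p="$param" 'BEGIN { exit !(p + 0 > 0) }' ;;
    remote)
      return 0 ;;   # assumption: configured via JAEGER_SAMPLER_MANAGER_HOST_PORT
    *)
      return 1 ;;
  esac
}

# Usage:
#   check_jaeger_sampler probabilistic 0.1 && echo "sampler settings look valid"
```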

Used in DBPlusEngine-Proxy

  • Startup script

Add the absolute path of shardingsphere-agent.jar to the start.sh startup script of DBPlusEngine-Proxy.

nohup java ${JAVA_OPTS} ${JAVA_MEM_OPTS} \
-javaagent:/xxxxx/agent/shardingsphere-agent.jar \
-classpath ${CLASS_PATH} ${MAIN_CLASS} >> ${STDOUT_FILE} 2>&1 &
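Since the startup script needs an absolute path, a small guard can build the `-javaagent` option and reject relative or wrong paths. The function below is a sketch and is not part of start.sh. `javaagent_opt` and `AGENT_HOME` are hypothetical names, and the install location in the usage comment is only an example.

```shell
#!/bin/sh
# Hypothetical helper: build the -javaagent option from an agent directory.
# Rejects relative paths and fails if shardingsphere-agent.jar is not present.
javaagent_opt() {
  agent_home="$1"
  case "$agent_home" in
    /*) ;;
    *) echo "agent path must be absolute: $agent_home" >&2; return 1 ;;
  esac
  jar="${agent_home%/}/shardingsphere-agent.jar"   # strip one trailing slash
  if [ ! -f "$jar" ]; then
    echo "not found: $jar" >&2
    return 1
  fi
  printf '%s\n' "-javaagent:$jar"
}

# Usage inside start.sh, before the nohup java line:
#   AGENT_HOME=/opt/agent   # hypothetical install location
#   JAVA_OPTS="${JAVA_OPTS} $(javaagent_opt "${AGENT_HOME}")"
```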

  • Launch plugin

bin/start.sh

After a successful startup, the plugin's startup log appears in the DBPlusEngine-Proxy log, and the collected data can be viewed at the configured address.