Observability #

Background #

Observing the running state of a cluster is a new challenge when trying to understand the status of a distributed system. Logging into individual servers one at a time does not scale to a large number of distributed servers. In such cases, telemetry based on observable data is the recommended operation and maintenance approach. Tracing, metrics and logging are the main ways to obtain observable data about system status.

APM (application performance monitoring) monitors and diagnoses system performance by collecting, storing and analyzing the system's observable data. Its main functions include performance metric monitoring, call stack analysis and service topology.

DBPlusEngine is not responsible for gathering, storing or displaying APM data; it only provides the information the APM system needs. In other words, DBPlusEngine generates valuable data and submits it to the relevant systems through standard protocols or plug-ins. Tracing obtains the trace information of SQL parsing and SQL execution. DBPlusEngine supports SkyWalking, Zipkin, Jaeger and OpenTelemetry by default, and users can also develop customized components through plug-ins.

  • Use Zipkin or Jaeger

Provide the correct Zipkin or Jaeger server information in the agent configuration file.

  • Use OpenTelemetry

OpenTelemetry was formed by the merger of OpenTracing and OpenCensus in 2019. To use it, fill in the appropriate configuration in the agent configuration file according to the OpenTelemetry SDK Autoconfigure Guide.

  • Use SkyWalking

Enable the SkyWalking plugin configuration file and configure the SkyWalking apm-toolkit.

  • Use SkyWalking’s automatic monitor probe

In cooperation with the Apache SkyWalking team, the DBPlusEngine team has created a ShardingSphere automatic monitoring probe that automatically sends performance data to SkyWalking. Note that the automatic probe cannot be used together with the DBPlusEngine plugin probe.

Metrics are used to collect and display statistical indicators of the cluster. DBPlusEngine supports Prometheus by default.

Challenges #

Tracing and metrics need to collect system information through event tracking. A large amount of event-tracking code makes the kernel code messy, difficult to maintain, and difficult to customize and extend.

Goal #

The goal of the DBPlusEngine observability module is to provide as many performance and statistical indicators as possible while keeping kernel code isolated from the embedded code.

Core Concept #

Agent #

The agent is based on bytecode enhancement and a plugin design, and provides tracing, metrics and logging features. Enable a plugin in the agent to collect data and send it to the integrated third-party APM system.

APM #

APM is the abbreviation for application performance monitoring. It is used for performance diagnosis of distributed systems, including call chain display, service topology analysis and so on.

Tracing #

Tracing data between distributed services or internal processes is collected by the agent and then sent to the APM system.

Metrics #

Metrics are system statistical indicators collected by the agent. They are written to a time series database periodically, and a third-party UI can then display the metrics data.

Usage Norms #

Compile source code #

Download DBPlusEngine from GitHub, then compile it.

git clone --depth 1 https://github.com/apache/shardingsphere.git
cd shardingsphere
mvn clean install -Dmaven.javadoc.skip=true -Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true -DskipITs -DskipTests -Prelease

Output directory: shardingsphere-agent/shardingsphere-agent-distribution/target/apache-shardingsphere-${latest.release.version}-shardingsphere-agent-bin.tar.gz

Agent configuration #

  • Directory structure

    Create an agent directory, and extract the agent distribution package into it.

mkdir agent
tar -zxvf apache-shardingsphere-${latest.release.version}-shardingsphere-agent-bin.tar.gz -C agent
cd agent
tree 
.
├── conf
│   ├── agent.yaml
│   └── logback.xml
├── plugins
│   ├── shardingsphere-agent-logging-base-${latest.release.version}.jar
│   ├── shardingsphere-agent-metrics-prometheus-${latest.release.version}.jar
│   ├── shardingsphere-agent-tracing-jaeger-${latest.release.version}.jar
│   ├── shardingsphere-agent-tracing-opentelemetry-${latest.release.version}.jar
│   ├── shardingsphere-agent-tracing-opentracing-${latest.release.version}.jar
│   └── shardingsphere-agent-tracing-zipkin-${latest.release.version}.jar
└── shardingsphere-agent.jar

  • Configuration file

agent.yaml is the agent configuration file. The available plug-ins are Jaeger, OpenTracing, Zipkin, OpenTelemetry, Logging and Prometheus. Remove a plug-in's name from ignoredPluginNames to enable that plug-in.

applicationName: shardingsphere-agent
ignoredPluginNames:
  - Jaeger
  - OpenTracing
  - Zipkin
  - OpenTelemetry
  - Logging
  - Prometheus

plugins:
  Prometheus:
    host: "localhost"
    port: 9090
    props:
      JVM_INFORMATION_COLLECTOR_ENABLED: "true"
  Jaeger:
    host: "localhost"
    port: 5775
    props:
      SERVICE_NAME: "shardingsphere-agent"
      JAEGER_SAMPLER_TYPE: "const"
      JAEGER_SAMPLER_PARAM: "1"
  Zipkin:
    host: "localhost"
    port: 9411
    props:
      SERVICE_NAME: "shardingsphere-agent"
      URL_VERSION: "/api/v2/spans"
      SAMPLER_TYPE: "const"
      SAMPLER_PARAM: "1"
  OpenTracing:
    props:
      OPENTRACING_TRACER_CLASS_NAME: "org.apache.skywalking.apm.toolkit.opentracing.SkywalkingTracer"
  OpenTelemetry:
    props:
      otel.resource.attributes: "service.name=shardingsphere-agent"
      otel.traces.exporter: "zipkin"
  Logging:
    props:
      LEVEL: "INFO"
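
The ignoredPluginNames rule can be pictured with a small sketch. It is an illustration only, not the agent's actual loading logic: the plug-in names are copied from the agent.yaml above, and the exact-match comparison and the helper name enabled_plugins are assumptions.

```python
# Plug-in names taken from the agent.yaml shown above.
ALL_PLUGINS = ["Jaeger", "OpenTracing", "Zipkin", "OpenTelemetry", "Logging", "Prometheus"]


def enabled_plugins(ignored_plugin_names):
    """Return the plug-ins that would start: every known plug-in not listed as ignored."""
    ignored = set(ignored_plugin_names)  # assumption: names are compared exactly
    return [name for name in ALL_PLUGINS if name not in ignored]


# Removing Zipkin and Prometheus from ignoredPluginNames leaves only these four ignored.
ignored = ["Jaeger", "OpenTracing", "OpenTelemetry", "Logging"]
print(enabled_plugins(ignored))  # ['Zipkin', 'Prometheus']
```

With the default file, where all six names are listed under ignoredPluginNames, the same function returns an empty list, which matches the statement that no plug-in starts until its name is removed.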

  • Parameter description:

| Name | Description | Value range | Default value |
|---|---|---|---|
| JVM_INFORMATION_COLLECTOR_ENABLED | Start JVM collector | true, false | true |
| SERVICE_NAME | Tracking service name | Custom | shardingsphere-agent |
| JAEGER_SAMPLER_TYPE | Jaeger sample rate type | const, probabilistic, ratelimiting, remote | const |
| JAEGER_SAMPLER_PARAM | Jaeger sample rate parameter | const: 0, 1; probabilistic: 0.0 - 1.0; ratelimiting: > 0 (customize the number of acquisitions per second); remote: need to customize the remote service address, JAEGER_SAMPLER_MANAGER_HOST_PORT | 1 (const type) |
| SAMPLER_TYPE | Zipkin sample rate type | const, counting, ratelimiting, boundary | const |
| SAMPLER_PARAM | Zipkin sampling rate parameter | const: 0, 1; counting: 0.01 - 1.0; ratelimiting: > 0; boundary: 0.0001 - 1.0 | 1 (const type) |
| otel.resource.attributes | OpenTelemetry properties | String key value pairs (split by ,) | service.name=shardingsphere-agent |
| otel.traces.exporter | Tracing exporter | zipkin, jaeger | zipkin |
| otel.traces.sampler | OpenTelemetry sample rate type | always_on, always_off, traceidratio | always_on |
| otel.traces.sampler.arg | OpenTelemetry sample rate parameter | traceidratio: 0.0 - 1.0 | 1.0 |
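
Before starting the agent, it can help to check the Zipkin sampler settings against the ranges in the parameter description. The following is a minimal sketch under stated assumptions: the ranges are copied from the description above and treated as inclusive, and the function name check_zipkin_sampler is hypothetical and not part of the agent.

```python
# Allowed SAMPLER_PARAM values per SAMPLER_TYPE, copied from the parameter description.
# Assumption: the documented ranges include their endpoints.
ZIPKIN_RANGES = {
    "const": lambda v: v in (0, 1),
    "counting": lambda v: 0.01 <= v <= 1.0,
    "ratelimiting": lambda v: v > 0,
    "boundary": lambda v: 0.0001 <= v <= 1.0,
}


def check_zipkin_sampler(sampler_type, sampler_param):
    """Return True if the string parameter is valid for the given Zipkin sampler type."""
    if sampler_type not in ZIPKIN_RANGES:
        return False
    try:
        value = float(sampler_param)  # agent.yaml stores the value as a quoted string
    except ValueError:
        return False
    return ZIPKIN_RANGES[sampler_type](value)


print(check_zipkin_sampler("const", "1"))         # the defaults from agent.yaml
print(check_zipkin_sampler("counting", "0.005"))  # below the documented minimum
```

The same table-driven approach could be extended to the Jaeger and OpenTelemetry rows; they are left out here to keep the sketch short.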

Used in DBPlusEngine-Proxy #

  • Startup script

Add the absolute path of shardingsphere-agent.jar to the start.sh startup script of the ShardingSphere proxy.

nohup java ${JAVA_OPTS} ${JAVA_MEM_OPTS} \
-javaagent:/xxxxx/agent/shardingsphere-agent.jar \
-classpath ${CLASS_PATH} ${MAIN_CLASS} >> ${STDOUT_FILE} 2>&1 &

  • Launch plugin

bin/start.sh

After a successful startup, you can view the plugin's startup log in the DBPlusEngine proxy log and view the data at the configured address.
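
When the Prometheus plugin is enabled, one way to inspect the data is to read the metrics text and turn it into a dictionary. The sketch below assumes the plugin serves the standard Prometheus text exposition format without timestamps; the URL path /metrics and the metric names in SAMPLE are hypothetical and only illustrate the format.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {sample name with labels: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        # Assumption: no trailing timestamp, so the value is the last space-separated field.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Hypothetical payload; real metric names depend on the plugin version.
SAMPLE = """\
# HELP proxy_request_total hypothetical request counter
# TYPE proxy_request_total counter
proxy_request_total 42.0
jvm_threads_current{state="runnable"} 8.0
"""

print(parse_metrics(SAMPLE))
```

Against a running proxy, the text could be fetched with urllib.request.urlopen from the host and port configured under plugins.Prometheus (for example http://localhost:9090/metrics, where the path is an assumption) and passed to parse_metrics.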

Using the Agent Log Collection Function in DBPlusEngine-Driver #

Slow-Query-Log #

Background

The Slow-Query-Log feature is used to log SQL statements that take longer than a certain amount of time to execute, allowing DBAs and developers to identify potentially problematic SQL statements, and is a great source of reference for database and SQL management.

Parameter explanation

  • slow-query-log: whether to enable slow query logging. The default value is true.
  • long-query-time: the slow query time threshold. Only SQL statements that take longer than this threshold to execute are recorded in the slow query log. The value is in milliseconds (ms) and the default is 5000.
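
How the two parameters interact can be written down as a small predicate. This is a sketch of the documented rule, not the agent's implementation; the function name should_log_slow_query is hypothetical, and the strict comparison follows the wording "longer than this threshold".

```python
def should_log_slow_query(elapsed_ms, slow_query_log=True, long_query_time=5000):
    """Log only when the feature is enabled and the elapsed time exceeds the threshold."""
    return slow_query_log and elapsed_ms > long_query_time


print(should_log_slow_query(5001))                        # above the default threshold
print(should_log_slow_query(5000))                        # exactly at the threshold
print(should_log_slow_query(9000, slow_query_log=False))  # feature disabled
```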

Prerequisite

  • The Agent uses SLF4J for log bridging output, so it requires the application to have an SLF4J dependency and the relevant log output configuration.
  • The slow-query-logging feature is based on Agent technology and requires the application to be configured with javaagent to enable the Agent at startup.

Example configuration

  • Agent Configuration

conf/agent.yaml is the Agent configuration file and the default configuration is as follows.

plugins:
  logging:
    BaseLogging:
      props:
        slow-query-log: true
        long-query-time: 5000

Here slow-query-log and long-query-time are set to their default values, which means:

  1. slow query logging is turned on;
  2. a statement is recorded in the slow query log when its execution takes more than 5000 milliseconds.

  • Application logging configuration

Take the commonly used logback as an example of logging output and configure it as follows.

<configuration>
    <property name="log.context.name" value="project-using-DBPlusEngine-Driver" />
    <property name="log.charset" value="UTF-8" />
    <property name="log.pattern" value="[%-5level] %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %logger{36} - %msg%n" />
    <contextName>${log.context.name}</contextName>
    
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder charset="${log.charset}">
            <pattern>${log.pattern}</pattern>
        </encoder>
    </appender>
  
    <logger name="SLOW-QUERY" level="info" additivity="false">
        <appender-ref ref="STDOUT" />
    </logger>
    
    <root>
        <level value="INFO" />
        <appender-ref ref="STDOUT" />
    </root>
</configuration>

Description: The logger name of the slow query log output is SLOW-QUERY.

Slow-query-log format

db: {database name} query_time: {query time} sql_type: {sql type}

{sql}

  • db: the name of the database;
  • query_time: the time taken for SQL execution, unit in ms;
  • sql_type: the type of SQL (SELECT, INSERT, UPDATE, DELETE, or OTHER for any other type);
  • sql: the specific SQL statement to be executed;

Example:

[WARN ] 2023-01-04 14:55:04.035 [http-nio-8888-exec-7] SLOW-QUERY - db: sharding_db query_time: 21 sql_type: SELECT
SELECT  id,user_id,uuid,status,create_time,update_time,is_deleted AS deleted  FROM t_order
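
For analysis, the two-line entry shown in the example can be split back into its fields. The following sketch assumes that the database name contains no whitespace, that query_time is an integer number of milliseconds, and that the SQL statement fits on the single line after the header; the names HEADER and parse_entry are hypothetical.

```python
import re

# Matches the "db: ... query_time: ... sql_type: ..." part of the header line.
HEADER = re.compile(r"db: (?P<db>\S+) query_time: (?P<query_time>\d+) sql_type: (?P<sql_type>\w+)")


def parse_entry(header_line, sql_line):
    """Return the fields of one log entry as a dict, or None if the header does not match."""
    match = HEADER.search(header_line)
    if match is None:
        return None
    return {
        "db": match.group("db"),
        "query_time_ms": int(match.group("query_time")),
        "sql_type": match.group("sql_type"),
        "sql": sql_line.strip(),
    }


header = ("[WARN ] 2023-01-04 14:55:04.035 [http-nio-8888-exec-7] SLOW-QUERY - "
          "db: sharding_db query_time: 21 sql_type: SELECT")
sql = "SELECT  id,user_id,uuid,status,create_time,update_time,is_deleted AS deleted  FROM t_order"
entry = parse_entry(header, sql)
print(entry["db"], entry["query_time_ms"], entry["sql_type"])
```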

General-Query-Log #

Background

When the general-query-log feature is enabled, the system records every executed SQL statement together with information such as the database the statement ran against, the execution time and the SQL type, so that enterprises can easily carry out audits.

Parameter explanation

  • general-query-log: the only parameter of this feature. A value of true enables full logging and a value of false disables it.

Prerequisite

  • The Agent uses SLF4J for log bridging output, so it requires the application to have an SLF4J dependency and the relevant log output configuration.
  • General-query-log functionality is based on Agent technology and requires the application to be configured with javaagent to enable the Agent at startup.

Example configuration

  • Agent Configuration

conf/agent.yaml is the Agent configuration file and the default configuration is as follows.

plugins:
  logging:
    BaseLogging:
      props:
        general-query-log: true

Description: set general-query-log to true to enable general query logging, or to false to disable it. The value is applied at startup.

  • Application logging configuration

Take the commonly used logback as an example of logging output and configure it as follows.

<configuration>
    <property name="log.context.name" value="project-using-DBPlusEngine-Driver" />
    <property name="log.charset" value="UTF-8" />
    <property name="log.pattern" value="[%-5level] %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %logger{36} - %msg%n" />
    <contextName>${log.context.name}</contextName>
    
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder charset="${log.charset}">
            <pattern>${log.pattern}</pattern>
        </encoder>
    </appender>
  
    <logger name="GENERAL-QUERY" level="info" additivity="false">
        <appender-ref ref="STDOUT" />
    </logger>
    
    <root>
        <level value="INFO" />
        <appender-ref ref="STDOUT" />
    </root>
</configuration>

Description: The logger name of the general-query-log output is GENERAL-QUERY.

General-query-log format

db: {database name} query_time: {query time} sql_type: {sql type}

{sql}

  • db: database name;
  • query_time: the time taken for SQL execution, unit in ms;
  • sql_type: the type of SQL (SELECT, INSERT, UPDATE, DELETE, or OTHER for any other type);
  • sql: the specific SQL statement to be executed;

Example:

[INFO ] 2023-01-04 14:55:04.035 [http-nio-8888-exec-7] GENERAL-QUERY - db: sharding_db query_time: 21 sql_type: SELECT
SELECT  id,user_id,uuid,status,create_time,update_time,is_deleted AS deleted  FROM t_order
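
Since the general query log is meant for auditing, a typical next step is to summarize it. The sketch below counts statements and sums execution time per database and SQL type. It assumes the header layout shown in the example; the first header line is taken from that example, the other two are hypothetical, and the function name summarize is not part of DBPlusEngine.

```python
import re
from collections import Counter

# Matches header lines written by the GENERAL-QUERY logger; SQL lines do not match.
HEADER = re.compile(r"GENERAL-QUERY - db: (\S+) query_time: (\d+) sql_type: (\w+)")


def summarize(log_lines):
    """Return (statement count, total execution time in ms), each keyed by (db, sql_type)."""
    counts = Counter()
    total_ms = Counter()
    for line in log_lines:
        match = HEADER.search(line)
        if match:
            key = (match.group(1), match.group(3))
            counts[key] += 1
            total_ms[key] += int(match.group(2))
    return counts, total_ms


LINES = [
    "[INFO ] 2023-01-04 14:55:04.035 [http-nio-8888-exec-7] GENERAL-QUERY - db: sharding_db query_time: 21 sql_type: SELECT",
    "SELECT  id,user_id,uuid,status,create_time,update_time,is_deleted AS deleted  FROM t_order",
    # The two header lines below are hypothetical, added only to make the summary non-trivial.
    "[INFO ] 2023-01-04 14:55:05.120 [http-nio-8888-exec-8] GENERAL-QUERY - db: sharding_db query_time: 9 sql_type: SELECT",
    "[INFO ] 2023-01-04 14:55:06.300 [http-nio-8888-exec-9] GENERAL-QUERY - db: sharding_db query_time: 4 sql_type: INSERT",
]

counts, total_ms = summarize(LINES)
for key in sorted(counts):
    print(key, counts[key], total_ms[key])
```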