Kernel Processing Flow #

The SphereEx-DBPlusEngine kernel process includes standard SQL Parser and SQL Binder modules, which are used for the identification of SQL-specific features and, depending on the result, the SQL execution process is divided into the Simple Push Down Engine and the SQL Federation Engine.

Kernel Processing Flow

In the microkernel process, SQL Parser is responsible for transforming user-input SQL into an AST (Abstract Syntax Tree) Node through the standard process of Lexer and Parser, and ultimately into a SQLStatement that extracts the necessary features, which is the core input to the Apache ShardingSphere kernel processing.

SQLStatement is a representation of the original SQL, while SQL Binder combines Metadata and SQLStatement to supplement SQL with wildcards and missing parts to produce a complete AST Node that matches the database table structure.

SQL Binder analyzes the dependencies between the logical table and the physical database to determine if SQL request has the potential to operate across multiple data sources. When SQL can be pushed down to the database storage node in full after logical table modification and execution information completion, the Simple Push Down Engine is used to ensure the maximum compatibility of SQL.

Conversely, when SQL involves cross-database association and cross-database subquery, SQL Federation Engine is used to obtain more performance in distributed table association. SQL Federation Engine is used when SQL involves cross-library correlation and cross-library subqueries to achieve better performance in distributed table association operations.

Simple Push Down Push-down Process #

Simple Push Down Push-down Process

The Simple Push Down process consists of SQL Parsing => SQL Binding => SQL Routing => SQL Rewriting => SQL Execution => Result Consolidation and is mainly used to handle SQL execution in standard sharding scenarios.

After the standard SQL Parser and SQL Binder pre-processes, SQL Router extracts the key fields (e.g. sharding keys) from the SQLStatement and matches them with the specific rules configured by the user through DistSQL to calculate the final routed data source.

Once the data source is known, SQL Rewriter is responsible for rewriting SQL into SQL that can be directly pushed down to the database for execution in a distributed scenario, such as logical table name substitution, complementary columns, aggregate function correction, etc.

SQL Executor selects the appropriate execution engine based on the transaction status of the current request and sends the rewritten SQL to the data source pointed by the routing result in a concurrent and grouped manner. Ultimately, when all the results of SQL execution are complete, Result Merger automatically aggregates and merges or further rewrites the multiple result sets.

SQL Federation Execution Engine Process #

SQL Federation Execution Engine Process

The main difference between the SQL Federation Engine and Simple Push Down Engine is SQL Optimizer. The SQL Optimizer takes the AST Node and optimizes it with both RBO (Rule Based Optimization) and CBO (Cost Based Optimization) to generate a Query Plan Tree.

Instead of fetching data from the storage node via the original SQL, the Query Plan Tree regenerates SQL that can be executed on a single data node and, based on the routing results, sends it to the storage node.

Before generating the final SQL, the Query Plan Tree Rewriter will modify the tables or columns in it according to the rules configured by the user via DistSQL.

It is important to note that the target data source does not have to be the same as SQL dialect entered by the user and a new SQL dialect can be generated for the storage node via the Query Plan Tree, e.g. other database dialects, even Key-Values etc.

The results of the execution are returned to the calculation node of SphereEx-DBPlusEngine and the final aggregation calculation is performed in memory via Operator Calculator until the result set is finally returned.