Data Sharding #

The SphereEx-DBPlusEngine data sharding optimization process can be divided into the Standard kernel process and Federation execution engine process according to whether query optimization is performed or not.

Standard kernel flow consists of SQL parsing => SQL routing => SQL rewriting => SQL execution => result merging, which is mainly used to handle SQL execution in standard sharding scenarios.
The Federation execution engine process consists of SQL parsing => logical optimization => physical optimization => optimization execution => Standard kernel process. The Federation execution engine performs logical and physical optimization internally and relies on the Standard kernel process to route, rewrite, execute, and merge the optimized logical SQL during the optimization execution phase.

data-sharding-main-process

SQL Parsing #

It’s divided into lexical parsing and syntactic parsing. The lexical parser first breaks the SQL into non-divisible words. The syntax parser is then used to understand SQL and finally refine the parsing context. The parsing context includes tables, selections, sorting items, grouping items, aggregation functions, paging information, query conditions, and markers for placeholders that may need to be modified.

SQL Route #

Match the user-configured fragmentation policy based on the resolution context and generate a routing path. Currently, both slice routing and broadcast routing are supported.

SQL Rewrite #

Rewrites SQL to statements that can be executed correctly in the real database. SQL rewriting is divided into correctness rewriting and optimization rewriting.

SQL Execution #

Asynchronous execution via multi-threaded executor.

Result Merge #

Combine multiple execution result sets for output through a unified JDBC interface. Result merging includes stream merging, in-memory merging, and append merging using the decorator pattern.

Query Optimization #

Supported by the Federation execution engine (under development), it optimizes complex queries such as correlation queries and subqueries, and supports distributed queries across multiple database instances, using relational algebra internally to optimize the query plan and query the results through the optimal plan.