Table Reference #
Principle #
DBPlusEngine table reference currently supports reference queries for scenarios where multiple partitioned tables are bound tables, and reference queries between broadcast tables and partitioned tables, broadcast tables and single tables. It does not support reference queries between multiple partitioned tables for unbound tables, and cross-database reference queries between partitioned tables and single tables, single tables and single tables.
When DBPlusEngine processes table reference queries, it pushes down the reference queries to the underlying database for execution as much as possible, fully using the computational power of the native database. DBPlusEngine will consolidate the result sets returned from the underlying database so that the final results satisfy the reference query semantics of logical SQL.
Recommendations #
- When using bound tables for multi-table correlation queries, the Cartesian product correlation can be avoided by using split keys for correlation, thus improving query efficiency.
- For the scenario of reference between the sliced table and other tables, the large table can be sliced according to the business scenario, and the small table to be associated can be broadcast to each data source. When the sliced table and the broadcasted table need to be associated for the query, the broadcasted table on the current data source can be selected for the reference.
- For the small tables that are used separately in business scenarios and do not need to be associated with the sliced table, they can be stored on the specified data source and updated and queried in the form of a single table.
Example #
Let’s take the following SQL as an example to compare the difference between using a bound table and an unbound table reference.
SELECT i.* FROM t_order o JOIN t_order_item i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
- Without configuring binding table relationships, assuming that the slice key
order_id
routes the value 10 to slice 0 and the value 11 to slice 1, then the routed SQL should be 4 items, which are presented as a Cartesian product.
SELECT i.* FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
SELECT i.* FROM t_order_0 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
SELECT i.* FROM t_order_1 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
SELECT i.* FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
- After configuring the bound table relationship and using
order_id
for the reference, the routed SQL should be 2 items:
SELECT i.* FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
SELECT i.* FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
You can see that the efficiency of the query is greatly improved after configuring the bound table relationship.