Logo
Sort

Sort #

Principle #

The sorting function of DBPlusEngine depends on the underlying sorting function of the database. When a user inputs an SQL statement that includes an ORDER BY, DBPlusEngine will route the logical SQL to the corresponding storage data source and rewrite the logical SQL into real SQL.

Since the real SQL also includes ORDER BY, the returned result set is ordered. Therefore, when merging the results of multiple data sources, DBPlusEngine uses the order of each result set to maintain an ordered queue and achieve global sorting.

Recommendation #

Set the max-connections-size-per-query parameter reasonably to ensure that DBPlusEngine can allocate at least one connection to hold the result set for each shard during execution, and avoid converting streaming result sets into memory result sets that occupy too much memory; please refer to the Connection Mode for details.

Example #

For example, when querying the t_score table below, the logical SQL is sorted by the score field, and the result sets of each shard returned are sorted in descending order by score. The data values pointed to by the current cursor for each of the 3 result sets are sorted and placed into a priority queue, with the first data value of t_score_0 being the largest, the first data value of t_score_2 the next largest, and the first data value of t_score_1 the smallest, so the priority queue is based on the t_score_0, t_score_2, and t_score_1’s. The following diagram shows how sorting the queue is done when the next call is made:

Sort 1

The following diagram shows how the sorting and merging are done when the next call is made. As we can see from the diagram, when the first NEXT call is made, t_score_0, which is at the top of the queue, is ejected from the queue and returns the data value (i.e., 100) that the current cursor points to the querying client, and is placed back into the priority queue after moving the cursor down one place.

The priority queue is also ranked according to the data value (in this case 90) that the current result set of t_score_0 points to for the cursor, with t_score_0 ranked last in the queue according to the current value. The result set of t_score_2, which was previously ranked second in the queue, is automatically ranked first in the queue.

Sort 2

As you can see when the data in each result set is ordered, but the whole result set is disordered, Sharding-Sphere can sort the data without adding all the data to memory. It uses stream merge, each time only one correct data is obtained by the next, which greatly saves memory consumption.