Parse Engine #
Compared to other programming languages, SQL is relatively simple, but it is still a complete set of programming language, so there is no essential difference between parsing SQL grammar and parsing other languages (Java, C and Go, etc.).
Abstract Syntax Tree #
The parsing process can be divided into lexical parsing and syntactic parsing. Lexical parser is used to divide SQL into indivisible atomic signs, i.e., Token. According to the dictionary provided by different database dialect, it is categorized into keyword, expression, literal value and operator. SQL is then converted into abstract syntax tree by syntactic parser.
For example, the following SQL:
SELECT id, name FROM t_user WHERE status = 'ACTIVE' AND age > 18
Its parsing AST (Abstract Syntax Tree) is this:
To better understand, the Token of keywords in abstract syntax tree is shown in green; that of variables is shown in red; what’s to be further divided is shown in grey.
At last, through traversing the abstract syntax tree, the context needed by sharding is extracted and the place that may need to be rewritten is also marked out. Parsing context for the use of sharding includes select items, table information, sharding conditions, auto-increment primary key information, Order By information, Group By information, and pagination information (Limit, Rownum and Top). One-time SQL parsing process is irreversible, each Token is parsed according to the original order of SQL in a high performance. Considering similarities and differences between SQL of all kinds of database dialect, SQL dialect dictionaries of different types of databases are provided in the parsing module.
SQL Parser #
Features #
- Independent SQL parsing engine
- The syntax rules can be easily expanded and modified (using
ANTLR
) - Support multiple dialects
DB | Status |
---|---|
MySQL | supported |
PostgreSQL | supported |
SQLServer | supported |
Oracle | supported |
SQL92 | supported |
openGauss | supported |
- SQL format (developing)
- SQL parameterize (developing)
API Usage #
Maven config
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-sql-parser-engine</artifactId>
<version>${project.version}</version>
</dependency>
// According to the needs, introduce the parsing module of the specified dialect (take MySQL as an example), you can add all the supported dialects, or just what you need
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-sql-parser-mysql</artifactId>
<version>${project.version}</version>
</dependency>
demo:
- Get AST
/**
* databaseType type:String values: MySQL, Oracle, PostgreSQL, SQL92, SQLServer, openGauss
* sql type:String SQL to be parsed
* useCache type:boolean whether use cache
* @return parse context
*/
ParseContext parseContext = new SQLParserEngine(databaseType).parse(sql, useCache)
- GET SQLStatement
/**
* databaseType type:String values: MySQL, Oracle, PostgreSQL, SQL92, SQLServer, openGauss
* useCache type:boolean whether use cache
* @return SQLStatement
*/
ParseContext parseContext = new SQLParserEngine(databaseType).parse(sql, useCache);
SQLVisitorEngine sqlVisitorEngine = new SQLVisitorEngine(databaseType, "STATEMENT");
SQLStatement sqlStatement = sqlVisitorEngine.visit(parseContext);
- SQL Format
/**
* databaseType type:String values MySQL
* useCache type:boolean whether use cache
* @return String
*/
ParseContext parseContext = new SQLParserEngine(databaseType).parse(sql, useCache);
SQLVisitorEngine sqlVisitorEngine = new SQLVisitorEngine(databaseType, "FORMAT", new Properties());
String formatedSql = sqlVisitorEngine.visit(parseContext);
example:
sql | formatedSql |
---|---|
select a+1 as b, name n from table1 join table2 where id=1 and name=‘lu’; | SELECT a + 1 AS b, name n FROM table1 JOIN table2 WHERE id = 1 and name = ’lu'; |
select id, name, age, sex, ss, yy from table1 where id=1; | SELECT id , name , age , sex , ss , yy FROM table1 WHERE id = 1; |
select id, name, age, count(*) as n, (select id, name, age, sex from table2 where id=2) as sid, yyyy from table1 where id=1; | SELECT id , name , age , COUNT(*) AS n, ( SELECT id , name , age , sex FROM table2 WHERE id = 2 ) AS sid, yyyy FROM table1 WHERE id = 1; |
select id, name, age, sex, ss, yy from table1 where id=1 and name=1 and a=1 and b=2 and c=4 and d=3; | SELECT id , name , age , sex , ss , yy FROM table1 WHERE id = 1 and name = 1 and a = 1 and b = 2 and c = 4 and d = 3; |
ALTER TABLE t_order ADD column4 DATE, ADD column5 DATETIME, engine ss max_rows 10,min_rows 2, ADD column6 TIMESTAMP, ADD column7 TIME; | ALTER TABLE t_order ADD column4 DATE, ADD column5 DATETIME, ENGINE ss MAX_ROWS 10, MIN_ROWS 2, ADD column6 TIMESTAMP, ADD column7 TIME |
CREATE TABLE IF NOT EXISTSrunoob_tbl (runoob_id INT UNSIGNED AUTO_INCREMENT,runoob_title VARCHAR(100) NOT NULL,runoob_author VARCHAR(40) NOT NULL,runoob_test NATIONAL CHAR(40),submission_date DATE,PRIMARY KEY (runoob_id ))ENGINE=InnoDB DEFAULT CHARSET=utf8; | CREATE TABLE IF NOT EXISTS runoob_tbl (runoob_id INT UNSIGNED AUTO_INCREMENT,runoob_title VARCHAR(100) NOT NULL,runoob_author VARCHAR(40) NOT NULL,runoob_test NATIONAL CHAR(40),submission_date DATE,PRIMARY KEY ( runoob_id )) ENGINE = InnoDB DEFAULT CHARSET = utf8; |
INSERT INTO t_order_item(order_id, user_id, status, creation_date) values (1, 1, ‘insert’, ‘2017-08-08’), (2, 2, ‘insert’, ‘2017-08-08’) ON DUPLICATE KEY UPDATE status = ‘init’; | INSERT INTO t_order_item (order_id , user_id , status , creation_date) VALUES (1, 1, ‘insert’, ‘2017-08-08’), (2, 2, ‘insert’, ‘2017-08-08’) ON DUPLICATE KEY UPDATE status = ‘init’; |
INSERT INTO t_order SET order_id = 1, user_id = 1, status = convert(to_base64(aes_encrypt(1, ‘key’)) USING utf8) ON DUPLICATE KEY UPDATE status = VALUES(status); | INSERT INTO t_order SET order_id = 1, user_id = 1, status = CONVERT(to_base64(aes_encrypt(1 , ‘key’)) USING utf8) ON DUPLICATE KEY UPDATE status = VALUES(status); |
INSERT INTO t_order (order_id, user_id, status) SELECT order_id, user_id, status FROM t_order WHERE order_id = 1; | INSERT INTO t_order (order_id , user_id , status) SELECT order_id , user_id , status FROM t_order WHERE order_id = 1; |