Encryption #
Background #
When the encryption and decryption configuration are used by the current user, only the names of plaintext column, ciphertext column and auxiliary query column can be configured, while the field definitions of these columns cannot be configured.
Therefore, when the user executes relevant DDL statements, the encryption column can only be created according to the definition of logical column. The type and length of the rewritten plaintext column and ciphertext column are copied according to the logical column. There are two problems with this rewriting method:
- Encryption algorithms with different field types before and after encryption cannot be supported (for example, a numeric type becomes a string type after encryption).
- For the field length, users cannot customize it. The length of ciphertext is usually greater than that of plaintext.
Here below, you can see the rewriting of encrypted columns when creating a table. We see definition of varchar(100) DEFAULT NULL and definition of user_cipher in user_id is exactly the same.
mysql> PREVIEW CREATE TABLE `t_encrypt` (`id` int(11) DEFAULT NULL, `user_id` varchar(100) DEFAULT NULL, `order_id` varchar(100) DEFAULT NULL) ENGINE=InnoDB;
+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| data_source_name | sql |
+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ds_0 | CREATE TABLE `t_encrypt` (`id` int(11) DEFAULT NULL, user_cipher varchar(100) DEFAULT NULL, user_plain varchar(100) DEFAULT NULL, order_cipher varchar(100) DEFAULT NULL) ENGINE=InnoDB |
+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
In order to improve the usability of the feature, it is necessary to add field type configuration for the encrypted column, allowing flexible configuration to the users.
Data encryption provides cloud key management and encryption function, as follows:
- Cloud key management
The encrypted key plaintext configuration adopted by encrypt has potential security risks in the props file. The cloud key management function can be added to manage encrypted keys through AWS.
- Encryption
Currently, DBPlusEngine provides an encryption solution. For new tables and new businesses, you can directly use the encryption rules to configure, but for existing data tables, you need to encrypt the plaintext fields in these tables to convert them into encrypted content. At the same time, if the user needs to change the key, we also provide the corresponding decryption function, and the two cooperate to complete the key replacement.
Challenges #
In real business scenarios, relevant development teams often need to implement and maintain a set of encryption and decryption systems according to the needs of the company’s security team.
When the encryption scenario changes, the self maintained encryption system often faces the risk of rebuild or modification.
Additionally, when it comes to existing businesses, it is relatively complex to achieve seamless encryption transformation in a transparent, safe and low-risk manner without modifying business logic and SQL.
Goal #
By adding the configuration of encryption column field type definition, users can rewrite according to the configured plaintext column and ciphertext column when executing DDL statements, making the encryption and decryption feature easier to use.
Core Concept #
Logic Column #
Column name used to encrypt, it is the logical column identification in SQL.
It includes cipher column (required), query assistant column (optional) and plain column (optional).
Logical column type (datatype) #
It is used to define the types of logical columns, such as INT NOT NULL,VARCHAR(200) DEFAULT NULL etc. For details, see the definitions of various dialect fields in the official documents, such as the definition of column_definition in MySQL create statement (https://dev.mysql.com/doc/refman/8.0/en/create-table.html).
Cipher Column (cipherColumn) #
Encrypted data column.
Ciphertext Column Type (cipherdatatype) #
Used to define the type of ciphertext column, which is the same as that of logical column.
Query Assistant Column (assistedQueryColumn) #
Column used to assistant for query.
For non-idempotent encryption algorithms with higher security level, irreversible idempotent columns provided for query.
Query Assistant Column Type (ssistedQueryDataType) #
Used to define query assistant column types, the same as logical column types.
Plain Column (plainColumn) #
Column used to persist plain column, for service provided during data encryption.
Should be removed after data cleaning.
Plain Column Type (plainDataType) #
Used to define plain column types, the same as logical column types.
Encrypting (encrypting) #
Encrypt the unencrypted data in the database in batches.
Decrypting (decrypting) #
Decrypt the encrypted data in the database in batches.
Implementation #
Cloud key management #
The key is managed in the cloud, for example, the secretkey function of AWS is used to save the key, to improve the security and convenience of the entire encryption.
When initializing the encryption algorithm, the program establishes a connection with AWS to obtain the relevant key stored in AWS, and then stores the key in the algorithm. The whole data encryption process does not involve network interaction with the cloud.
Encrypting #
The encryption task is triggered by DistSQL. After receiving the request of the encryption task, the program will create the encryption task according to the current encryption rules. The encrypting task is mainly composed of two parts, one is the query task, the other is the update task. The query task is responsible for querying the user’s table data and obtaining the plaintext fields to be encrypted, and then pushing them to the channel. The update task obtains data from the channel and encrypts the update. The whole task creation and execution process will interact with the governance center, so users can query the task progress and clean up tasks through relevant DistSQL.
Usage Norms #
Supported #
- Encrypt/decrypt one or more columns in the database table.
- Compatible with all regular SQL.
Unsupported #
- You need to process the original data on stocks in the database by yourself.
- the
like
query supports %, _, but currently does not support escape. - Case insensitive queries are not supported for the encrypted fields.
- Comparison operations are not supported for encrypted fields, such as
GREATER THAN
,LESS THAN
,ORDER BY
,BETWEEN
. - Calculation operations are not supported for encrypted fields, such as
AVG
,SUM
, and computation expressions.