[Innovative openEuler Projects] Enabling Distributed Database Storage Through the Cantian Engine

openEuler2023-10-11Cantian engine

Introduction to the Cantian Engine

The Cantian engine is a middleware that transforms a single-node database into a database with capabilities similar to Oracle Real Application Cluster (Oracle RAC). The Cantian engine adopts an innovative decoupled storage and compute architecture and uses key technologies such as distributed cache, transaction multi-version concurrency control (MVCC), and high availability (HA) in a multi-active cluster to enable a distributed database and build a distributed database storage architecture featuring write many read many (WMRM). This project has been open-sourced in the openEuler community.

Code repository: https://gitee.com/openeuler/cantian

Cantian Engine Architecture

The Cantian engine consists of the following components:

Cantian Connector: supports the Cantian engine as a distributed database storage engine plugin, database functions such as data definition language (DDL), data manipulation language (DML), and transaction, and is compatible with ecosystem applications of distributed databases.

Cluster manager server (CMS): manages clusters.

WMRM module: Based on shared storage, the Cantian engine can be deployed to enable a write-many cluster where nodes are equivalent in architecture. The WMRM module ensures that DDL, DML, and DCL operations can be performed on the database from any node. Modifications made on any node are noticeable on other nodes if the isolation level requirements are met. All computing nodes share, read, and write the same user data on storage nodes.

Backup and recovery module: exports all tables in a database as SQL statements or table texts, and imports logical data files in text format to the database during logical recovery.

Functions of the Cantian Engine

  1. Based on shared storage, the Cantian engine can be deployed to enable a write-many cluster where nodes are equivalent in architecture. It ensures that DDL, DML, and DCL operations can be performed on the database from any node.

  2. As a distributed database storage engine plugin, the Cantian engine supports database functions such as DDL, DML, and transaction, and is compatible with ecosystem applications of distributed databases.

  3. The Cantian engine can manage the cluster, including cluster member status maintenance, cluster exception handling, and arbitration.

  4. During backup and recovery, the Cantian engine exports all tables in a database as SQL statements or table texts. During logical recovery, it imports logical data files in text format to the database.

Application Scenarios

Based on shared storage, the Cantian engine implements decoupled storage and compute, and WMRM architecture innovation for distributed databases. After the Cantian engine is deployed, the performance is improved by 10 times, database and table partitioning is not required, and failover can be completed within 30 seconds upon a node fault, reducing the TCO by over 30%.

Optimizations Provided by the Cantian Engine

  1. WMRM: In a write-many cluster, the Cantian engine extends single-node operations to the multi-node cluster using the following cluster components:
  • Resource distribution management: manages distributed metadata information, including the owner, permission, and concurrency control of resources such as pages and locks in the cluster.

  • Resource cache convergence: manages page concurrency control in a cluster, including concurrent page read/write control, processing of global page read/write requests, page owner and read-only copy management, transmission of latest pages in the cluster, and invalidation of earlier pages.

  • Lock distribution and management: controls lock concurrency in a cluster and distributes lock resources such as spin locks and latches to the entire cluster.

  1. Cluster management: The cluster management service monitors the status of resources on the local node and global resources. This service on each node is connected through heartbeat messages. It can detect and rectify Cantian process, node, network, storage link, and storage faults.

  2. Ecosystem adaptation: The Cantian engine connects to the distributed database through the Cantian Connector. The Cantian Connector plugin receives a request from the MySQL SQL engine for invoking the storage engine plugin and forwards the request to the Cantian engine kernel through the communication module and logic at the interconnection layer. The Cantian Connector plugin and Cantian engine communication module are designed as unified interfaces and can be dynamically replaced.

  3. Backup and recovery: The Cantian engine exports all tables in a database as SQL statements or table texts. During logical recovery, it imports the logical data files in text format to the database.

Follow-up Planning

This project has been open-sourced in the openEuler community. The Cantian engine will accommodate more functions such as the following:

  1. Active-active disaster recovery

  2. MySQL log parsing interface

  3. Data import tool

  4. Multi-tenant interface

  5. Data Transmission Service (DTS) database migration tool

  6. O&M tool integration

If this project piques your interest, you are cordially invited to participate. For details, you can visit the code repository: https://gitee.com/openeuler/cantian


[Disclaimer] This article only represents the author's opinions, and is irrelevant to this website. This website is neutral in terms of the statements and opinions in this article, and does not provide any express or implied warranty of accuracy, reliability, or completeness of the contents contained therein. This article is for readers' reference only, and all legal responsibilities arising therefrom are borne by the reader himself.