Ceph Community News (2022-07-01 to 2022-07-30)
Quincy Ultra-large-scale Deployment Test
The Quincy test, conducted on 200 servers with a 100 Gbit/s network, aimed to evaluate the cluster management and monitoring modules. Two significant issues were identified and resolved during the evaluation: mgr and CLI responses froze because of performance-counter reporting from the Ceph daemons, and PG peering was disrupted by the pg_autoscaler during scale-out. cephadm cluster management was optimized to reduce errors and agent log traffic and to make management operations more responsive. Prometheus needs faster SSDs for monitoring large clusters because it cannot collect data from all nodes simultaneously.
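As a hedged sketch of how the pg_autoscaler interference might be worked around during a large scale-out: autoscaling can be paused cluster-wide with the noautoscale flag (available in recent releases) or per pool; the pool name below is a placeholder.
ceph osd pool set noautoscale                    # pause PG autoscaling cluster-wide during the scale-out
ceph osd pool set <pool> pg_autoscale_mode off   # or disable it for a single pool
ceph osd pool unset noautoscale                  # re-enable once peering has settled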
Ceph Security Vulnerability Fixing
The following two security vulnerabilities have been fixed. Versions v16.2.10 and v17.2.2 have been released, and you are advised to upgrade to them.
- Ceph clusters upgraded from Nautilus (or earlier) to a later major version were vulnerable to attacks by malicious users: affected users could access any part of the CephFS file system hierarchy instead of being correctly restricted to their own subvolumes.
- An s3website request that does not refer to a bucket triggers a null pointer dereference, causing an RGW segfault.
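For a cephadm-managed cluster, a hedged sketch of moving to one of the fixed releases; the target version shown assumes a Quincy cluster (use v16.2.10 for Pacific):
ceph versions                                    # confirm which versions are currently running
ceph orch upgrade start --ceph-version 17.2.2    # begin a rolling upgrade to the fixed release
ceph orch upgrade status                         # monitor upgrade progress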
In-depth Optimization of Ceph RocksDB
Ceph's default RocksDB tuning was compared with several other configurations, looking at how the settings affect write amplification and performance on NVMe drives and at how complex the interaction between the different options is. With the right combination of options, higher performance can be achieved without significantly increasing write amplification. In some cases, enabling compression reduces write amplification while having only a moderate impact on performance. In these tests, the highest-performing configuration was usually one of the following:
No Compression
bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=128,min_write_buffer_number_to_merge=16,compaction_style=kCompactionStyleLevel,write_buffer_size=8388608,max_background_jobs=4,level0_file_num_compaction_trigger=8,max_bytes_for_level_base=1073741824,max_bytes_for_level_multiplier=8,compaction_readahead_size=2MB,max_total_wal_size=1073741824,writable_file_max_buffer_size=0
LZ4 compression (for RGW workloads it can reduce write amplification, but bucket listing performance is affected)
bluestore_rocksdb_options = compression=kLZ4Compression,max_write_buffer_number=128,min_write_buffer_number_to_merge=16,compaction_style=kCompactionStyleLevel,write_buffer_size=8388608,max_background_jobs=4,level0_file_num_compaction_trigger=8,max_bytes_for_level_base=1073741824,max_bytes_for_level_multiplier=8,compaction_readahead_size=2MB,max_total_wal_size=1073741824,writable_file_max_buffer_size=0
These non-default configurations can improve performance but have not yet been tested extensively. Over the next few months we will run them through QA and may consider changing the default values for the next Ceph release.
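As a hedged sketch, the same option string can also be applied through the centralized configuration database instead of ceph.conf; the value is abbreviated here with "..." for readability, and the OSDs must be restarted (for example with ceph orch daemon restart osd.<id>) before new RocksDB options take effect:
ceph config set osd bluestore_rocksdb_options "compression=kNoCompression,max_write_buffer_number=128,min_write_buffer_number_to_merge=16,..."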
Recently Merged PRs
Recent PRs mainly focused on bug fixes. The main changes are as follows:
- MemDB
- compressor/zlib
  - arm64: the zlib compressor uses isa-l to improve performance by 4 to 7 times (a usage sketch follows this list). #44762
- libcephsqlite
  - A regular expression containing '-' caused libcephsqlite to break when compiled with gcc 8.5.0-14. #47270
- crimson
- rgw
- mgr: large-cluster management optimization
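Related to the compressor/zlib item above, a hedged sketch of where the zlib compressor is exercised: BlueStore compression is selected per pool, and on arm64 the isa-l-accelerated path is used transparently. The pool name is a placeholder.
ceph osd pool set <pool> compression_algorithm zlib   # choose zlib as the BlueStore compressor
ceph osd pool set <pool> compression_mode aggressive  # compress data unless the client hints it is incompressible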
Recent Ceph Developer News
Each working group in the Ceph community holds regular meetings to discuss and align on development progress, and the recordings are uploaded to YouTube. The major meetings are as follows:
| Meeting | Description | Frequency |
|---|---|---|
| Crimson SeaStore OSD Weekly Meeting | Crimson & SeaStore development | Weekly |
| Ceph Orchestration Meeting | Ceph management module (mgr) | Weekly |
| Ceph DocUBetter Meeting | Documentation improvement | Biweekly |
| Ceph Performance Meeting | Ceph performance optimization | Biweekly |
| Ceph Developer Monthly | Ceph developer meeting | Monthly |
| Ceph Testing Meeting | Release verification and publication | Monthly |
| Ceph Science User Group Meeting | Ceph in scientific computing | Irregular |
| Ceph Leadership Team Meeting | Ceph leadership team | Weekly |
| Ceph Tech Talks | Talks on Ceph-related technical topics | Monthly |
Performance Meeting
- The performance of v17.2.1 is lower than that of v17.2.0.
- Comparing the changes between the two versions shows that BlueStore zero-block detection is enabled by default in v17.2.0 but disabled by default in v17.2.1.
- In-depth optimization of Ceph RocksDB
- Discussed RGW compression. With compression enabled, write and delete performance improves and write traffic drops; read performance does not degrade when LZ4 is used. If the RGW daemon is restricted to two cores, bucket listing performance degrades while read, write, and delete performance is unchanged (a reproduction sketch follows this list).
- Discussed tombstone clearing and compaction policies for RocksDB deletes: within a specified iteration window, a compaction is triggered once the number of tombstones exceeds a threshold.
- Discussed BlueStore statfs statistics optimization for the upcoming new WAL.
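The RGW compression setup discussed above can be tried with a hedged sketch like the following; it assumes the default zone and placement names, a placeholder bucket name, and that the RGW daemons are restarted afterwards:
radosgw-admin zone placement modify --rgw-zone default --placement-id default-placement --compression lz4
radosgw-admin bucket stats --bucket=<bucket>   # size_utilized vs. size shows the achieved compression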
Crimson SeaStore OSD Weekly Meeting
- Resolved issues related to ZNS (zoned namespace) support.
- GC transactions are not driven by user behavior, so read operations issued by GC transactions do not exhibit temporal locality; the extents they read should not be added to the LRU cache.