Analysis of IO Commit Latency Spike in Ceph Cluster

Symptom

Environment: Ceph cluster, after an abnormal node reboot
Affected metric: Prometheus rate value of ceph_osd_op_w_latency
Behavior:
Pre-reboot: values incremented normally (peak ~1M)
Post-reboot: recording restarted from 0, then spiked to 4.2B after 3 minutes (close to 2³²)

Investigation Process

Phase 1: Initial Hypotheses

Hypothesis                  Verification Method           Conclusion
Prometheus calculation      Reviewed rate() function      Confirmed proper counter reset
Ceph stat initialization    Inspected OSD.cc init code    Verified proper atomic init

Phase 2: Deep Analysis

Key Findings:
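
As background to the Phase 1 check of rate(), the sketch below contrasts reset-aware counter handling with a naive 32-bit unsigned subtraction. It is a simplified illustration only: the function names and sample values are hypothetical, Prometheus's real rate() additionally extrapolates to the window boundaries, and this excerpt does not state the actual root cause. It merely shows why a value close to 2³² is often read as a hint of 32-bit wraparound.

```python
U32_MODULUS = 2**32  # 4,294,967,296


def reset_aware_increase(samples):
    """Total increase of a counter, treating any drop as a restart from zero."""
    total = 0
    prev = None
    for value in samples:
        if prev is not None:
            if value < prev:
                # Counter went backwards: assume it was reset,
                # so everything counted since the reset is new increase.
                total += value
            else:
                total += value - prev
        prev = value
    return total


def naive_u32_delta(prev, cur):
    """Delta as an unsigned 32-bit subtraction with no reset detection."""
    return (cur - prev) % U32_MODULUS


# Hypothetical samples: the counter peaks near 1M, then restarts from 0.
samples = [900_000, 1_000_000, 0, 50_000]
print(reset_aware_increase(samples))
print(naive_u32_delta(1_000_000, 50_000))
```

With these made-up samples, the reset-aware path yields a small, plausible increase, while the naive unsigned subtraction across the reset lands just below 2³².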

Some Thoughts on 3FS

Rise to Prominence: 3FS

Over the past two months, DeepSeek's popularity has skyrocketed. What particularly delights those of us in the storage field is that DeepSeek has brought distributed storage, long hidden from public view, to the forefront. The attention 3FS has received on GitHub is unprecedented among open-source distributed storage projects, and it is unlikely to be matched. Within just three days of being open-sourced, its star count exceeded that of many open-source storage projects that have been under development for years.

An Analysis of Ceph's handle_cap_grant Mechanism

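handle_cap_grant

handle_cap_grant is the core implementation of Ceph's distributed consistency mechanism. It is responsible for handling capability grant/revoke messages sent by the MDS, maintaining client cache consistency, managing file metadata synchronization, and coordinating the distributed locking mechanism.

Complete flowchart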