Ceph Monitor Architecture Analysis

Monitor Overall Architecture Overview

Core Functional Positioning

Ceph Monitor serves as the control plane of the cluster and is primarily responsible for the following core duties (the command sketch after this list shows how to inspect each area from the CLI):

  • Cluster Map Maintenance: Managing the cluster maps, including the monitor map (MonMap), OSDMap, CRUSH map, MDSMap, PGMap, etc.
  • Status Monitoring & Health Checks: Real-time monitoring of cluster status and generating health reports
  • Distributed Consistency Guarantee: Keeping cluster metadata consistent across all Monitor nodes via the Paxos algorithm
  • Authentication & Authorization: Managing CephX authentication system and user permissions
  • Election & Arbitration: Maintaining Monitor quorum and handling failure recovery
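
As a quick orientation, the commands below read back the state behind each of these duties. This is a sketch using the standard ceph CLI; exact output varies between releases.

# Cluster maps maintained by the Monitors
ceph mon dump            # monitor map (MonMap)
ceph osd dump | head     # OSDMap header, including the current epoch
ceph osd crush rule ls   # rules from the CRUSH map

# Health and status reporting
ceph health detail

# CephX users and their capabilities
ceph auth ls

# Current Monitor quorum
ceph quorum_status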

Monitor Architecture Diagram

graph TB
    subgraph "Ceph Monitor Core Architecture"
        A[Monitor Daemon] --> B[MonitorStore]
        A --> C[Paxos Engine]
        A --> D[Election Module]
        A --> E[Health Module]
        A --> F[Config Module]
        A --> G[Auth Module]
        
        B --> B1[ClusterMap Storage]
        B --> B2[Configuration DB]
        B --> B3[Transaction Log]
        
        C --> C1[Proposal Processing]
        C --> C2[Leader Election]
        C --> C3[Consensus Coordination]
        
        D --> D1[Connectivity Strategy]
        D --> D2[Quorum Management]
        D --> D3[Split-brain Prevention]
        
        E --> E1[Health Checks]
        E --> E2[Status Reporting]
        E --> E3[Alert Generation]
        
        F --> F1[Config Key-Value Store]
        F --> F2[Runtime Configuration]
        F --> F3[Config Distribution]
        
        G --> G1[CephX Authentication]
        G --> G2[User Management]
        G --> G3[Capability Control]
    end
    
    subgraph "External Interactions"
        H[OSD Daemons] --> A
        I[MDS Daemons] --> A
        J[Client Applications] --> A
        K[Admin Tools] --> A
        L[Dashboard/Grafana] --> A
    end
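
The Auth Module shown above holds CephX keys and their capability strings. As a minimal sketch of capability control (the user client.demo and the pool name are hypothetical), a scoped key is created like this:

# Create a CephX user with read access to the Monitors and
# read/write access to a single pool (names are illustrative)
ceph auth get-or-create client.demo mon 'allow r' osd 'allow rw pool=demo-pool'

# Inspect the stored key and capabilities
ceph auth get client.demo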

Monitor Core Submodule Analysis

MonitorStore Storage Engine

Functional Overview: MonitorStore is the Monitor's persistent storage engine. Built on RocksDB, it stores all critical cluster metadata.

Core Architecture:

graph LR
    subgraph "MonitorStore Architecture"
        A[MonitorStore Interface] --> B[RocksDB Backend]
        A --> C[Transaction Manager]
        A --> D[Snapshot Management]
        
        B --> B1[Cluster Maps]
        B --> B2[Configuration Data]
        B --> B3[Authentication Info]
        B --> B4[Health Records]
        
        C --> C1[ACID Transactions]
        C --> C2[Batch Operations]
        C --> C3[Rollback Support]
        
        D --> D1[Consistent Snapshots]
        D --> D2[Backup/Restore]
        D --> D3[Recovery Points]
    end
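
On disk, the store lives under the Monitor's data directory as a RocksDB database. A brief sketch, assuming the default path layout and a monitor named mon.a:

# RocksDB files backing the Monitor store (default path layout)
ls /var/lib/ceph/mon/ceph-a/store.db/

# Ask the Monitor to compact its store, e.g. after heavy map churn
ceph tell mon.a compact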

Paxos Consensus Engine

Functional Overview: The Paxos engine is the Monitor's core module. It implements a distributed consensus algorithm that keeps cluster metadata consistent across all Monitor nodes.

Architecture Design:

graph TB
    subgraph "Paxos Consensus Engine"
        A[Paxos Service] --> B[Proposer Role]
        A --> C[Acceptor Role] 
        A --> D[Learner Role]
        
        B --> B1[Proposal Generation]
        B --> B2[Phase 1: Prepare]
        B --> B3[Phase 2: Accept]
        
        C --> C1[Promise Handling]
        C --> C2[Accept Requests]
        C --> C3[Vote Management]
        
        D --> D1[Value Learning]
        D --> D2[State Synchronization]
        D --> D3[Catchup Mechanism]
        
        E[Leader Election] --> F[Quorum Management]
        E --> G[Term Management]
        
        F --> F1[Majority Calculation]
        F --> F2[Split-brain Prevention]
        F --> F3[Network Partition Handling]
    end

Core Operating Mechanism:

  1. Proposal Phase: Only the elected leader proposes; pending map updates are batched into a proposal and handed to the Paxos engine (the relevant batching options are sketched after this list). A proposal can only succeed while a quorum exists:

# Check current quorum status
ceph quorum_status

  2. Consistency Guarantee Process:
    • Phase 1 (Prepare Phase): Leader sends Prepare requests to all Acceptors
    • Phase 2 (Accept Phase): Collects majority responses and sends Accept requests
    • Commit Phase: After reaching consensus, broadcasts commit messages to all nodes
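
The batching behaviour of the proposal phase is tunable. The options below exist in Ceph, but the values shown are only the commonly documented defaults, not recommendations; confirm them for your release with ceph config help <option>.

[mon]
# Maximum time the leader accumulates pending updates before proposing them as a batch
paxos_propose_interval = 1.0
# Minimum gap between successive proposals when updates arrive continuously
paxos_min_wait = 0.05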

Election & Arbitration Module

Functional Overview: Responsible for Monitor cluster Leader election, quorum maintenance, and failure detection, ensuring cluster availability under various network conditions.

Election Strategy Architecture:

graph LR
    subgraph "Election & Quorum Module"
        A[Election Manager] --> B[Connectivity Strategy]
        A --> C[Classic Strategy]
        A --> D[Disallowed Strategy]
        
        B --> B1[Network Reachability]
        B --> B2[Latency Measurement]
        B --> B3[Bandwidth Testing]
        
        C --> C1[Monitor Rank Based]
        C --> C2[Simple Majority]
        C --> C3[Static Priority]
        
        E[Quorum Management] --> F[Split-brain Detection]
        E --> G[Failure Detection]
        E --> H[Recovery Coordination]
        
        F --> F1[Network Partition Handling]
        F --> F2[Minority Protection]
        F --> F3[Quorum Loss Recovery]
    end

Key Configuration Parameters:

# Monitor election related configuration
[mon]
mon_election_timeout = 5
mon_lease = 5
mon_lease_renew_interval_factor = 0.6
mon_lease_ack_timeout_factor = 2.0
mon_accept_timeout_factor = 2.0

Election Trigger Conditions:

  • Monitor node startup or restart
  • Network partition or connection disconnection
  • Leader node failure or unresponsiveness
  • Manual election trigger (operational intervention; see the command sketch below)
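
On the operational side, the election strategy itself (connectivity, classic, or disallow, matching the diagram above) can be inspected and switched at runtime. A sketch assuming a recent release (Octopus or later) and a monitor named mon.a:

# Show the current election strategy (recorded in the monitor map)
ceph mon dump | grep election_strategy

# Switch to the connectivity-based strategy
ceph mon set election_strategy connectivity

# Inspect the connectivity scores collected by a monitor
ceph daemon mon.a connection scores dump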

Health Monitoring Module

Functional Overview: Real-time monitoring of cluster component health status, generating alert information, and providing detailed diagnostic data.

Monitoring Architecture:

graph TB
    subgraph "Health Monitoring Module"
        A[Health Service] --> B[Health Checks]
        A --> C[Status Aggregation]
        A --> D[Alert Management]
        
        B --> B1[OSD Health]
        B --> B2[PG Status]
        B --> B3[Cluster Capacity]
        B --> B4[Network Connectivity]
        B --> B5[Daemon Liveness]
        
        C --> C1[Global Status]
        C --> C2[Component Status]
        C --> C3[Trend Analysis]
        
        D --> D1[Warning Generation]
        D --> D2[Error Classification]
        D --> D3[Recovery Suggestions]
        
        E[Metrics Collection] --> F[Performance Data]
        E --> G[Resource Usage]
        E --> H[I/O Statistics]
    end

Health Check Command Examples:

# Get overall cluster status
ceph status
ceph -s

# View detailed health information
ceph health detail

# Monitor cluster status changes
ceph -w

# View specific component status
ceph pg stat
ceph osd stat
ceph mon stat

Health Status Classification:

  • HEALTH_OK: Cluster running normally
  • HEALTH_WARN: Warnings exist but do not affect data safety (individual warnings can be muted once understood; see the sketch below)
  • HEALTH_ERR: Errors exist requiring immediate attention
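
Once a warning has been investigated and understood, it can be acknowledged instead of lingering in the health output. A sketch using the health mute facility (the OSD_DOWN code and the 4h duration are only examples):

# List current health checks with their codes
ceph health detail

# Silence a specific, already-understood warning for four hours
ceph health mute OSD_DOWN 4h

# Remove the mute once the underlying issue is resolved
ceph health unmute OSD_DOWN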

Configuration Management Module

Functional Overview: Managing cluster and daemon configuration parameters, supporting runtime configuration updates and distribution.

Configuration Management Architecture:

graph LR
    subgraph "Configuration Management"
        A[Config Service] --> B[Key-Value Store]
        A --> C[Config Distribution]
        A --> D[Runtime Updates]
        
        B --> B1[Global Settings]
        B --> B2[Per-Daemon Config]
        B --> B3[User Preferences]
        
        C --> C1[Push Mechanism]
        C --> C2[Pull Mechanism]
        C --> C3[Change Notifications]
        
        D --> D1[Hot Reconfiguration]
        D --> D2[Restart Required Flags]
        D --> D3[Validation Checks]
        
        E[Config Sources] --> F[ceph.conf File]
        E --> G[Command Line]
        E --> H[Monitor Database]
        E --> I[Environment Variables]
    end

Configuration Operation Commands:

# Set global configuration
ceph config set global public_network 192.168.160.0/24

# View specific daemon configuration
ceph config show osd.0
ceph config show-with-defaults osd.0

# Set configuration key-value
ceph config-key set <key> <value>

# Generate minimal configuration file
ceph config generate-minimal-conf > /etc/ceph/ceph.conf
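
Beyond setting individual keys, the Monitor keeps a history of configuration changes and supports temporary per-daemon overrides. A brief sketch (osd.0 and the debug level are illustrative):

# Dump the entire configuration database held by the Monitors
ceph config dump

# Show the history of configuration changes
ceph config log

# Override an option on a running daemon without persisting it in the Monitor DB
ceph tell osd.0 config set debug_osd 10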

Monitor Interaction Relationships with Other Components

Monitor-OSD Interaction

Interaction Mode:

sequenceDiagram
    participant OSD as OSD Daemon
    participant MON as Monitor
    participant CRUSH as CRUSH Map
    
    OSD->>MON: Heartbeat & Status Report
    MON->>OSD: OSD Map Updates
    OSD->>MON: PG Status Updates
    MON->>CRUSH: Update CRUSH Rules
    MON->>OSD: New CRUSH Map
    OSD->>MON: Confirm Map Epoch

Key Interaction Content:

  • Heartbeat Detection: OSDs periodically report their liveness to the Monitors (observable via the commands after this list)
  • Status Updates: PG status, capacity information, performance metrics
  • Map Distribution: OSD Map, CRUSH Map update notifications
  • Failure Handling: OSD failure detection and marking
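
The effect of this exchange is directly observable: every map the Monitors distribute carries an epoch, and the reporting timeouts are configurable. A sketch (defaults vary by release; confirm with ceph config help <option>):

# Current OSDMap epoch as maintained by the Monitors
ceph osd stat
ceph osd dump | head -n 1

# How long the Monitors wait for OSD beacons before marking an OSD down
ceph config get mon mon_osd_report_timeout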

Monitor-MDS Interaction

Interaction Architecture:

sequenceDiagram
    participant MDS as MDS Daemon
    participant MON as Monitor
    participant FS as FileSystem
    
    MDS->>MON: Register & Get MDS Map
    MON->>MDS: Assign MDS Rank
    MDS->>MON: Status & Metadata Updates
    MON->>MDS: FileSystem Configuration
    MDS->>MON: Load Balancing Metrics
    MON->>MDS: Rank Reassignment
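
The state negotiated in this exchange is recorded in the MDSMap/FSMap, which any client with sufficient capabilities can inspect. A short sketch:

# Summary of MDS daemons and their ranks
ceph mds stat

# Per-filesystem view, including rank assignments and standby daemons
ceph fs status

# Full FSMap (including per-filesystem MDS maps) as stored by the Monitors
ceph fs dump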