

🚀 Proxmox – P25 Ceph HA Cluster: Replace Failed Node on Proxmox (Full Demo)

🔎 Introduction

In this tutorial, we demonstrate how to replace a failed node in a Proxmox Ceph High Availability (HA) cluster step by step. When a Proxmox node crashes or becomes unreachable, your Ceph cluster may enter a degraded state. However, thanks to Ceph replication and HA mechanisms, your virtual machines can continue running without downtime — if the cluster is properly configured.

This guide shows you how to:

  • Safely remove a dead Proxmox node

  • Cleanly remove MON and OSD services from Ceph

  • Update the CRUSH map properly

  • Add a replacement node into the cluster

  • Reinstall Ceph services on the new node

  • Rebalance data automatically

  • Restore HA functionality

This full demo is ideal for IT professionals managing production environments and for home-lab enthusiasts learning Proxmox VE 9 with Ceph HA.


🧪 5. Simulate a Dead Node

Because of limited lab equipment, the test VMs run slowly, so this demo focuses on explaining each replacement step clearly.

In real-world production environments, physical Proxmox servers will perform significantly faster.


⚠️ 5.1 Symptoms

When a node fails (example: pve01zfs):

  • Multiple OSDs appear down

  • Ceph reports OSDs as down/out

  • If replication factor is sufficient (e.g. 3), VMs continue running on remaining nodes

  • Cluster health becomes degraded
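
The degraded state is easy to confirm from any surviving node with `ceph -s` and `ceph osd tree`. As a small sketch, the down OSDs can be filtered out of the tree output (the sample below is illustrative, not captured from a real cluster):

```shell
# down_osds: print the OSD ids that `ceph osd tree` reports as down.
# Live usage:  ceph osd tree | down_osds
down_osds() {
    awk '$4 ~ /^osd\./ && $5 == "down" {print $4}'
}

# Illustrative `ceph osd tree` output (hypothetical weights):
sample='-1  0.05846  root default
-3  0.01949      host pve01zfs
 0  hdd  0.00980          osd.0  down  0        1.00000
 1  hdd  0.00980          osd.1  down  0        1.00000
-5  0.01949      host pve02zfs
 2  hdd  0.00980          osd.2  up    1.00000  1.00000'
printf '%s\n' "$sample" | down_osds   # prints osd.0 and osd.1
```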


🛠 5.2 Troubleshooting Procedure


🔹 Step 1: Delete the Dead Node

Remove node from cluster:

 
pvecm delnode pve01zfs

Delete leftover configuration files:

 
rm -rf /etc/pve/nodes/pve01zfs

🔹 Step 2: Remove MON pve01zfs from Ceph

Since the node is offline, its monitor daemon is already down. Remove it from the monitor map:

 
ceph mon remove pve01zfs

This completely removes the MON service from the Ceph cluster.
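
You can verify the removal with `ceph mon stat` on any surviving node. As a sketch, a small filter checks whether a monitor name still appears (the sample output is illustrative, not from a real cluster):

```shell
# mon_listed: report whether a monitor name still appears in `ceph mon stat`
# output. Live usage:  ceph mon stat | mon_listed pve01zfs
mon_listed() {
    grep -qw "$1" && echo "still present" || echo "removed"
}

# Illustrative `ceph mon stat` output after the removal (hypothetical):
sample='e4: 2 mons at {pve02zfs=[v2:192.168.16.201:3300/0],pve03zfs=[v2:192.168.16.202:3300/0]}'
printf '%s\n' "$sample" | mon_listed pve01zfs   # prints "removed"
```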


🔹 Step 3: Delete OSD pve01zfs

Identify OSD IDs:

 
ceph osd tree

Example:

  • osd.0

  • osd.1

Mark OSD as down:

 
ceph osd down osd.0
ceph osd down osd.1

Mark OSD as out:

 
ceph osd out osd.0
ceph osd out osd.1

Remove from CRUSH map:

 
ceph osd crush remove osd.0
ceph osd crush remove osd.1

Remove authentication:

 
ceph auth del osd.0
ceph auth del osd.1

Remove OSD completely:

 
ceph osd rm osd.0
ceph osd rm osd.1
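
The removal commands above follow a strict order (down, out, crush remove, auth del, rm) and must be repeated for every OSD that lived on the dead node. A helper loop keeps that order straight; this is a sketch in which OSD_IDS and the DRY_RUN guard are assumptions for this lab (set DRY_RUN=0 only on a real cluster):

```shell
# Sketch: remove each dead OSD in the required order.
# OSD_IDS matches this lab (osd.0 and osd.1); adjust for your cluster.
OSD_IDS="0 1"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

for id in $OSD_IDS; do
    run ceph osd down "osd.$id"          # mark the daemon down
    run ceph osd out "osd.$id"           # stop mapping data to it
    run ceph osd crush remove "osd.$id"  # drop it from the CRUSH map
    run ceph auth del "osd.$id"          # delete its cephx key
    run ceph osd rm "osd.$id"            # remove the OSD entry itself
done
```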

🔹 Step 4: Remove Host from CRUSH Map

Remove the host bucket from the CRUSH map with a single command:

 
ceph osd crush remove pve01zfs

If unsure about the hostname:

 
ceph osd tree

Then restart Ceph services on remaining nodes.

Ceph will redistribute data to remaining OSDs.
Rebalancing speed depends on disk performance and network bandwidth.
(Lab environment will be slower.)


🆕 Step 5: Prepare Replacement Node (pve04zfs)

Edit the lab VM's disk serial configuration:

 
nano /etc/pve/qemu-server/105.conf
serial=DISK07
serial=DISK08

Disable enterprise repository.

Set an IP address in the same subnet as pve02 & pve03.

Update hosts file:

 
nano /etc/hosts

192.168.16.201 pve02zfs.tsf.id.vn pve02zfs
192.168.16.202 pve03zfs.tsf.id.vn pve03zfs

Check disks:

 
lsblk
ls -l /dev/disk/by-id/

🔗 Step 6: Join pve04 into Cluster

 
pvecm add pve02zfs.tsf.id.vn

💾 Step 7: Install Ceph on New Node (pve04)

From GUI (Node pve04):

  • Ceph → Install Ceph

  • Select same Ceph version

  • Reboot if required

Then add services:

➤ Add MON + MGR

  1. Ceph → Monitor → Add

  2. Ceph → Manager → Add

➤ Add OSD

  • Ceph → OSD → Create OSD

  • Select /dev/sdb or empty disk

  • Repeat as needed
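
If you prefer the CLI, Proxmox ships `pveceph` wrappers for the same steps, run on the new node. This is a sketch with a DRY_RUN guard that only prints the commands; drop the guard (DRY_RUN=0) on a real node:

```shell
# CLI counterparts of the GUI steps above (run on pve04zfs).
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

run pveceph install              # install the Ceph packages
run pveceph mon create           # add a MON on this node
run pveceph mgr create           # add a MGR on this node
run pveceph osd create /dev/sdb  # create an OSD on an empty disk
```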


⚖️ Step 8: Rebalance Ceph

When the new node joins, Ceph automatically rebalances data.

Check cluster status:

 
ceph -s

Healthy state:

 
HEALTH_OK

Note:
In small lab environments, you may see:

  • slow IO warnings

  • BlueStore slow operations

Data redistribution takes time depending on disk speed.

The degraded-object count will gradually decrease until all placement groups return to active+clean.
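
Progress can be followed from the CLI by polling `ceph -s`; a small filter keeps only the relevant lines (the sample output below is illustrative, not from a real cluster):

```shell
# recovery_lines: keep only the health/degraded/recovery lines of `ceph -s`.
# Live usage:  ceph -s | recovery_lines   (or poll with: watch -n 10 'ceph -s')
recovery_lines() {
    grep -Ei 'health|degraded|recovery'
}

# Illustrative `ceph -s` fragment (hypothetical numbers):
sample='  cluster:
    health: HEALTH_WARN
            Degraded data redundancy: 512/3072 objects degraded (16.667%)
  io:
    recovery: 98 MiB/s, 25 objects/s'
printf '%s\n' "$sample" | recovery_lines
```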


🏷 Step 9: Add New Node to HA Group

Navigate:

Datacenter → HA → Groups → Select Group → Add pve04

Now HA can use the new node for failover operations.
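
Behind this GUI step, the group definition lives in the cluster-wide file /etc/pve/ha/groups.cfg. As a sketch of what the entry might look like after adding pve04zfs (the group name ha-group1 is an assumption for illustration):

```
# /etc/pve/ha/groups.cfg (group name "ha-group1" is illustrative)
group: ha-group1
        nodes pve02zfs,pve03zfs,pve04zfs
        restricted 0
        nofailback 0
```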


✅ Final Thoughts

Replacing a failed node in a Proxmox Ceph HA cluster requires proper order:

  1. Remove node from cluster

  2. Clean MON & OSD services

  3. Update CRUSH map

  4. Add replacement node

  5. Reinstall Ceph

  6. Allow automatic rebalancing

  7. Reconfigure HA

By following best practices, you can maintain data integrity, minimize downtime, and ensure business continuity in both production and lab environments.

This tutorial demonstrates how Ceph replication and Proxmox HA work together to provide true high availability infrastructure.

See also related articles

P15 – Backup and Restore VM in Proxmox VE

P14 – How to Remove Cluster Group Safely on Proxmox