TSF – Comprehensive IT Solutions for SMB Businesses | HCM

Proxmox P24 – High Availability with Ceph Failover Test

Step-by-Step Ceph HA Configuration on Proxmox VE 9

High Availability (HA) combined with Ceph storage is one of the most powerful features in Proxmox VE 9. In this tutorial, we demonstrate how to build a resilient Proxmox cluster using Ceph distributed storage and perform a real failover test when a node goes down.

You will learn how to configure a 3-node Proxmox cluster, install Ceph, create OSDs and pools, move VM disks to Ceph storage, configure HA resource rules, and simulate node failure.

If you manage production workloads, enterprise infrastructure, or serious home labs, understanding Proxmox HA with Ceph is essential for minimizing downtime and ensuring data integrity.


3.1️⃣ Preparation

Before configuring High Availability with Ceph, proper preparation is required.

🔧 Lab Infrastructure

Prepare 3 Proxmox nodes.
Each node has 3 disks:

• 1 disk → OS (Proxmox VE)
• 2 disks → Ceph OSD

Configuration:

• pve01: disks 2 and 3 (30 GB each), IP 192.168.16.200
• pve02: disks 2 and 3 (40 GB each), IP 192.168.16.201
• pve03: disks 2 and 3 (45 GB each), IP 192.168.16.202


🔹 Step 1 — Set Disk Serial (If Using Proxmox VM)

If the lab nodes are themselves Proxmox VMs, set each virtual disk's serial manually (physical disks already have serials).

 
nano /etc/pve/qemu-server/102.conf   → add serial=DISK05 and serial=DISK06
 
nano /etc/pve/qemu-server/103.conf   → add serial=DISK03 and serial=DISK04
 
nano /etc/pve/qemu-server/104.conf   → add serial=DISK01 and serial=DISK02
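For reference, the resulting disk lines in a VM config might look like this (storage and volume names here are placeholders, not fixed values; only the serial= suffix matters):

```shell
# Excerpt from /etc/pve/qemu-server/102.conf (hypothetical storage names)
scsi1: local-lvm:vm-102-disk-1,size=30G,serial=DISK05
scsi2: local-lvm:vm-102-disk-2,size=30G,serial=DISK06
```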

🔹 Step 2 — Prepare Windows 10 VM on PVE01

The pve01 node hosts a Windows 10 VM that will be used for HA testing.


🔹 Step 3 — Ensure Same Time on All Nodes

Cluster nodes must have synchronized time.

 
timedatectl status
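Proxmox VE ships with chrony for NTP, so a quick check on each node might look like this (a sketch, assuming the default chrony setup):

```shell
# Clock and NTP state — "System clock synchronized: yes" is what we want
timedatectl status
# Verify chrony can reach its time sources
chronyc sources
# Enable NTP synchronization if it is disabled
timedatectl set-ntp true
```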

🔹 Step 4 — Verify Disks Before Creating Ceph OSD

List disks carefully to avoid accidental deletion:

 
lsblk
 
fdisk -l

Always double-check disk identity before creating OSD.
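One way to cross-check disk identity is to print the serial column alongside name and size, so each disk can be matched to the serials assigned in Step 1 (a sketch; device names will differ per node):

```shell
# List disks with their serial numbers before touching them
lsblk -o NAME,SIZE,TYPE,SERIAL
# Optionally inspect the partition tables of the OSD candidates
fdisk -l /dev/sdb /dev/sdc
```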


3.2️⃣ Install Ceph


🔵 Step 1 — Create 3-Node Cluster

On Pve01:

 
pvecm create tsf

Add pve01's IP address and hostname to the /etc/hosts file on pve02 and pve03:

 
192.168.16.200 pve01zfs.tsf.id.vn pve01zfs

On pve02 and pve03:

 
pvecm add pve01zfs.tsf.id.vn

Cluster must be fully healthy before proceeding.
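Cluster health can be confirmed from any node before moving on; a minimal check:

```shell
# Quorum information — expect "Quorate: Yes" and all three nodes listed
pvecm status
pvecm nodes
```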


🔵 Step 2 — Install Ceph

In GUI:

Datacenter → Ceph → Install Ceph

Repeat installation on the remaining two PVE nodes.

All nodes must run the same Ceph version.
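The same installation can also be done from the CLI, and the versions compared afterwards (a sketch; the repository choice depends on your subscription):

```shell
# CLI alternative to the GUI installer — run on every node
pveceph install --repository no-subscription
# After installation, all daemons should report one Ceph release
ceph versions
```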


🔵 Step 3 — Create Ceph MON

Add a Monitor (MON) on each node.
Add a Manager (MGR).

Ceph MON ensures cluster state consistency.
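The CLI equivalents, run on each node that should host a monitor or manager:

```shell
# Create a Ceph monitor and manager on the local node
pveceph mon create
pveceph mgr create
```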


🔵 Step 4 — Create Ceph OSD

Create OSD on each node using prepared disks.

Each node contributes storage to the Ceph distributed system.
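On each node, the prepared disks can be turned into OSDs from the CLI as well (the device names /dev/sdb and /dev/sdc are this lab's assumption):

```shell
# Create one OSD per data disk — repeat on every node
pveceph osd create /dev/sdb
pveceph osd create /dev/sdc
```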


🔵 Step 5 — Create Ceph Pool

Create the pool once on one node only.

The pool will automatically be available cluster-wide.
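A CLI sketch, with "ceph-pool" as a placeholder name; the --add_storages flag registers the new pool as a Proxmox storage on all nodes:

```shell
# Create the replicated RBD pool once, on any one node
pveceph pool create ceph-pool --add_storages
```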


3.3️⃣ Create Ceph HA Configuration

Now we integrate Ceph storage with Proxmox HA.


🔹 Step 1 — Move VM Disk to Ceph Storage

Move Windows VM disk to Ceph pool.

👉 Important Notes:

• Depending on the source and target storage types, moving a disk may discard existing snapshots or thin-provisioning savings.
• VM can remain powered ON during move (online move supported).

Ceph shared storage allows VM to run on any node without disk replication delay.
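From the CLI, the online move could be sketched as follows (VM ID 100, disk scsi0, and the storage name ceph-pool are placeholders for this lab):

```shell
# Move the disk to Ceph while the VM keeps running;
# --delete 1 removes the source copy once the move completes
qm disk move 100 scsi0 ceph-pool --delete 1
```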


🔹 Step 2 — Add VM to HA Manager

Add HA resource.

Add HA preference rule.

HA resource: select the Windows 10 test VM

Priority:

• pve01 = 3
• pve02 = 2
• pve03 = 1

This ensures VM prefers running on pve01 but can failover to other nodes automatically.
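A minimal CLI sketch for registering the resource (VM ID 100 is a placeholder); on Proxmox VE 9 the node priorities themselves are set in the GUI as an HA node-affinity rule (earlier releases used HA groups):

```shell
# Register the VM under the HA manager
ha-manager add vm:100
# Show HA resources and their current state
ha-manager status
```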


3.4️⃣ Simulate HA Failover Test

Now perform a real failover test.

Shut down or power off pve01.

When pve01 is offline:

→ HA manager detects node failure
→ VM automatically starts on pve02 (based on priority rule)
→ Ceph ensures disk availability across cluster
→ No data loss

Once pve01 returns:

→ VM can migrate back depending on HA policy

This demonstrates true High Availability using shared distributed storage.
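During the test, the failover can be observed from a surviving node (a sketch):

```shell
# HA manager view — the VM should move from pve01 to pve02
ha-manager status
# Ceph keeps serving data; health may show WARN while pve01 is down
ceph -s
```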


🔐 Why Proxmox HA with Ceph Is Powerful

Using Ceph instead of ZFS replication provides:

• True shared storage
• Immediate failover (no snapshot promotion required)
• No dependency on scheduled replication
• Real-time distributed data consistency
• Higher availability in production environments

Ceph distributes data across multiple nodes and replicates blocks automatically, ensuring redundancy and integrity.


🚀 Final Thoughts

Proxmox VE 9 combined with Ceph storage delivers enterprise-grade High Availability without expensive licensing costs. By configuring a proper 3-node cluster, installing Ceph MON and OSD correctly, and setting HA priority rules, you can build a fully resilient virtualization environment.

This architecture is ideal for:

  • Enterprise infrastructure

  • Critical application hosting

  • Virtualized production workloads

  • Advanced home labs

  • IT professionals preparing for real-world deployment

Mastering Proxmox HA with Ceph significantly enhances your virtualization expertise and prepares you for advanced infrastructure management.

See also related articles:

• P15 – Backup and Restore VM in Proxmox VE
• P14 – How to Remove Cluster Group Safely on Proxmox