ZSoftly Cloud Platform
How We Built ZCP: Proxmox, Ceph, and CloudStack

A look inside ZCP's infrastructure stack. Why we chose open-source components, how they fit together, and what it means for reliability and performance.

Ditah Kumbong (Founder & CTO)
3 min read

When we set out to build ZCP, we had two constraints: everything must be open-source, and everything must run on hardware we own end to end, with no hyperscaler underneath and no reseller layer in the middle. Here is how the stack fits together.

Compute: CloudStack on KVM

Apache CloudStack is the orchestration layer. It manages VM lifecycle, networking, storage attachment, and the API powering the portal and CLI.

We chose CloudStack over OpenStack for three reasons:

  1. Operational simplicity. CloudStack is a single management server with a MySQL backend. OpenStack requires dozens of services, each with its own database, message queue, and failure modes.
  2. Proven multi-tenancy. CloudStack was designed for public cloud providers from day one. Account isolation, resource quotas, and usage metering are built in.
  3. Stable API surface. The CloudStack API has been stable for over a decade. We build automation against it without chasing breaking changes.
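As an illustration of what automation against the CloudStack API can look like, here is a minimal sketch of request signing as described in the CloudStack developer documentation (sorted, lowercased query string, HMAC-SHA1, base64). The endpoint URL and both keys are placeholders, and the function names are our own for this example, not part of any ZCP tooling.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_request(params: dict[str, str], secret_key: str) -> str:
    """Return the base64 HMAC-SHA1 signature for a CloudStack API call."""
    # Sort by parameter name, URL-encode each value, then lowercase the
    # whole query string before hashing.
    pairs = sorted(params.items(), key=lambda kv: kv[0].lower())
    canonical = "&".join(f"{k}={quote(v, safe='')}" for k, v in pairs).lower()
    digest = hmac.new(secret_key.encode(), canonical.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def build_url(endpoint: str, params: dict[str, str], secret_key: str) -> str:
    """Append the URL-encoded signature to the original (non-lowercased) query."""
    query = "&".join(f"{k}={quote(v, safe='')}" for k, v in sorted(params.items()))
    signature = quote(sign_request(params, secret_key), safe="")
    return f"{endpoint}?{query}&signature={signature}"


# Hypothetical endpoint and credentials, for illustration only.
url = build_url(
    "https://cloud.example.com/client/api",
    {"command": "listVirtualMachines", "response": "json", "apiKey": "EXAMPLE_API_KEY"},
    "EXAMPLE_SECRET_KEY",
)
print(url)
```

The signature depends only on the parameter set and the secret, so the order in which a script assembles its parameters does not matter.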

Underneath CloudStack, KVM handles virtualization. Every VM gets dedicated vCPU and RAM. No overcommit on production plans.

Storage: Ceph

Ceph is a unified storage platform. One cluster serves block, object, and shared file storage through different access protocols. This means we operate a single storage layer instead of three separate systems.

Storage classes

We run three storage classes to match performance and cost requirements:

Class | Media    | Use case
Fast  | NVMe SSD | VM root disks, databases, latency-sensitive workloads
Flow  | SATA SSD | General-purpose block volumes, application data
Bulk  | HDD      | Archival, backups, large datasets where throughput matters more than IOPS

Ceph’s CRUSH rules place data on the correct media class automatically. You choose the tier when you create a volume.
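One common way to pin data to a media class in Ceph is a replicated CRUSH rule per device class. The sketch below only generates the corresponding `ceph osd crush rule create-replicated` commands. The tier-to-device-class mapping, the rule names, the `default` root, and the `host` failure domain are all assumptions for illustration, not ZCP's actual configuration.

```python
# Assumed mapping from tier name to a Ceph device class (hypothetical).
TIER_DEVICE_CLASS = {"fast": "nvme", "flow": "ssd", "bulk": "hdd"}


def crush_rule_command(tier: str, root: str = "default", failure_domain: str = "host") -> str:
    """Build the `ceph osd crush rule create-replicated` command for one tier."""
    device_class = TIER_DEVICE_CLASS[tier]
    rule_name = f"{tier}-replicated"  # hypothetical naming scheme
    return (
        f"ceph osd crush rule create-replicated "
        f"{rule_name} {root} {failure_domain} {device_class}"
    )


for tier in TIER_DEVICE_CLASS:
    print(crush_rule_command(tier))
```

A pool is then bound to a rule with `ceph osd pool set <pool> crush_rule <rule>`, which is what makes tier selection at volume creation possible.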

Storage services

  • Block storage (RBD): Persistent volumes attached directly to VMs. Available in all three storage classes. Expand online, snapshot on demand.
  • Object storage (RGW): S3-compatible API for unstructured data. Works with AWS CLI, boto3, rclone, and any S3-aware tool.
  • Shared file storage (CephFS): NFS and SMB file systems mounted to multiple VMs simultaneously. Useful for shared application data, media assets, and workloads needing concurrent access from multiple hosts.

Replication and durability

Triple replication means every block of data exists on three separate physical disks across at least two servers. A single disk failure or server outage does not cause data loss. Ceph self-heals. When a disk fails, the cluster re-replicates the affected data to restore the configured replica count.
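The cost of triple replication is easy to put in numbers. This is a back-of-the-envelope sketch: the 300 TB raw figure is illustrative rather than ZCP's actual capacity, and the calculation ignores the headroom a real cluster keeps free for recovery.

```python
def usable_capacity_tb(raw_tb: float, replicas: int = 3) -> float:
    """Usable capacity when every object is stored `replicas` times."""
    return raw_tb / replicas


def copies_that_can_fail(replicas: int = 3) -> int:
    """Replicas that can be lost while at least one copy survives."""
    return replicas - 1


# 300 TB raw is an illustrative figure only.
print(usable_capacity_tb(300.0))  # 100.0
print(copies_that_can_fail())     # 2
```

Note that this describes durability, not availability: depending on pool settings, Ceph may pause I/O to affected data before the last surviving copy is reached.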

Networking: Open vSwitch

Each tenant gets isolated VPC networking via Open vSwitch (OVS). VMs within a VPC communicate over VXLAN tunnels. Traffic between VPCs is blocked by default.

Public IP addresses are routed through our border routers with DDoS mitigation at the network edge. Firewall rules are applied per-VM and per-network, managed through the portal or API.

The MTU is set to 1500 on tenant interfaces, with TCP MSS clamping to avoid fragmentation on the VXLAN overlay.
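The arithmetic behind MSS clamping can be sketched as follows. VXLAN over IPv4 adds 50 bytes per packet (inner Ethernet 14, VXLAN 8, UDP 8, outer IPv4 20), and a TCP segment over IPv4 carries 40 bytes of headers without options. The underlay MTU values below are assumptions, since the physical network MTU is not stated here.

```python
VXLAN_OVERHEAD = 50    # inner Ethernet 14 + VXLAN 8 + UDP 8 + outer IPv4 20
IPV4_TCP_HEADERS = 40  # IPv4 20 + TCP 20, no options


def max_inner_mtu(underlay_mtu: int) -> int:
    """Largest tenant packet that fits in one underlay packet after encapsulation."""
    return underlay_mtu - VXLAN_OVERHEAD


def clamped_mss(tenant_mtu: int, underlay_mtu: int) -> int:
    """TCP MSS that keeps encapsulated segments within the underlay MTU."""
    effective_mtu = min(tenant_mtu, max_inner_mtu(underlay_mtu))
    return effective_mtu - IPV4_TCP_HEADERS


print(clamped_mss(1500, 1500))  # 1410: underlay has no room for the overhead
print(clamped_mss(1500, 9000))  # 1460: jumbo-frame underlay, no reduction needed
```

Clamping only affects TCP. Other protocols rely on the underlay having enough headroom or on path MTU discovery.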

Monitoring: Prometheus and Grafana

Every host, VM, and service exports metrics to Prometheus. Grafana dashboards give us real-time visibility into CPU, memory, disk I/O, and network throughput across the cluster.

Loki handles log aggregation. Alertmanager pages the on-call engineer when thresholds are breached.

This same observability stack is available to customers on the Observability service tier, running dedicated Prometheus and Grafana instances per tenant.
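Because Prometheus exposes a standard HTTP API, metrics can be pulled with an ordinary GET request to `/api/v1/query`. The sketch below only builds the request URL. The hostname is a placeholder, and the metric name assumes node_exporter is the host-level exporter, which is not confirmed above.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus instant-query URL for the given PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical host; the metric name assumes node_exporter is deployed.
promql = 'avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))'
url = instant_query_url("https://prometheus.example.com", promql)
print(url)
```

The same URL works from curl, a script, or a Grafana data source, which is part of why the stack is easy to integrate with.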

What this means for you

The stack is boring on purpose. These are mature, well-understood components with large communities and extensive documentation. No proprietary black boxes, no vendor-specific agents.

If you want to migrate away from ZCP, your VMs are standard QCOW2 images. Your object storage speaks S3. Your block volumes are standard disk devices. Nothing locks you in.
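To show how little is hidden in a QCOW2 image, here is a minimal sketch that reads the fixed fields at the start of the header as laid out in the QEMU qcow2 specification. The function name is our own, and the sample header is synthetic rather than taken from a real ZCP image.

```python
import struct

QCOW_MAGIC = b"QFI\xfb"


def read_qcow2_header(data: bytes) -> dict:
    """Parse version, cluster size bits, and virtual size from the first 32 bytes."""
    if len(data) < 32 or data[:4] != QCOW_MAGIC:
        raise ValueError("not a QCOW image")
    # Big-endian layout after the magic: version(4) backing_file_offset(8)
    # backing_file_size(4) cluster_bits(4) size(8)
    version, _bf_offset, _bf_size, cluster_bits, size = struct.unpack(">IQIIQ", data[4:32])
    return {"version": version, "cluster_bits": cluster_bits, "virtual_size": size}


# Synthetic header for a 10 GiB version-3 image with 64 KiB clusters.
sample = QCOW_MAGIC + struct.pack(">IQIIQ", 3, 0, 0, 16, 10 * 1024**3)
print(read_qcow2_header(sample))
```

In practice `qemu-img info` reports the same fields; the point is that the format is documented and readable with a few lines of code.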

Open, auditable, and portable: that is the ZCP infrastructure philosophy.