A Kubernetes operator for managing virtual machines across multiple hypervisors.
Version: v0.3.11 — CHANGELOG | Documentation
VirtRigaud is a Kubernetes operator that enables declarative management of virtual machines across different hypervisor platforms. It provides a unified API for provisioning and managing VMs on vSphere, Libvirt/KVM, and Proxmox VE through a remote gRPC provider architecture.
The manager reconciles Kubernetes custom resources; each hypervisor runs as a separate provider pod. Manager and provider pods communicate over gRPC. Credentials are scoped to the Provider CR and never flow through the manager.
- Multi-Hypervisor Support: vSphere, Libvirt/KVM, and Proxmox VE simultaneously
- Cross-Provider VM Migration: Storage-backend-agnostic disk migration between any two hypervisors, with S3 and NFS staging backends, validated across vSphere ⇄ Libvirt/KVM ⇄ Proxmox VE in both directions (ADR-0006). The disk is staged through object storage or an NFS export and moved with `qemu-img`; it never traverses a CSI PVC (PVC is compat-only). NFS uses `qemu-img`'s native libnfs transport (kernel-mount on Proxmox). (v0.3.11)
- VM Cloning (VMClone): Full and linked clones (MVP): `source.vmRef` source, same-provider (vSphere/Proxmox/Libvirt; libvirt: qcow2 overlay for linked, full copy for full)
- VMSet CRD defined; controller not yet active: Multi-VM replica set is defined but the controller is a stub that reports `Ready=False` / `ControllerNotImplemented`; rolling updates and replica management are roadmap
- VMPlacementPolicy (reference-only): Placement rules (affinity, anti-affinity, resource constraints) expressed as a policy object referenced by `VirtualMachine.spec.placementRef`; no standalone enforcement controller
- Declarative v1beta1 API: Stable CRDs with OpenAPI validation
- Cloud-Init Support: Cross-provider VM initialisation via cloud-init
- Power Management: On/Off/Reboot/Graceful-Shutdown uniformly
- Async Task Tracking: Long-running vSphere and Proxmox operations tracked via TaskStatus RPC
- Resource Reconfiguration: CPU, memory, disk changes (online for vSphere/Proxmox; online for Libvirt when the VM was created with `cpuHotAddEnabled`/`memoryHotAddEnabled`, otherwise power-cycle)
- G6 Circuit Breaker: One circuit breaker per Provider CR for automatic failure isolation (v0.3.6+)
- Secure-by-default gRPC: mTLS wired end-to-end (TLS 1.3, SNI, certwatcher hot-reload); provider pods fail closed without credentials (#147/#148, v0.3.7)
- Libvirt SSH host-key verification: `known_hosts` enforced by default; TOFU removed (#149, v0.3.7)
- Observability: 11 `virtrigaud_*` Prometheus metric families (1 deprecated in v0.3.6; removal in v0.4.0)
VirtRigaud uses a Remote Provider architecture, in which each provider runs in its own pod and the manager reaches it over gRPC:
```mermaid
graph TB
    %% Kubernetes Cluster boundary
    subgraph "Kubernetes Cluster"
        %% CRDs
        subgraph "Custom Resources (v1beta1)"
            VM[VirtualMachine]
            VMC[VMClass]
            VMI[VMImage]
            PR[Provider]
            VMNA[VMNetworkAttachment]
            VMSN[VMSnapshot]
            VMSET[VMSet]
            VMMig[VMMigration]
            VMPP[VMPlacementPolicy]
            VMCL[VMClone]
        end
        %% Controller
        CTRL["VirtRigaud Manager<br/>(controller + G6 CB interceptor)"]
        %% Remote Providers
        subgraph "Remote Providers (gRPC)"
            VSP[vSphere Provider Pod]
            LVP[Libvirt Provider Pod]
            PXP[Proxmox Provider Pod]
        end
        %% Connections within cluster
        VM -.-> CTRL
        VMC -.-> CTRL
        VMI -.-> CTRL
        PR -.-> CTRL
        VMNA -.-> CTRL
        CTRL -->|"gRPC (G4 retry + G6 CB)"| VSP
        CTRL -->|"gRPC (G4 retry + G6 CB)"| LVP
        CTRL -->|"gRPC (G4 retry + G6 CB)"| PXP
    end
    %% External Infrastructure
    subgraph "External Infrastructure"
        subgraph "vSphere"
            VCENTER[vCenter Server]
        end
        subgraph "KVM"
            LIBVIRT[Libvirt Host]
        end
        subgraph "Proxmox VE"
            PVE[Proxmox Cluster]
        end
    end
    VSP -->|govmomi API| VCENTER
    LVP -->|libvirt+SSH| LIBVIRT
    PXP -->|REST API| PVE
```
The following issues were open in v0.3.6 and resolved in v0.3.7; they are closed in this release:
- mTLS wired end-to-end (#147, v0.3.7): manager↔provider gRPC TLS is wired through the provider `Resolver` with cert/key/CA loaded, TLS 1.3, SNI, and certwatcher hot-reload. Provider servers require and verify client certificates. Exception: the libvirt provider uses plaintext gRPC to its sidecar container and separately enforces SSH `known_hosts`; this is a documented maintainer choice, not a defect.
- Provider gRPC auth enforced, fail-closed (#148, v0.3.7): provider pods require TLS credentials and fail closed (crash-loop) at startup if credentials are absent, unless explicitly opted into insecure mode via the provider runtime config.
- Libvirt SSH host-key verification ON by default (#149, v0.3.7): the `no_verify=1` flag is removed; `known_hosts` is sourced from the credentials Secret. Trust-on-first-use (TOFU) is no longer the default.
Verify these controls are correctly configured before relying on them in regulated environments. For full security guidance, see the Security Operations Guide.
| CRD | Short name | Controller | Description |
|---|---|---|---|
| VirtualMachine | vm | active | A virtual machine instance |
| VMClass | vmc | active | Resource profile (CPU, memory, disk) |
| VMImage | vmi | active | Base template or image reference |
| VMNetworkAttachment | vmna | active | Network configuration |
| Provider | prov | active | Hypervisor connection + runtime config |
| VMMigration | vmmig | active | Cross-provider VM migration |
| VMSnapshot | — | active | Snapshot lifecycle management |
| VMClone | vmclone | active (MVP) | Cloning operations — MVP: source.vmRef source, same-provider, full & linked clones |
| VMSet | vmset | not yet active | Multi-VM replica set — controller is a stub that reports Ready=False / ControllerNotImplemented |
| VMPlacementPolicy | — | reference-only | Placement rules (affinity, resources) — a policy object referenced by VirtualMachine.spec.placementRef; no standalone controller |
Note: VMAdoption is a controller built into the manager, not a CRD.
Per the canonical capabilities matrix, verified against provider GetCapabilities responses (v0.3.11: the NFS migration staging backend is implemented across all three providers, both directions — alongside the S3 backend; v0.3.9 added Libvirt Clone, ImagePrepare, online disk expansion, online reconfigure, and memory snapshots):
| Feature | vSphere | Libvirt | Proxmox | Notes |
|---|---|---|---|---|
| Core Operations | ✅ | ✅ | ✅ | Create/Delete/Power/Describe |
| Reconfiguration | ✅ | ✅ | ✅ | Libvirt: online via setvcpus/setmem --live when VM was created with cpuHotAddEnabled/memoryHotAddEnabled (hotplug headroom provisioned at create, grows up to ~4× ceiling, vCPU hard cap 64); otherwise power-cycle (#203) |
| Disk Expansion | ✅ | ✅ | ✅ | Libvirt: online grow via virsh blockresize (grow-only; desired ≤ current is a no-op) + best-effort in-guest FS grow via guest agent (#201) |
| Snapshots | ✅ | ✅ | ✅ | Point-in-time captures |
| Memory Snapshots | ✅ | ✅ | ✅ | RAM-inclusive checkpoints for a running VM. vSphere: CreateSnapshot(memory=true). Libvirt: snapshot-create-as without --disk-only; a stopped VM falls back to a disk-only snapshot with a WARN (#202). |
| Cloning (full) | ✅ | ✅ | ✅ | Libvirt: full copy of resolved disk path (qemu-img convert / vol-clone), same-provider (#153) |
| Linked Clones | ✅ | ✅ | ✅ | Libvirt: qcow2 overlay (backing-file COW), same-provider (#153). UEFI/secure-boot nvram re-point is a deferred follow-up (#208). |
| Clone RPC | ✅ | ✅ | ✅ | Libvirt Clone implemented: linked (qcow2 overlay) + full copy, source.vmRef, same-provider (#153) |
| ImagePrepare RPC | ✅ | ✅ | ✅ | Libvirt: import/convert image into a storage pool (#154) |
| Task Tracking | ✅ | N/A | ✅ | Async operation monitoring |
| Console URLs | ✅ | ✅ | Planned | Proxmox console URL: planned |
| Guest Agent | ✅ | ✅ | ✅ | IP detection and guest info |
| Image Import | ✅ | ✅ | ✅ | Libvirt: import into storage pool (#154). vSphere: OVA/content library. |
| Multi-NIC | ✅ | ✅ | ✅ | Multiple network interfaces |
| Circuit Breaker | ✅ | ✅ | ✅ | One CB per Provider CR (v0.3.6) |
| Cross-Provider Migration | ✅ | ✅ | ✅ | S3 + NFS staging backends, both directions, all pairs (ADR-0006, #236). vSphere stages pod-side; libvirt host-side; Proxmox node-side over SSH (NFS via kernel mount). PVC is compat-only. |
- Kubernetes 1.25+
- Helm 3.10+
- Go 1.26+ (for source builds only)
- Add the Helm repository:

  ```bash
  helm repo add virtrigaud https://projectbeskar.github.io/virtrigaud
  helm repo update
  ```

- Install VirtRigaud (version 0.3.11):

  ```bash
  helm install virtrigaud virtrigaud/virtrigaud \
    --version 0.3.11 \
    -n virtrigaud-system --create-namespace
  ```

  CRDs are installed automatically via Helm hooks. To disable automatic CRD upgrades:

  ```bash
  helm install virtrigaud virtrigaud/virtrigaud \
    --version 0.3.11 \
    -n virtrigaud-system --create-namespace \
    --set crdUpgrade.enabled=false
  ```

  Providers are NOT enabled via Helm flags. Create Provider CRs (see the provider setup steps below); the controller then deploys provider pods automatically.

- Verify the installation:

  ```bash
  kubectl get pods -n virtrigaud-system
  kubectl get crd | grep virtrigaud
  ```

- Upgrade:

  ```bash
  helm upgrade virtrigaud virtrigaud/virtrigaud \
    --version 0.3.11 \
    -n virtrigaud-system
  ```
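The `crdUpgrade.enabled=false` setting shown for the install command can also be kept in a values file, which is easier to track in version control. A minimal sketch, assuming a file named `values.yaml` (the file name is arbitrary); only the `crdUpgrade.enabled` key is taken from the install command:

```yaml
# values.yaml (hypothetical file name)
# Equivalent to passing --set crdUpgrade.enabled=false on the command line
crdUpgrade:
  enabled: false
```

Pass the file with `-f values.yaml` in place of the `--set` flag.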
```bash
# Install CRDs
make install
# Run the controller locally
make run
```

Go 1.26+ is required for source builds.
- Create credentials secrets:

  ```bash
  # Libvirt: SSH key (recommended)
  # $HOME is used rather than ~ because the shell does not expand ~ inside an option value
  kubectl create secret generic libvirt-creds -n default \
    --from-literal=username=your-ssh-username \
    --from-file=ssh-privatekey="$HOME/.ssh/id_rsa"

  # Libvirt: password (alternative to the SSH key)
  kubectl create secret generic libvirt-creds -n default \
    --from-literal=username=your-ssh-username \
    --from-literal=password='your-ssh-password'

  # vSphere
  kubectl create secret generic vsphere-creds -n default \
    --from-literal=username=administrator@vsphere.local \
    --from-literal=password='your-password'

  # Proxmox VE: API token (recommended; keys: token_id, token_secret)
  kubectl create secret generic proxmox-creds -n default \
    --from-literal=token_id='virtrigaud@pve!vrtg-token' \
    --from-literal=token_secret='xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx'
  ```

  The Proxmox provider reads credentials from files mounted at `/etc/virtrigaud/credentials/{token_id,token_secret,username,password}`. Do NOT use `envFrom: secretRef` for Proxmox credentials; that pattern is not implemented.

- Create a Provider CR:

  ```yaml
  # Libvirt/KVM
  apiVersion: infra.virtrigaud.io/v1beta1
  kind: Provider
  metadata:
    name: libvirt-kvm
    namespace: default
  spec:
    type: libvirt
    endpoint: "qemu+ssh://192.168.1.10/system"
    credentialSecretRef:
      name: libvirt-creds
    runtime:
      image: "ghcr.io/projectbeskar/virtrigaud/provider-libvirt:v0.3.11"
      service:
        port: 9443
  ```

  ```yaml
  # vSphere
  apiVersion: infra.virtrigaud.io/v1beta1
  kind: Provider
  metadata:
    name: vsphere-datacenter
    namespace: default
  spec:
    type: vsphere
    endpoint: "https://vcenter.example.com:443"
    credentialSecretRef:
      name: vsphere-creds
    runtime:
      image: "ghcr.io/projectbeskar/virtrigaud/provider-vsphere:v0.3.11"
      service:
        port: 9443
  ```

  When you apply a Provider CR, the controller creates a dedicated Deployment and Service for the provider pod in the same namespace. Each Provider CR has isolated credentials.

- Deploy a VM:

  ```bash
  kubectl apply -f examples/vm-ubuntu-small.yaml
  kubectl get virtualmachine -w
  ```

  See `examples/` for more examples.
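The Provider examples in this section cover Libvirt and vSphere only. A Proxmox VE Provider would presumably follow the same shape; the sketch below is built by analogy and is not a verified manifest. In particular, the `type: proxmox` value, the endpoint URL, the resource name, and the `provider-proxmox` image name are assumptions modelled on the two documented examples, so check them against `examples/` before use.

```yaml
# Proxmox VE (sketch; fields marked "assumed" are not confirmed by this README)
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
  name: proxmox-cluster            # hypothetical name
  namespace: default
spec:
  type: proxmox                    # assumed, by analogy with "libvirt" and "vsphere"
  endpoint: "https://pve.example.com:8006"   # assumed; 8006 is the default Proxmox VE API port
  credentialSecretRef:
    name: proxmox-creds            # the Secret created in the credentials step
  runtime:
    image: "ghcr.io/projectbeskar/virtrigaud/provider-proxmox:v0.3.11"   # assumed image name
    service:
      port: 9443
```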
VirtRigaud migrates VMs between providers by staging the disk on a storage-agnostic backend — S3-compatible object storage or an NFS export (ADR-0006). The disk never traverses a CSI PVC; the source exports its native disk format and the target converts on import.
- S3: the provider pod is the S3 client, so the bytes flow host → pod → S3 → pod → host (the universal `relay` path).
- NFS: the disk is staged on an NFS export and moved with `qemu-img`'s native transport: libvirt (host-side) and vSphere (pod-side) use the `nfs://` libnfs driver; Proxmox kernel-mounts the export (its `qemu-img` ships no libnfs). NFS needs `nfs.uid/gid` set to the export owner; see `examples/vmmigration-nfs.yaml`.
Validated: all three providers in both directions over both backends — vSphere ⇄ Libvirt/KVM ⇄ Proxmox VE (ADR-0006 Slices 1–4). The Proxmox provider participates as a full source and target and advertises s3 and nfs (not PVC: its disks live on the node, which a pod-mounted PVC can never reach).
```yaml
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMMigration
metadata:
  name: vm-migration-example
  namespace: default
spec:
  source:
    vmRef:
      name: source-vm
  target:
    name: target-vm
    providerRef:
      name: target-provider
  storage:
    type: s3
    transferMode: relay   # relay (implemented); auto → relay
    s3:
      bucket: virtrigaud
      endpoint: http://minio.example:9000   # omit for AWS S3
      region: us-east-1
      usePathStyle: true   # true for MinIO/Ceph/rustfs; false for AWS
      credentialsSecretRef:
        name: s3-migration-credentials   # keys: accessKeyID, secretAccessKey
```

A legacy `storage.type: pvc` model (ReadWriteMany StorageClass) remains for the vSphere/libvirt directions but is compat-only: it does not work for Proxmox or for host-resident libvirt disks. See `examples/migration/` for per-direction examples.
For full migration documentation including provider restart behaviour, see the Migration Guide.
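The `credentialsSecretRef` in the S3 migration manifest points at a Secret with the keys `accessKeyID` and `secretAccessKey`. A minimal sketch of that Secret, assuming it lives in the same namespace as the VMMigration (`default` here) and using placeholder values:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: s3-migration-credentials
  namespace: default          # assumed: same namespace as the VMMigration
type: Opaque
stringData:                   # stringData takes plain text; Kubernetes base64-encodes it on write
  accessKeyID: "REPLACE_WITH_ACCESS_KEY"
  secretAccessKey: "REPLACE_WITH_SECRET_KEY"
```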
The manager exposes Prometheus metrics at :8080/metrics (HTTP by default; set --metrics-secure=true for HTTPS).
11 of 12 virtrigaud_* metric families are active. virtrigaud_queue_depth was deprecated in v0.3.6 (use workqueue_depth{name} instead); removal scheduled for v0.4.0.
For the full metric catalog see Observability.
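To scrape the manager endpoint with a plain Prometheus server, a static scrape job is usually enough. The sketch below assumes the manager is reachable through a Service named `virtrigaud-manager` in the `virtrigaud-system` namespace; that Service name is hypothetical, so substitute whatever the chart creates in your cluster.

```yaml
# prometheus.yml fragment (sketch)
scrape_configs:
  - job_name: virtrigaud-manager
    scheme: http                # change to https if --metrics-secure=true is set
    metrics_path: /metrics
    static_configs:
      - targets:
          - virtrigaud-manager.virtrigaud-system.svc:8080   # hypothetical Service name
```

Dashboards that still query `virtrigaud_queue_depth` can move to `workqueue_depth{name}` ahead of the v0.4.0 removal.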
```bash
# Check if CRDs were skipped
helm get values virtrigaud -n virtrigaud-system | grep skip-crds
# Manually install CRDs
kubectl apply -f charts/virtrigaud/crds/
# Or reinstall
helm uninstall virtrigaud -n virtrigaud-system
helm install virtrigaud virtrigaud/virtrigaud --version 0.3.11 \
  -n virtrigaud-system --create-namespace
```

```bash
make build               # Build the manager binary (requires Go 1.26+)
make docker-build        # Build container image
make test                # Run unit tests
make generate manifests  # Regenerate CRDs and DeepCopy
```

```bash
# Quick lint check (before every commit)
./hack/test-lint-locally.sh
# Comprehensive CI testing (before PRs)
./hack/test-ci-locally.sh
# Test Helm charts with Kind cluster
./hack/test-helm-locally.sh
```

See Testing Workflows Locally for detailed instructions.
Primary documentation: https://projectbeskar.github.io/virtrigaud
In-tree design decisions: docs/adr/
Contributions are welcome. See CONTRIBUTING.md.
- William Rizzo (@wrkode) — project maintainer
- Erick Bourgeois (@ebourgeois) — project maintainer
Apache License 2.0 — see LICENSE.