K3s Ansible Deployment for Raspberry Pi CM4/CM5
Ansible playbook to deploy a k3s Kubernetes cluster on Raspberry Pi Compute Module 4 and 5 devices.
Prerequisites
- Raspberry Pi CM4/CM5 modules running Raspberry Pi OS (64-bit recommended)
- SSH access to all nodes
- Ansible installed on your control machine
- SSH key-based authentication configured
Project Structure
k3s-ansible/
├── ansible.cfg                      # Ansible configuration
├── site.yml                         # Main playbook
├── inventory/
│   └── hosts.ini                    # Inventory file
├── manifests/
│   └── nginx-test-deployment.yaml   # Test nginx deployment
└── roles/
    ├── prereq/                      # Prerequisites role
    │   └── tasks/
    │       └── main.yml
    ├── k3s-server/                  # K3s master/server role
    │   └── tasks/
    │       └── main.yml
    ├── k3s-agent/                   # K3s worker/agent role
    │   └── tasks/
    │       └── main.yml
    └── k3s-deploy-test/             # Test deployment role
        └── tasks/
            └── main.yml
Configuration
1. Update Inventory
Edit inventory/hosts.ini and add your Raspberry Pi nodes:
[master]
pi-master ansible_host=192.168.30.101 ansible_user=pi
[worker]
pi-worker-1 ansible_host=192.168.30.102 ansible_user=pi
pi-worker-2 ansible_host=192.168.30.103 ansible_user=pi
pi-worker-3 ansible_host=192.168.30.104 ansible_user=pi
2. Configure Variables
In inventory/hosts.ini, you can customize:
- k3s_version: K3s version to install (default: v1.34.2+k3s1)
- extra_server_args: Additional arguments for k3s server
- extra_agent_args: Additional arguments for k3s agent
- extra_packages: List of additional packages to install on all nodes
3. Customize Extra Packages (Optional)
The playbook can install additional system utilities on all nodes. Edit the extra_packages variable in inventory/hosts.ini:
# Comma-separated list of packages
extra_packages=btop,vim,tmux,net-tools,dnsutils,iotop,ncdu,tree,jq
Included packages:
- btop - A modern system monitor ("better top")
- vim - Text editor
- tmux - Terminal multiplexer
- net-tools - Network tools (ifconfig, netstat, etc.)
- dnsutils - DNS utilities (dig, nslookup)
- iotop - I/O monitor
- ncdu - Disk usage analyzer
- tree - Directory tree viewer
- jq - JSON processor
To add packages, append them to the comma-separated list. To disable extra packages entirely, comment out or remove the extra_packages line.
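For illustration, the comma-separated value expands into a plain package list roughly like this sketch (the actual role may use Ansible's apt module rather than a raw apt-get call):

```shell
# Hypothetical expansion of the extra_packages variable into an apt
# invocation; values here are a shortened example.
extra_packages="btop,vim,tmux"
pkg_list=$(printf '%s' "$extra_packages" | tr ',' ' ')   # commas -> spaces
echo "would run: sudo apt-get install -y $pkg_list"
```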
Usage
Test Connectivity
Basic connectivity test:
ansible all -m ping
Gather Node Information
Display critical information from all nodes (uptime, temperature, memory, disk usage, load average):
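Per node, these values come from standard Linux interfaces; the following is an illustrative sketch of what is collected, not the playbook itself (the thermal path is Raspberry Pi specific and may be absent on other hardware):

```shell
# Gather the same facts locally; field names are illustrative.
load=$(cut -d ' ' -f1 /proc/loadavg)                     # 1-minute load average
mem_total_kb=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
mem_avail_kb=$(awk '/^MemAvailable/ {print $2}' /proc/meminfo)
disk_used=$(df -P / | awk 'NR==2 {print $5}')            # root FS usage percent
temp_file=/sys/class/thermal/thermal_zone0/temp
if [ -r "$temp_file" ]; then
  temp_c=$(( $(cat "$temp_file") / 1000 ))               # millidegrees -> degrees C
else
  temp_c="n/a"
fi
echo "load=$load mem=${mem_avail_kb}/${mem_total_kb}kB disk=$disk_used temp=${temp_c}"
```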
Deploy Telegraf for Metrics Collection
Stream system metrics from all nodes to InfluxDB using Telegraf client.
Prerequisites:
- InfluxDB instance running and accessible
- API token with write permissions to your bucket
Setup:
- Configure your InfluxDB credentials in the .env file (already created):
# .env file (keep this secret, never commit!)
INFLUXDB_HOST=192.168.10.10
INFLUXDB_PORT=8086
INFLUXDB_ORG=family
INFLUXDB_BUCKET=rpi-cluster
INFLUXDB_TOKEN=your-api-token-here
- Deploy Telegraf to all nodes:
ansible-playbook telegraf.yml
Or deploy to specific nodes:
# Only worker nodes
ansible-playbook telegraf.yml --limit worker
# Only master nodes
ansible-playbook telegraf.yml --limit master
# Specific node
ansible-playbook telegraf.yml --limit cm4-02
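If you also want those credentials available in your local shell (for example for ad-hoc influx CLI calls), the .env file can be sourced. A sketch, using a throwaway copy of the format shown above:

```shell
# Create a throwaway .env just for this demo, then auto-export its
# KEY=VALUE pairs into the environment.
cat > /tmp/demo.env <<'EOF'
INFLUXDB_HOST=192.168.10.10
INFLUXDB_PORT=8086
EOF
set -a           # export every assignment made while sourcing
. /tmp/demo.env
set +a
echo "InfluxDB at $INFLUXDB_HOST:$INFLUXDB_PORT"
```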
Metrics Collected:
- System: CPU (per-core and total), memory, swap, processes, system load
- Disk: Disk I/O, disk usage, inodes
- Network: Network interfaces, packets, errors
- Thermal: CPU temperature (Raspberry Pi specific)
- K3s: Process metrics for k3s components
Verify Installation:
Check Telegraf status on a node:
ssh pi@<node-ip>
sudo systemctl status telegraf
sudo journalctl -u telegraf -f
View Metrics in InfluxDB:
Once configured, metrics will appear in your InfluxDB instance under the rpi-cluster bucket with tags for each node hostname and node type (master/worker).
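For example, a Flux query for total CPU usage per host might look like this sketch (bucket, measurement, and field names follow Telegraf's defaults; adjust to your setup):

```flux
from(bucket: "rpi-cluster")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu" and r.cpu == "cpu-total")
  |> filter(fn: (r) => r._field == "usage_idle")
  |> map(fn: (r) => ({r with _value: 100.0 - r._value}))  // busy % from idle
  |> aggregateWindow(every: 1m, fn: mean)
```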
Monitoring Dashboards
Two pre-built dashboards are available for visualizing your cluster metrics:
Grafana Dashboard
A comprehensive Grafana dashboard with interactive visualizations:
- CPU usage across all nodes
- Memory usage (percentage)
- CPU temperature (Raspberry Pi specific)
- System load averages
Import to Grafana:
- Open Grafana and go to Dashboards → New → Import
- Upload the dashboard file: grafana/rpi-cluster-dashboard.json
- Your InfluxDB datasource (named influxdb) will be automatically selected
- Click Import
Customize the Grafana Dashboard:
You can modify the dashboard after import to:
- Adjust time ranges (default: last 6 hours)
- Add alerts for high CPU/temperature/memory
- Add more panels for additional metrics
- Create node-specific views using Grafana variables
InfluxDB Dashboard
A native InfluxDB 2.x dashboard with built-in gauges and time series:
- CPU usage gauge (average)
- Memory usage gauge (average)
- CPU usage time series (6-hour view)
- Memory usage time series (6-hour view)
- CPU temperature trend
- System load trend
Import to InfluxDB 2.8:
Via UI (Recommended):
- Open the InfluxDB UI at http://your-influxdb-host:8086
- Go to Dashboards (left sidebar)
- Click Create Dashboard → From a Template
- Click Paste JSON
- Copy and paste the contents of influxdb/rpi-cluster-dashboard-v2.json
- Click Create Dashboard
Via CLI:
influx dashboard import \
--org family \
--file influxdb/rpi-cluster-dashboard-v2.json
Benefits of InfluxDB Dashboard:
- Native integration - no external datasource configuration needed
- Built-in alert support
- Real-time data without polling delays
- Direct access to raw data and queries
- InfluxDB 2.8 compatible
Deploy K3s Cluster
ansible-playbook site.yml
This will deploy the full k3s cluster with the test nginx application.
Deploy Without Test Application
To skip the test deployment:
ansible-playbook site.yml --skip-tags test
Deploy Only the Test Application
If the cluster is already running and you just want to deploy the test app:
ansible-playbook site.yml --tags deploy-test
Deploy Only Prerequisites
ansible-playbook site.yml --tags prereq
What the Playbook Does
Prerequisites Role (prereq)
- Sets hostname on each node
- Updates and upgrades system packages
- Installs required packages (curl, wget, git, iptables, etc.)
- Enables cgroup memory and swap in boot config
- Configures legacy iptables (required for k3s on ARM)
- Disables swap
- Reboots if necessary
K3s Server Role (k3s-server)
- Installs k3s in server mode on master node(s)
- Configures k3s with Flannel VXLAN backend (optimized for ARM)
- Retrieves and stores the node token for workers
- Copies kubeconfig to master node user
- Fetches kubeconfig to local machine for kubectl access
K3s Agent Role (k3s-agent)
- Installs k3s in agent mode on worker nodes
- Joins workers to the cluster using the master's token
- Configures agents to connect to the master
K3s Deploy Test Role (k3s-deploy-test)
- Waits for all cluster nodes to be ready
- Deploys the nginx test application with 5 replicas
- Verifies deployment is successful
- Displays pod distribution across nodes
Post-Installation
After successful deployment:
- The kubeconfig file will be saved to ./kubeconfig
- Use it with kubectl:
export KUBECONFIG=$(pwd)/kubeconfig
kubectl get nodes
You should see all your nodes in Ready state:
NAME          STATUS   ROLES                  AGE   VERSION
pi-master     Ready    control-plane,master   5m    v1.34.2+k3s1
pi-worker-1   Ready    <none>                 3m    v1.34.2+k3s1
pi-worker-2   Ready    <none>                 3m    v1.34.2+k3s1
pi-worker-3   Ready    <none>                 3m    v1.34.2+k3s1
Accessing the Cluster
From Master Node
SSH into the master node and use kubectl:
ssh pi@pi-master
kubectl get nodes
From Your Local Machine
The playbook automatically fetches the kubeconfig to ./kubeconfig. You have several options to use it:
Option 1: Temporary Access (Environment Variable)
export KUBECONFIG=$(pwd)/kubeconfig
kubectl get nodes
kubectl get pods --all-namespaces
Option 2: Merge into ~/.kube/config (Recommended)
This allows you to manage multiple clusters and switch between them:
# Backup your existing config
cp ~/.kube/config ~/.kube/config.backup
# Merge the k3s config into your existing config
KUBECONFIG=~/.kube/config:$(pwd)/kubeconfig kubectl config view --flatten > ~/.kube/config.tmp
mv ~/.kube/config.tmp ~/.kube/config
# Rename the context to something meaningful
kubectl config rename-context default k3s-pi-cluster
# View all contexts
kubectl config get-contexts
# Switch to k3s context
kubectl config use-context k3s-pi-cluster
# Switch back to other clusters
kubectl config use-context <other-context-name>
Option 3: Direct Usage
Use the kubeconfig file directly without setting environment variables:
kubectl --kubeconfig=./kubeconfig get nodes
kubectl --kubeconfig=./kubeconfig get pods --all-namespaces
Ingress Setup
K3s ships with the Traefik ingress controller enabled by default, letting you expose applications over HTTP/HTTPS using domain names.
How It Works
- Traefik listens on ports 80 (HTTP) and 443 (HTTPS) on all nodes
- Ingress rules route traffic based on hostname to different services
- Multiple applications can share the same IP using different hostnames
- No additional setup required - Traefik is ready to use after cluster deployment
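Hostname-based routing from the bullets above comes down to an Ingress resource shaped like this minimal sketch (host and service names are placeholders, not taken from this repo):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
spec:
  rules:
    - host: app.example.local        # Traefik matches the Host header
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-svc    # ClusterIP service to route to
                port:
                  number: 80
```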
Verify Traefik is Running
kubectl --kubeconfig=./kubeconfig get pods -n kube-system -l app.kubernetes.io/name=traefik
kubectl --kubeconfig=./kubeconfig get svc -n kube-system traefik
View Ingress Resources
kubectl --kubeconfig=./kubeconfig get ingress
kubectl --kubeconfig=./kubeconfig describe ingress nginx-test
Testing the Cluster
A sample nginx deployment with 5 replicas and ingress is provided to test your cluster.
Automated Deployment (via Ansible)
The test application is automatically deployed with ingress when you run the full playbook:
ansible-playbook site.yml
Or deploy it separately after the cluster is up:
ansible-playbook site.yml --tags deploy-test
The Ansible role will:
- Wait for all nodes to be ready
- Deploy the nginx application with ingress
- Wait for all pods to be running
- Show deployment status, pod distribution, ingress details, and access instructions
Manual Deployment (via kubectl)
Deploy using kubectl:
export KUBECONFIG=$(pwd)/kubeconfig
kubectl apply -f manifests/nginx-test-deployment.yaml
This deploys:
- Nginx deployment with 5 replicas
- ClusterIP service
- Ingress resource for domain-based access
Verify the Deployment
Check that all 5 replicas are running:
kubectl --kubeconfig=./kubeconfig get deployments
kubectl --kubeconfig=./kubeconfig get pods -o wide
kubectl --kubeconfig=./kubeconfig get ingress
You should see output similar to:
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
nginx-test   5/5     5            5           1m

NAME                          READY   STATUS    RESTARTS   AGE   NODE
nginx-test-7d8f4c9b6d-2xk4p   1/1     Running   0          1m    pi-worker-1
nginx-test-7d8f4c9b6d-4mz9r   1/1     Running   0          1m    pi-worker-2
nginx-test-7d8f4c9b6d-7w3qs   1/1     Running   0          1m    pi-worker-3
nginx-test-7d8f4c9b6d-9k2ln   1/1     Running   0          1m    pi-worker-1
nginx-test-7d8f4c9b6d-xr5wp   1/1     Running   0          1m    pi-worker-2
Access via Ingress
Add your master node IP to /etc/hosts:
# Replace 192.168.30.101 with your master node IP
192.168.30.101 nginx-test.local nginx.pi.local
Then open http://nginx-test.local (or http://nginx.pi.local) in your browser, or test with curl:
# Replace with your master node IP
curl -H "Host: nginx-test.local" http://192.168.30.101
Scale the Deployment
Test scaling:
# Scale up to 10 replicas
kubectl scale deployment nginx-test --replicas=10
# Scale down to 3 replicas
kubectl scale deployment nginx-test --replicas=3
# Watch the pods being created/terminated
kubectl get pods -w
Clean Up Test Deployment
When you're done testing:
kubectl delete -f manifests/nginx-test-deployment.yaml
Maintenance
Updating the Cluster
K3s updates are applied by re-running the installer with a new version, not through the system package manager. There are several ways to update your cluster:
Option 1: Automatic Updates (Recommended)
The installer can track the latest release channel, so each playbook run picks up the newest version. To enable this on all nodes:
- Add the following to your inventory hosts.ini:
[k3s_cluster:vars]
k3s_version=latest
- Re-run the k3s installation playbook:
ansible-playbook site.yml --tags k3s-server,k3s-agent
Re-running the playbook then picks up new releases (typically patch releases) as they become available; k3s does not update itself between runs.
Option 2: Manual Update to Specific Version
To update to a specific k3s version:
- Update the k3s_version variable in inventory/hosts.ini:
[k3s_cluster:vars]
k3s_version=v1.35.0+k3s1
- Run the k3s playbook to update all nodes:
# Update master first (required to generate token for agents)
ansible-playbook site.yml --tags k3s-server,k3s-agent
Important: Always update master nodes before workers. Workers need the token from the master to rejoin the cluster.
Option 3: Update via K3s Release Script
For more control, you can manually update k3s on individual nodes:
# SSH into a node
ssh pi@<node-ip>
# Download and install specific version
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.35.0+k3s1 sh -
# Restart k3s
sudo systemctl restart k3s # On master
sudo systemctl restart k3s-agent # On workers
Checking Current K3s Version
To see the current k3s version running on your cluster:
kubectl version
# or
kubectl get nodes -o wide
To check versions on specific nodes:
ssh pi@<node-ip>
k3s --version
# Or via Ansible
ansible all -m shell -a "k3s --version" --become
Update Telegraf
To update Telegraf metrics collection to the latest version:
# Update Telegraf on all nodes
ansible-playbook telegraf.yml
# Update only specific nodes
ansible-playbook telegraf.yml --limit worker
Post-Update Verification
After updating, verify your cluster is healthy:
# Check all nodes are ready
kubectl get nodes
# Check pod status
kubectl get pods --all-namespaces
# Check cluster info
kubectl cluster-info
# View recent events
kubectl get events --all-namespaces --sort-by='.lastTimestamp'
Rollback (if needed)
If an update causes issues, you can rollback to a previous version:
# Update inventory with previous version
# [k3s_cluster:vars]
# k3s_version=v1.34.2+k3s1
# Re-run the playbook
ansible-playbook site.yml --tags k3s-server,k3s-agent
Rebooting Cluster Nodes
A dedicated playbook is provided to safely reboot all cluster nodes:
ansible-playbook reboot.yml
This playbook will:
- Reboot worker nodes first (one at a time, serially)
- Wait for each worker to come back online and k3s-agent to be running
- Reboot master nodes (one at a time, serially)
- Wait for each master to come back online and k3s to be running
- Verify the cluster status and show all nodes are ready
The serial approach ensures that only one node reboots at a time, maintaining cluster availability.
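Under stated assumptions (SSH key auth as user pi, the unit names used in this repo), the per-node wait can be sketched in plain shell; the playbook itself relies on Ansible's built-in reboot and wait mechanisms, so this is purely illustrative:

```shell
# Poll until SSH answers again, then confirm the k3s unit is active.
# Illustrative sketch, not taken from the playbook.
wait_for_node() {
  host=$1; unit=$2      # e.g. wait_for_node pi-worker-1 k3s-agent
  for _ in $(seq 1 60); do
    ssh -o BatchMode=yes -o ConnectTimeout=2 "pi@$host" true 2>/dev/null && break
    sleep 5
  done
  ssh -o BatchMode=yes "pi@$host" "systemctl is-active --quiet $unit"
}
```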
Reboot Only Workers
ansible-playbook reboot.yml --limit worker
Reboot Only Masters
ansible-playbook reboot.yml --limit master
Reboot a Specific Node
ansible-playbook reboot.yml --limit pi-worker-1
Troubleshooting
Check k3s service status
On master:
sudo systemctl status k3s
sudo journalctl -u k3s -f
On workers:
sudo systemctl status k3s-agent
sudo journalctl -u k3s-agent -f
Reset a node
If you need to reset a node and start over:
# On the node
/usr/local/bin/k3s-uninstall.sh # For server
/usr/local/bin/k3s-agent-uninstall.sh # For agent
Common Issues
- Nodes not joining: Check firewall rules. K3s requires port 6443 open on the master.
- Memory issues: Ensure cgroup memory is enabled (the playbook handles this).
- Network issues: The playbook uses VXLAN backend which works better on ARM devices.
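A quick way to check the firewall point above from a worker is a raw TCP probe; a sketch using bash's /dev/tcp (the master IP in the comment is an example):

```shell
# probe_port HOST PORT -> exit 0 if a TCP connection succeeds within 2s.
probe_port() {
  timeout 2 bash -c "cat < /dev/null > /dev/tcp/$1/$2" 2>/dev/null
}
# e.g. from a worker: probe_port 192.168.30.101 6443
```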
Customization
Add More Master Nodes (HA Setup)
For a high-availability setup, you can add more master nodes:
[master]
pi-master-1 ansible_host=192.168.30.101 ansible_user=pi
pi-master-2 ansible_host=192.168.30.105 ansible_user=pi
pi-master-3 ansible_host=192.168.30.106 ansible_user=pi
For HA you'll need either k3s's embedded etcd (initialize the first server with --cluster-init) or an external datastore such as PostgreSQL.
Custom K3s Arguments
Modify extra_server_args or extra_agent_args in the inventory:
[k3s_cluster:vars]
extra_server_args="--flannel-backend=vxlan --disable traefik --disable servicelb"
extra_agent_args="--node-label foo=bar"
Compute Blade Agent Deployment
The playbook includes automatic deployment of the Compute Blade Agent, a system service for managing Compute Blade hardware (Raspberry Pi CM4/CM5 modules). The agent monitors hardware states, reacts to temperature changes and button presses, and exposes metrics via Prometheus.
Components
- compute-blade-agent: Daemon that monitors hardware and manages blade operations
- bladectl: Command-line tool for local/remote interaction with the agent
- fanunit.uf2: Firmware for the fan unit microcontroller
Configuration
The compute-blade-agent deployment is controlled by the enable_compute_blade_agent variable in inventory/hosts.ini:
# Enable/disable compute-blade-agent on all worker nodes
enable_compute_blade_agent=true
To disable on specific nodes, add an override:
[worker]
cm4-02 ansible_host=192.168.30.102 ansible_user=pi enable_compute_blade_agent=false
cm4-03 ansible_host=192.168.30.103 ansible_user=pi
cm4-04 ansible_host=192.168.30.104 ansible_user=pi
Deployment
The compute-blade-agent is automatically deployed as part of the main playbook:
ansible-playbook site.yml
Or deploy only the compute-blade-agent on worker nodes:
ansible-playbook site.yml --tags compute-blade-agent
Verification
Check the agent status on a worker node:
# SSH into a worker node
ssh pi@192.168.30.102
# Check service status
sudo systemctl status compute-blade-agent
# View logs
sudo journalctl -u compute-blade-agent -f
# Check binary installation
/usr/local/bin/compute-blade-agent --version
Configuration Files
The compute-blade-agent creates its configuration at:
/etc/compute-blade-agent/config.yaml
Configuration can also be controlled via environment variables prefixed with BLADE_.
Metrics and Monitoring
The compute-blade-agent exposes Prometheus metrics. To monitor the agents:
- Optional Kubernetes resources are available in manifests/compute-blade-agent-daemonset.yaml
- Deploy the optional monitoring resources (requires Prometheus):
kubectl apply -f manifests/compute-blade-agent-daemonset.yaml
Features
- Hardware Monitoring: Tracks temperature, fan speed, and button events
- Critical Mode: Automatically enters maximum fan speed + red LED during overheating
- Identification: Locate specific blades via LED blinking
- Metrics Export: Prometheus-compatible metrics endpoint
Troubleshooting compute-blade-agent
Service fails to start
Check the installer output:
sudo journalctl -u compute-blade-agent -n 50
Agent not detecting hardware
Verify the Compute Blade hardware is properly connected. The agent logs detailed information:
sudo journalctl -u compute-blade-agent -f
Re-run installation
To reinstall compute-blade-agent:
# SSH into the node
ssh pi@<node-ip>
# Uninstall
sudo /usr/local/bin/k3s-uninstall-compute-blade-agent.sh 2>/dev/null || echo "Not found, continuing"
# Remove from Ansible to reinstall
# Then re-run the playbook
ansible-playbook site.yml --tags compute-blade-agent
External DNS Configuration
To use external domains (like test.zlor.fi) with your k3s cluster ingress, you need to configure DNS and update your nodes.
Step 1: Configure DNS Server Records
On your DNS server, add A records pointing to your k3s cluster nodes:
Option A: Single Record (Master Node Only) - Simplest
If your DNS only allows one A record:
test.zlor.fi A 192.168.30.101
Pros: Simple, works with any DNS server
Cons: No failover if the master node is down
Option B: Multiple Records (Load Balanced) - Best Redundancy
If your DNS supports multiple A records:
test.zlor.fi A 192.168.30.101
test.zlor.fi A 192.168.30.102
test.zlor.fi A 192.168.30.103
test.zlor.fi A 192.168.30.104
DNS clients will distribute requests across all nodes (round-robin).
Pros: Load balanced, automatic failover
Cons: Requires DNS server support for multiple A records
Option C: Virtual IP (VIP) - Best of Both Worlds
If your DNS only allows one A record but you want redundancy:
test.zlor.fi A 192.168.30.100
Set up a virtual IP that automatically handles failover. You have two sub-options:
Option C1: MikroTik VIP (Recommended if you have a MikroTik router)
Configure VIP directly on your MikroTik router. See MIKROTIK-VIP-SETUP.md for detailed instructions.
Pros:
- Simple setup (5 minutes)
- No additional software on cluster nodes
- Hardware-based failover (more reliable)
- Better performance
Option C2: Keepalived (Software-based VIP)
Configure floating IP using Keepalived on cluster nodes. See "Virtual IP Setup (Keepalived)" below for detailed instructions.
Pros:
- No router configuration needed
- Portable across different networks
- Works in cloud environments
Cons:
- Additional daemon on all nodes
- More configuration needed
Recommendation: If you have MikroTik, use Option C1 (MikroTik VIP). Otherwise, use Option C2 (Keepalived).
Step 2: Configure Cluster Nodes for External DNS
K3s nodes need to be able to resolve external DNS queries. Update the DNS resolver on all nodes:
Option A: Ansible Playbook (Recommended)
Create a new playbook dns-config.yml:
---
- name: Configure external DNS resolver
hosts: all
become: yes
tasks:
- name: Update /etc/resolv.conf with custom DNS
copy:
content: |
nameserver 8.8.8.8
nameserver 8.8.4.4
nameserver 192.168.1.1
dest: /etc/resolv.conf
owner: root
group: root
mode: '0644'
notify: Update systemd-resolved
- name: Make resolv.conf immutable
file:
path: /etc/resolv.conf
attributes: '+i'
state: file
- name: Configure systemd-resolved for external DNS
copy:
content: |
[Resolve]
DNS=8.8.8.8 8.8.4.4 192.168.1.1
FallbackDNS=8.8.8.8
DNSSECNegativeTrustAnchors=zlor.fi
dest: /etc/systemd/resolved.conf
owner: root
group: root
mode: '0644'
notify: Restart systemd-resolved
handlers:
- name: Update systemd-resolved
systemd:
name: systemd-resolved
state: restarted
daemon_reload: yes
Apply the playbook:
ansible-playbook dns-config.yml
Option B: Manual Configuration on Each Node
SSH into each node and update DNS:
ssh pi@192.168.30.101
sudo nano /etc/systemd/resolved.conf
Add or modify:
[Resolve]
DNS=8.8.8.8 8.8.4.4 192.168.1.1
FallbackDNS=8.8.8.8
DNSSECNegativeTrustAnchors=zlor.fi
Save and restart:
sudo systemctl restart systemd-resolved
Verify DNS is working:
nslookup test.zlor.fi
dig test.zlor.fi
Step 3: Update Ingress Configuration
Your nginx-test deployment has already been updated to include test.zlor.fi. Verify the ingress:
kubectl get ingress nginx-test -o yaml
You should see:
spec:
rules:
- host: test.zlor.fi
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: nginx-test
port:
number: 80
Step 4: Test External Domain Access
Once DNS is configured, test access from your local machine:
# Test DNS resolution
nslookup test.zlor.fi
# Test HTTP access
curl http://test.zlor.fi
# With verbose output
curl -v http://test.zlor.fi
# Test from all cluster IPs
for ip in 192.168.30.{101..104}; do
echo "Testing $ip:"
curl -H "Host: test.zlor.fi" http://$ip
done
Troubleshooting DNS
DNS Resolution Failing
Check if systemd-resolved is running:
systemctl status systemd-resolved
Test DNS from a node:
ssh pi@192.168.30.101
nslookup test.zlor.fi
dig test.zlor.fi @8.8.8.8
Ingress Not Responding
Check if Traefik is running:
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik
Check ingress status:
kubectl get ingress
kubectl describe ingress nginx-test
Request Timing Out
Verify network connectivity:
# From your machine
ping 192.168.30.101
ping 192.168.30.102
# From a cluster node
ssh pi@192.168.30.101
ping test.zlor.fi
curl -v http://test.zlor.fi
Adding More Domains
To add additional domains (e.g., api.zlor.fi, admin.zlor.fi):
- Add DNS A records for each domain pointing to your cluster nodes
- Update the ingress YAML with new rules:
spec:
rules:
- host: test.zlor.fi
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: nginx-test
port:
number: 80
- host: api.zlor.fi
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api-service
port:
number: 8080
- Apply the updated manifest:
kubectl apply -f manifests/nginx-test-deployment.yaml
Virtual IP Setup - Keepalived (Option C2)
If your DNS server only allows a single A record but you want high availability across all nodes, and you're not using MikroTik VIP, use a Virtual IP (VIP) with Keepalived.
How It Works
- A virtual IP (192.168.30.100) floats between cluster nodes using VRRP protocol
- The master node holds the VIP by default
- If the master fails, a worker node automatically takes over
- All traffic reaches the cluster through a single IP address
- Clients experience automatic failover with minimal downtime
Prerequisites
- All nodes must be on the same network segment
- Network must support ARP protocol (standard on most networks)
- No other services should use 192.168.30.100
Installation
Step 1: Update Your VIP Address
Edit vip-setup.yml and change the VIP to an unused IP on your network:
vars:
vip_address: "192.168.30.100" # Change this to your desired VIP
vip_interface: "eth0" # Change if your interface is different
Step 2: Run the VIP Setup Playbook
ansible-playbook vip-setup.yml
This will:
- Install Keepalived on all nodes
- Configure VRRP with master on cm4-01 and backup on workers
- Set up health checks for automatic failover
- Enable the virtual IP
Step 3: Verify VIP is Active
Check that the VIP is assigned to the master node:
# From your local machine
ping 192.168.30.100
# From any cluster node
ssh pi@192.168.30.101
ip addr show
# Look for your VIP address in the output
Step 4: Update DNS Records
Now you can use just one A record pointing to the VIP:
test.zlor.fi A 192.168.30.100
Step 5: Update Ingress (Optional)
If you want to reference the VIP in your ingress, update the manifest:
spec:
rules:
- host: test.zlor.fi
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: nginx-test
port:
number: 80
The existing ingress needs no change: traffic reaches the cluster through any node IP, including the VIP.
Monitoring the VIP
Check VIP status and failover behavior:
# View Keepalived status
ssh pi@192.168.30.101
systemctl status keepalived
# Watch VIP transitions (open in separate terminal)
watch 'ip addr show | grep 192.168.30.100'
# View Keepalived logs
sudo journalctl -u keepalived -f
# Check health check script
sudo cat /usr/local/bin/check_apiserver.sh
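For reference, such a health check amounts to a probe of the k3s API endpoint; a minimal sketch in the same spirit (the script installed by the playbook may differ):

```shell
# Return 0 when the k3s API server answers its health endpoint, so
# Keepalived keeps the VIP on this node; non-zero lowers its priority.
check_apiserver() {
  url="${1:-https://127.0.0.1:6443/healthz}"
  curl -sk --max-time 3 "$url" >/dev/null 2>&1
}
```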
Testing Failover
To test automatic failover:
- Note which node has the VIP:
for ip in 192.168.30.{101..104}; do
echo "=== $ip ==="
ssh pi@$ip "ip addr show | grep 192.168.30.100" 2>/dev/null || echo "Not on this node"
done
- SSH into the node holding the VIP and stop keepalived:
ssh pi@192.168.30.101 # or whichever node has the VIP
sudo systemctl stop keepalived
- Watch the VIP migrate to another node:
# From another terminal, watch the migration
ping 192.168.30.100 -c 5
# Connection may drop briefly, then resume on new node
- Restart keepalived on the original node:
sudo systemctl start keepalived
Troubleshooting VIP
VIP is not appearing on any node
Check if Keepalived is running:
ssh pi@192.168.30.101
sudo systemctl status keepalived
sudo journalctl -u keepalived -n 20
Verify the interface name:
ip route | grep default # Should show your interface name
Update vip_interface in vip-setup.yml if needed and re-run.
VIP keeps switching between nodes
This indicates the health check is failing. Verify:
# Check if API server is responding
curl -k https://127.0.0.1:6443/healthz
# Check the health check script
cat /usr/local/bin/check_apiserver.sh
sudo bash /usr/local/bin/check_apiserver.sh
DNS resolves but connections time out
Verify all nodes have the VIP configured:
for ip in 192.168.30.{101..104}; do
echo "=== $ip ==="
ssh pi@$ip "ip addr show | grep 192.168.30.100"
done
Test direct connectivity to the VIP from each node:
ssh pi@192.168.30.101
curl -H "Host: test.zlor.fi" http://192.168.30.100
Disabling VIP
If you no longer need the VIP:
# Stop Keepalived on all nodes
ansible all -m systemd -a "name=keepalived state=stopped enabled=no" --become
# Remove configuration
ansible all -m file -a "path=/etc/keepalived/keepalived.conf state=absent" --become
Uninstall
To completely remove k3s from all nodes:
# Create an uninstall playbook or run manually on each node
ansible all -m shell -a "/usr/local/bin/k3s-uninstall.sh" --become
ansible workers -m shell -a "/usr/local/bin/k3s-agent-uninstall.sh" --become
To uninstall compute-blade-agent:
# Uninstall from all worker nodes
ansible worker -m shell -a "bash /usr/local/bin/k3s-uninstall-compute-blade-agent.sh" --become
License
MIT