K3s Ansible Deployment for Raspberry Pi CM4/CM5
Ansible playbook to deploy a k3s Kubernetes cluster on Raspberry Pi Compute Module 4 and 5 devices.
Prerequisites
- Raspberry Pi CM4/CM5 modules running Raspberry Pi OS (64-bit recommended)
- SSH access to all nodes
- Ansible installed on your control machine
- SSH key-based authentication configured
Project Structure
k3s-ansible/
├── ansible.cfg                      # Ansible configuration
├── site.yml                         # Main playbook
├── inventory/
│   └── hosts.ini                    # Inventory file
├── manifests/
│   └── nginx-test-deployment.yaml   # Test nginx deployment
└── roles/
    ├── prereq/                      # Prerequisites role
    │   └── tasks/
    │       └── main.yml
    ├── k3s-server/                  # K3s master/server role
    │   └── tasks/
    │       └── main.yml
    ├── k3s-agent/                   # K3s worker/agent role
    │   └── tasks/
    │       └── main.yml
    └── k3s-deploy-test/             # Test deployment role
        └── tasks/
            └── main.yml
Configuration
1. Update Inventory
Edit inventory/hosts.ini and add your Raspberry Pi nodes:
[master]
cm4-01 ansible_host=192.168.30.101 ansible_user=pi k3s_server_init=true
cm4-02 ansible_host=192.168.30.102 ansible_user=pi k3s_server_init=false
cm4-03 ansible_host=192.168.30.103 ansible_user=pi k3s_server_init=false
[worker]
cm4-04 ansible_host=192.168.30.104 ansible_user=pi
2. Configure Variables
In inventory/hosts.ini, you can customize:
- k3s_version: K3s version to install (default: v1.35.0+k3s1)
- extra_server_args: Additional arguments for the k3s server
- extra_agent_args: Additional arguments for the k3s agent
- extra_packages: List of additional packages to install on all nodes
3. Customize Extra Packages (Optional)
The playbook can install additional system utilities on all nodes. Edit the extra_packages variable in inventory/hosts.ini:
# Comma-separated list of packages
extra_packages=btop,vim,tmux,net-tools,dnsutils,iotop,ncdu,tree,jq
Included packages:
- btop - modern system monitor (a better top)
- vim - text editor
- tmux - terminal multiplexer
- net-tools - network tools (ifconfig, netstat, etc.)
- dnsutils - DNS utilities (dig, nslookup)
- iotop - I/O monitor
- ncdu - disk usage analyzer
- tree - directory tree viewer
- jq - JSON processor
To add packages, append them to the comma-separated list. To disable extra packages entirely, comment out or remove the extra_packages line.
Usage
Test Connectivity
Basic connectivity test:
ansible all -m ping
Gather Node Information
Display critical information from all nodes (uptime, temperature, memory, disk usage, load average):
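One way to collect these details is a small ad-hoc playbook. The sketch below is an illustration, not the repository's actual implementation: the file name node-info.yml is hypothetical, vcgencmd assumes Raspberry Pi OS, and the ansible_loadavg fact requires a recent ansible-core.

```yaml
# node-info.yml - hypothetical sketch of a node-information playbook
- name: Gather node information
  hosts: all
  gather_facts: true
  tasks:
    - name: Read CPU temperature (Raspberry Pi)
      command: vcgencmd measure_temp
      register: temp
      changed_when: false

    - name: Show node summary
      debug:
        msg: >-
          {{ inventory_hostname }}:
          uptime={{ ansible_uptime_seconds }}s,
          temp={{ temp.stdout }},
          mem={{ ansible_memfree_mb }}/{{ ansible_memtotal_mb }} MB free,
          load={{ ansible_loadavg['1m'] | default('n/a') }}
```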
Deploy Telegraf for Metrics Collection
Stream system metrics from all nodes to InfluxDB using the Telegraf agent.
Prerequisites:
- InfluxDB instance running and accessible
- API token with write permissions to your bucket
Setup:
- Configure your InfluxDB credentials in the .env file (already created):
# .env file (keep this secret, never commit!)
INFLUXDB_HOST=192.168.10.10
INFLUXDB_PORT=8086
INFLUXDB_ORG=family
INFLUXDB_BUCKET=rpi-cluster
INFLUXDB_TOKEN=your-api-token-here
- Deploy Telegraf to all nodes:
ansible-playbook telegraf.yml
Or deploy to specific nodes:
# Only worker nodes
ansible-playbook telegraf.yml --limit worker
# Only master nodes
ansible-playbook telegraf.yml --limit master
# Specific node
ansible-playbook telegraf.yml --limit cm4-02
Metrics Collected:
- System: CPU (per-core and total), memory, swap, processes, system load
- Disk: Disk I/O, disk usage, inodes
- Network: Network interfaces, packets, errors
- Thermal: CPU temperature (Raspberry Pi specific)
- K3s: Process metrics for k3s components
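The role's actual Telegraf template is not reproduced here, but a minimal telegraf.conf covering the metric groups above might look like the sketch below. The plugin names are standard Telegraf inputs; wiring the .env values through environment variables is an assumption about how the role templates the config.

```toml
[[outputs.influxdb_v2]]
  urls = ["http://${INFLUXDB_HOST}:${INFLUXDB_PORT}"]
  token = "${INFLUXDB_TOKEN}"
  organization = "${INFLUXDB_ORG}"
  bucket = "${INFLUXDB_BUCKET}"

[[inputs.cpu]]
  percpu = true          # per-core metrics
  totalcpu = true        # plus an aggregate
[[inputs.mem]]
[[inputs.swap]]
[[inputs.processes]]
[[inputs.system]]        # load averages, uptime
[[inputs.diskio]]
[[inputs.disk]]
[[inputs.net]]
[[inputs.temp]]          # reads /sys/class/thermal on Linux (Raspberry Pi CPU temp)
[[inputs.procstat]]
  pattern = "k3s"        # process metrics for k3s components
```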
Verify Installation:
Check Telegraf status on a node:
ssh pi@<node-ip>
sudo systemctl status telegraf
sudo journalctl -u telegraf -f
View Metrics in InfluxDB:
Once configured, metrics will appear in your InfluxDB instance under the rpi-cluster bucket with tags for each node hostname and node type (master/worker).
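As a starting point in the InfluxDB Data Explorer, a Flux query along these lines surfaces per-node CPU usage; the measurement and field names assume Telegraf defaults, and the host tag is added by Telegraf automatically.

```flux
from(bucket: "rpi-cluster")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_idle")
  |> filter(fn: (r) => r.cpu == "cpu-total")
  // convert idle time to busy time
  |> map(fn: (r) => ({ r with _value: 100.0 - r._value }))
  |> group(columns: ["host"])
```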
Monitoring Dashboards
Two pre-built dashboards are available for visualizing your cluster metrics:
Grafana Dashboard
A comprehensive Grafana dashboard with interactive visualizations:
- CPU usage across all nodes
- Memory usage (percentage)
- CPU temperature (Raspberry Pi specific)
- System load averages
Import to Grafana:
- Open Grafana and go to Dashboards → New → Import
- Upload the dashboard file: grafana/rpi-cluster-dashboard.json
- Your InfluxDB datasource (named influxdb) will be selected automatically
- Click Import
Customize the Grafana Dashboard:
You can modify the dashboard after import to:
- Adjust time ranges (default: last 6 hours)
- Add alerts for high CPU/temperature/memory
- Add more panels for additional metrics
- Create node-specific views using Grafana variables
InfluxDB Dashboard
A native InfluxDB 2.x dashboard with built-in gauges and time series:
- CPU usage gauge (average)
- Memory usage gauge (average)
- CPU usage time series (6-hour view)
- Memory usage time series (6-hour view)
- CPU temperature trend
- System load trend
Import to InfluxDB 2.x:
Via UI (Recommended):
- Open the InfluxDB UI at http://your-influxdb-host:8086
- Go to Dashboards (left sidebar)
- Click Create Dashboard → From a Template
- Click Paste JSON
- Copy and paste the contents of influxdb/rpi-cluster-dashboard-v2.json
- Click Create Dashboard
Via CLI:
# Import via `influx apply` (expects the dashboard packaged as a template)
influx apply \
--org family \
--file influxdb/rpi-cluster-dashboard-v2.json
Benefits of InfluxDB Dashboard:
- Native integration - no external datasource configuration needed
- Built-in alert support
- Real-time data without polling delays
- Direct access to raw data and queries
- InfluxDB 2.x compatible
Deploy K3s Cluster
ansible-playbook site.yml
This will deploy the full k3s cluster with the test nginx application.
Deploy Without Test Application
To skip the test deployment:
ansible-playbook site.yml --skip-tags test
Deploy Only the Test Application
If the cluster is already running and you just want to deploy the test app:
ansible-playbook site.yml --tags deploy-test
Deploy Only Prerequisites
ansible-playbook site.yml --tags prereq
What the Playbook Does
Prerequisites Role (prereq)
- Sets hostname on each node
- Updates and upgrades system packages
- Installs required packages (curl, wget, git, iptables, etc.)
- Enables cgroup memory and swap in boot config
- Configures legacy iptables (required for k3s on ARM)
- Disables swap
- Reboots if necessary
K3s Server Role (k3s-server)
- Installs k3s in server mode on master node(s)
- Configures k3s with Flannel VXLAN backend (optimized for ARM)
- Retrieves and stores the node token for workers
- Copies kubeconfig to master node user
- Fetches kubeconfig to local machine for kubectl access
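The node token lives at a well-known path on a k3s server. A hedged sketch of the retrieval step (task names and fact names are illustrative; the path is k3s's default):

```yaml
- name: Read the k3s node token from the first server
  slurp:
    src: /var/lib/rancher/k3s/server/node-token
  register: node_token_b64
  become: true

- name: Expose the token as a fact for agent installs
  set_fact:
    # slurp returns base64-encoded content
    k3s_node_token: "{{ node_token_b64.content | b64decode | trim }}"
```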
K3s Agent Role (k3s-agent)
- Installs k3s in agent mode on worker nodes
- Joins workers to the cluster using the master's token
- Configures agents to connect to the master
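Joining boils down to running the upstream installer with K3S_URL and K3S_TOKEN set. A sketch of what such a task likely looks like (the variable names k3s_server_ip and k3s_node_token are assumptions; the environment variables are the installer's documented ones):

```yaml
- name: Install k3s agent and join the cluster
  shell: |
    curl -sfL https://get.k3s.io | \
      K3S_URL="https://{{ k3s_server_ip }}:6443" \
      K3S_TOKEN="{{ k3s_node_token }}" \
      INSTALL_K3S_VERSION="{{ k3s_version }}" \
      sh -s - agent {{ extra_agent_args | default('') }}
  args:
    # the uninstall script is created by the installer, so its
    # presence makes this task idempotent
    creates: /usr/local/bin/k3s-agent-uninstall.sh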
K3s Deploy Test Role (k3s-deploy-test)
- Waits for all cluster nodes to be ready
- Deploys the nginx test application with 5 replicas
- Verifies deployment is successful
- Displays pod distribution across nodes
Post-Installation
After successful deployment:
- The kubeconfig file will be saved to ./kubeconfig
- Use it with kubectl:
export KUBECONFIG=$(pwd)/kubeconfig
kubectl get nodes
You should see all your nodes in Ready state:
NAME STATUS ROLES AGE VERSION
cm4-01 Ready control-plane,etcd,master 5m v1.35.0+k3s1
cm4-02 Ready control-plane,etcd 3m v1.35.0+k3s1
cm4-03 Ready control-plane,etcd 3m v1.35.0+k3s1
cm4-04 Ready <none> 3m v1.35.0+k3s1
Accessing the Cluster
From Master Node
SSH into a master node and use kubectl:
ssh pi@192.168.30.101
kubectl get nodes
From Your Local Machine
The playbook automatically fetches the kubeconfig to ./kubeconfig. You have several options to use it:
Option 1: Temporary Access (Environment Variable)
export KUBECONFIG=$(pwd)/kubeconfig
kubectl get nodes
kubectl get pods --all-namespaces
Option 2: Merge into ~/.kube/config (Recommended)
This allows you to manage multiple clusters and switch between them:
# Backup your existing config
cp ~/.kube/config ~/.kube/config.backup
# Merge the k3s config into your existing config
KUBECONFIG=~/.kube/config:$(pwd)/kubeconfig kubectl config view --flatten > ~/.kube/config.tmp
mv ~/.kube/config.tmp ~/.kube/config
# Rename the context to something meaningful
kubectl config rename-context default k3s-pi-cluster
# View all contexts
kubectl config get-contexts
# Switch to k3s context
kubectl config use-context k3s-pi-cluster
# Switch back to other clusters
kubectl config use-context <other-context-name>
Option 3: Direct Usage
Use the kubeconfig file directly without setting environment variables:
kubectl --kubeconfig=./kubeconfig get nodes
kubectl --kubeconfig=./kubeconfig get pods --all-namespaces
Ingress Setup
K3s ships with the Traefik ingress controller pre-installed, which lets you expose applications over HTTP/HTTPS using domain names.
How It Works
- Traefik listens on ports 80 (HTTP) and 443 (HTTPS) on all nodes
- Ingress rules route traffic based on hostname to different services
- Multiple applications can share the same IP using different hostnames
- No additional setup required - Traefik is ready to use after cluster deployment
Verify Traefik is Running
kubectl --kubeconfig=./kubeconfig get pods -n kube-system -l app.kubernetes.io/name=traefik
kubectl --kubeconfig=./kubeconfig get svc -n kube-system traefik
View Ingress Resources
kubectl --kubeconfig=./kubeconfig get ingress
kubectl --kubeconfig=./kubeconfig describe ingress nginx-test
Testing the Cluster
A sample nginx deployment with 5 replicas and ingress is provided to test your cluster.
Automated Deployment (via Ansible)
The test application is automatically deployed with ingress when you run the full playbook:
ansible-playbook site.yml
Or deploy it separately after the cluster is up:
ansible-playbook site.yml --tags deploy-test
The Ansible role will:
- Wait for all nodes to be ready
- Deploy the nginx application with ingress
- Wait for all pods to be running
- Show deployment status, pod distribution, ingress details, and access instructions
Manual Deployment (via kubectl)
Deploy using kubectl:
export KUBECONFIG=$(pwd)/kubeconfig
kubectl apply -f manifests/nginx-test-deployment.yaml
This deploys:
- Nginx deployment with 5 replicas
- ClusterIP service
- Ingress resource for domain-based access
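The shipped manifest is not reproduced here, but its three resources reduce to roughly the sketch below. The resource names and hostname match the output shown elsewhere in this README; the labels and image tag are assumptions.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx-test
  template:
    metadata:
      labels:
        app: nginx-test
    spec:
      containers:
        - name: nginx
          image: nginx:stable
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-test
spec:
  selector:
    app: nginx-test
  ports:
    - port: 80
      targetPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-test
spec:
  rules:
    - host: nginx-test.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx-test
                port:
                  number: 80
```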
Verify the Deployment
Check that all 5 replicas are running:
kubectl --kubeconfig=./kubeconfig get deployments
kubectl --kubeconfig=./kubeconfig get pods -o wide
kubectl --kubeconfig=./kubeconfig get ingress
You should see output similar to:
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-test 5/5 5 5 1m
NAME READY STATUS RESTARTS AGE NODE
nginx-test-7d8f4c9b6d-2xk4p 1/1 Running 0 1m cm4-01
nginx-test-7d8f4c9b6d-4mz9r 1/1 Running 0 1m cm4-02
nginx-test-7d8f4c9b6d-7w3qs 1/1 Running 0 1m cm4-03
nginx-test-7d8f4c9b6d-9k2ln 1/1 Running 0 1m cm4-04
nginx-test-7d8f4c9b6d-xr5wp 1/1 Running 0 1m cm4-02
Access via Ingress
Add a cluster node IP to /etc/hosts (any master or worker works; only the first matching entry is used):
# Pick any master or worker node IP
192.168.30.101 nginx-test.local nginx.pi.local
Then open http://nginx-test.local (or http://nginx.pi.local) in your browser.
Or test with curl:
# Test with any cluster node IP (master or worker)
curl -H "Host: nginx-test.local" http://192.168.30.101
curl -H "Host: nginx-test.local" http://192.168.30.102
Scale the Deployment
Test scaling:
# Scale up to 10 replicas
kubectl scale deployment nginx-test --replicas=10
# Scale down to 3 replicas
kubectl scale deployment nginx-test --replicas=3
# Watch the pods being created/terminated
kubectl get pods -w
Clean Up Test Deployment
When you're done testing:
kubectl delete -f manifests/nginx-test-deployment.yaml
High Availability - Multi-Master Cluster
This deployment supports a 3-node highly available Kubernetes cluster with multiple control-plane nodes for redundancy.
Current Setup
The cluster is configured with:
- Master Nodes (Control-Plane): cm4-01, cm4-02, cm4-03
- Worker Nodes: cm4-04
- Virtual IP (VIP): 192.168.30.100 (via MikroTik router)
Why Multi-Master?
With 3 control-plane nodes:
- No Single Point of Failure: If one master fails, the cluster continues operating
- High Availability: Automatic failover between masters
- Better Uptime: Can perform maintenance on one master while others serve the cluster
- Load Distribution: API server and etcd are distributed across 3 nodes
How It Works
1. Primary Master (cm4-01):
   - Initializes the cluster and creates the node token
   - All other nodes use this token to join
2. Additional Masters (cm4-02, cm4-03):
   - Join the cluster using the token from the primary master
   - Automatically become part of the control-plane
   - Stay synchronized with the primary master
3. Worker Nodes (cm4-04):
   - Join the cluster as worker nodes
   - Run workloads and are not part of the control-plane
4. Virtual IP (192.168.30.100):
   - The MikroTik router provides a single entry point to the cluster
   - Automatically routes to available control-plane nodes
   - DNS points to this VIP for seamless failover
Promoting Additional Masters
To add more masters or promote a worker to master:
1. Edit inventory/hosts.ini and move the node to the [master] group:

   [master]
   cm4-01 ansible_host=192.168.30.101 ansible_user=pi k3s_server_init=true
   cm4-02 ansible_host=192.168.30.102 ansible_user=pi k3s_server_init=false
   cm4-03 ansible_host=192.168.30.103 ansible_user=pi k3s_server_init=false
   # To promote cm4-04 to master:
   # cm4-04 ansible_host=192.168.30.104 ansible_user=pi k3s_server_init=false

   [worker]
   # Workers only

2. Run the deployment playbook:

   ansible-playbook site.yml --tags k3s-server

The playbook automatically:
- Installs k3s server on the new master
- Joins it to the existing cluster
- Synchronizes with other control-plane nodes
Monitoring Master Health
Check the status of all control-plane nodes:
kubectl get nodes -o wide | grep control-plane
# or
kubectl get nodes -L node-role.kubernetes.io/control-plane
To see which nodes are control-plane:
kubectl get nodes --show-labels | grep control-plane
Monitor etcd status across masters:
# Connect to any master
ssh pi@192.168.30.101
# Check cluster membership via the embedded kubectl
# (each etcd member corresponds to a control-plane node)
sudo k3s kubectl get nodes
Master Failover
If a master node fails:
- The cluster detects the failure within ~30 seconds
- etcd automatically removes the failed node
- Remaining masters continue operating
- New pods are scheduled on healthy nodes
To see the status:
kubectl get nodes -o wide
To recover a failed master, simply:
# On the failed node, reset it
ssh pi@<failed-master-ip>
sudo /usr/local/bin/k3s-uninstall.sh
# Then re-run the playbook to rejoin it
ansible-playbook site.yml --tags k3s-server --limit <failed-master>
Demoting a Master to Worker
To remove a master from control-plane and make it a worker (note: this reduces HA from 3-node to 2-node):
1. Edit inventory/hosts.ini:

   [master]
   cm4-01 ansible_host=192.168.30.101 ansible_user=pi k3s_server_init=true
   cm4-02 ansible_host=192.168.30.102 ansible_user=pi k3s_server_init=false

   [worker]
   cm4-03 ansible_host=192.168.30.103 ansible_user=pi
   cm4-04 ansible_host=192.168.30.104 ansible_user=pi

   Warning: This reduces your cluster to 2 master nodes. A 2-member etcd cluster needs both members for quorum, so losing either remaining master takes the control-plane down.

2. Drain the node:

   kubectl drain cm4-03 --ignore-daemonsets --delete-emptydir-data

3. Reset the node:

   ssh pi@192.168.30.103
   sudo /usr/local/bin/k3s-uninstall.sh

4. Re-run the deployment:

   ansible-playbook site.yml --tags k3s-agent --limit cm4-03
Maintenance
Updating the Cluster
K3s is installed via the upstream install script rather than the system package manager, so updates are applied by re-running the playbook. There are several ways to update your cluster:
Option 1: Automatic Updates (Recommended)
With the version pinned to latest, each playbook run installs the newest available release. To enable this on all nodes:
- Add the following to your inventory hosts.ini:
[k3s_cluster:vars]
k3s_version=latest
- Re-run the k3s installation playbook:
ansible-playbook site.yml --tags k3s-server,k3s-agent
New versions (typically patch releases) will then be applied whenever the playbook is re-run.
Option 2: Manual Update to Specific Version
To update to a specific k3s version:
- Update the k3s_version variable in inventory/hosts.ini:
[k3s_cluster:vars]
k3s_version=v1.36.0+k3s1
- Run the k3s playbook to update all nodes:
# Update master first (required to generate token for agents)
ansible-playbook site.yml --tags k3s-server,k3s-agent
Important: Always update master nodes before workers. Workers need the token from the master to rejoin the cluster.
Option 3: Update via K3s Release Script
For more control, you can manually update k3s on individual nodes:
# SSH into a node
ssh pi@<node-ip>
# Download and install specific version
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.36.0+k3s1 sh -
# Restart k3s
sudo systemctl restart k3s # On master
sudo systemctl restart k3s-agent # On workers
Checking Current K3s Version
To see the current k3s version running on your cluster:
kubectl version
# or
kubectl get nodes -o wide
To check versions on specific nodes:
ssh pi@<node-ip>
k3s --version
# Or via Ansible
ansible all -m shell -a "k3s --version" --become
Update Telegraf
To update Telegraf metrics collection to the latest version:
# Update Telegraf on all nodes
ansible-playbook telegraf.yml
# Update only specific nodes
ansible-playbook telegraf.yml --limit worker
Post-Update Verification
After updating, verify your cluster is healthy:
# Check all nodes are ready
kubectl get nodes
# Check pod status
kubectl get pods --all-namespaces
# Check cluster info
kubectl cluster-info
# View recent events
kubectl get events --all-namespaces --sort-by='.lastTimestamp'
Rollback (if needed)
If an update causes issues, you can roll back to a previous version:
# Update inventory with previous version
# [k3s_cluster:vars]
# k3s_version=v1.35.0+k3s1
# Re-run the playbook
ansible-playbook site.yml --tags k3s-server,k3s-agent
Rebooting Cluster Nodes
A dedicated playbook is provided to safely reboot all cluster nodes:
ansible-playbook reboot.yml
This playbook will:
- Reboot worker nodes first (one at a time, serially)
- Wait for each worker to come back online and k3s-agent to be running
- Reboot master nodes (one at a time, serially)
- Wait for each master to come back online and k3s to be running
- Verify the cluster status and show all nodes are ready
The serial approach ensures that only one node reboots at a time, maintaining cluster availability.
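The one-node-at-a-time behavior comes from Ansible's serial keyword. The repository's reboot.yml may differ in detail, but its core plausibly looks like this sketch (module names are standard Ansible; the service names match k3s defaults):

```yaml
# illustrative sketch of a serial reboot playbook
- name: Reboot workers one at a time
  hosts: worker
  serial: 1
  become: true
  tasks:
    - name: Reboot and wait for SSH to return
      reboot:
        reboot_timeout: 600

    - name: Wait until k3s-agent is active again
      command: systemctl is-active k3s-agent
      register: svc
      until: svc.stdout == "active"
      retries: 30
      delay: 10
      changed_when: false

- name: Reboot masters one at a time
  hosts: master
  serial: 1
  become: true
  tasks:
    - name: Reboot and wait for SSH to return
      reboot:
        reboot_timeout: 600

    - name: Wait until k3s is active again
      command: systemctl is-active k3s
      register: svc
      until: svc.stdout == "active"
      retries: 30
      delay: 10
      changed_when: false
```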
Reboot Only Workers
ansible-playbook reboot.yml --limit worker
Reboot Only Masters
ansible-playbook reboot.yml --limit master
Reboot a Specific Node
ansible-playbook reboot.yml --limit cm4-04
Troubleshooting
Check k3s service status
On master:
sudo systemctl status k3s
sudo journalctl -u k3s -f
On workers:
sudo systemctl status k3s-agent
sudo journalctl -u k3s-agent -f
Reset a node
If you need to reset a node and start over:
# On the node
sudo /usr/local/bin/k3s-uninstall.sh # On a server node
sudo /usr/local/bin/k3s-agent-uninstall.sh # On an agent node
Common Issues
- Nodes not joining: Check firewall rules. K3s requires port 6443/TCP open on the masters (and 8472/UDP between all nodes for Flannel VXLAN).
- Memory issues: Ensure cgroup memory is enabled (the playbook handles this).
- Network issues: The playbook uses the Flannel VXLAN backend, which works more reliably on ARM devices.
Customization
Add More Master Nodes (HA Setup)
For a high-availability setup, you can add more master nodes:
[master]
pi-master-1 ansible_host=192.168.30.101 ansible_user=pi
pi-master-2 ansible_host=192.168.30.102 ansible_user=pi
pi-master-3 ansible_host=192.168.30.103 ansible_user=pi
This playbook uses k3s's embedded etcd for HA (initialized via k3s_server_init on the first master); alternatively, k3s supports an external datastore such as PostgreSQL or MySQL.
Custom K3s Arguments
Modify extra_server_args or extra_agent_args in the inventory:
[k3s_cluster:vars]
extra_server_args="--flannel-backend=vxlan --disable traefik --disable servicelb"
extra_agent_args="--node-label foo=bar"
Compute Blade Agent Deployment
The playbook includes automatic deployment of the Compute Blade Agent, a system service for managing Compute Blade hardware (Raspberry Pi CM4/CM5 modules). The agent monitors hardware states, reacts to temperature changes and button presses, and exposes metrics via Prometheus.
Components
- compute-blade-agent: Daemon that monitors hardware and manages blade operations
- bladectl: Command-line tool for local/remote interaction with the agent
- fanunit.uf2: Firmware for the fan unit microcontroller
Configuration
The compute-blade-agent deployment is controlled by the enable_compute_blade_agent variable in inventory/hosts.ini:
# Enable/disable compute-blade-agent on all nodes (control-plane and workers)
enable_compute_blade_agent=true
To disable on specific nodes, add an override:
[master]
cm4-01 ansible_host=192.168.30.101 ansible_user=pi k3s_server_init=true enable_compute_blade_agent=true
cm4-02 ansible_host=192.168.30.102 ansible_user=pi k3s_server_init=false enable_compute_blade_agent=false
cm4-03 ansible_host=192.168.30.103 ansible_user=pi k3s_server_init=false enable_compute_blade_agent=false
[worker]
cm4-04 ansible_host=192.168.30.104 ansible_user=pi enable_compute_blade_agent=true
Deployment
The compute-blade-agent is automatically deployed as part of the main playbook:
ansible-playbook site.yml
Or deploy only the compute-blade-agent on all nodes:
ansible-playbook site.yml --tags compute-blade-agent
Verification
Check the agent status on any node:
# SSH into any node
ssh pi@192.168.30.101
# Check service status
sudo systemctl status compute-blade-agent
# View logs
sudo journalctl -u compute-blade-agent -f
# Check binary installation
/usr/bin/compute-blade-agent --version
Configuration Files
The compute-blade-agent creates its configuration at:
/etc/compute-blade-agent/config.yaml
Configuration can also be controlled via environment variables prefixed with BLADE_.
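For example, a systemd drop-in can inject such variables. The mechanism below is standard systemd; the specific variable name is illustrative, so consult the agent's configuration schema for real keys:

```ini
# /etc/systemd/system/compute-blade-agent.service.d/override.conf
[Service]
# BLADE_-prefixed variables override config.yaml settings
Environment=BLADE_LOG_LEVEL=debug
```

After adding a drop-in, run sudo systemctl daemon-reload && sudo systemctl restart compute-blade-agent.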
Metrics and Monitoring
The compute-blade-agent exposes Prometheus metrics. To monitor the agents:
- Optional Kubernetes resources are available in manifests/compute-blade-agent-daemonset.yaml
- Deploy the optional monitoring resources (requires Prometheus):
kubectl apply -f manifests/compute-blade-agent-daemonset.yaml
Features
- Hardware Monitoring: Tracks temperature, fan speed, and button events
- Critical Mode: Automatically enters maximum fan speed + red LED during overheating
- Identification: Locate specific blades via LED blinking
- Metrics Export: Prometheus-compatible metrics endpoint
Troubleshooting compute-blade-agent
Service fails to start
Check the installer output:
sudo journalctl -u compute-blade-agent -n 50
Agent not detecting hardware
Verify the Compute Blade hardware is properly connected. The agent logs detailed information:
sudo journalctl -u compute-blade-agent -f
Re-run installation
To reinstall compute-blade-agent:
# SSH into the node
ssh pi@<node-ip>
# Uninstall
sudo /usr/local/bin/k3s-uninstall-compute-blade-agent.sh 2>/dev/null || echo "Not found, continuing"
# Then re-run the playbook to reinstall
ansible-playbook site.yml --tags compute-blade-agent
External DNS Configuration
To use external domains (like test.zlor.fi) with your k3s cluster ingress, you need to configure DNS. Your cluster uses a Virtual IP (192.168.30.100) via MikroTik for high availability.
Step 1: Configure DNS Server Records
On your DNS server, add A records pointing to your k3s cluster nodes:
Option A: Virtual IP (VIP) via MikroTik - Recommended for HA
Use your MikroTik router's Virtual IP (192.168.30.100) for high availability:
test.zlor.fi A 192.168.30.100
Pros:
- Single IP for entire cluster
- Hardware-based failover (more reliable)
- Better performance
- No additional software needed
- Automatically routes to available masters
See MIKROTIK-VIP-SETUP-CUSTOM.md for detailed setup instructions.
Option B: Multiple Records (Load Balanced)
If your DNS supports multiple A records, point to all cluster nodes:
test.zlor.fi A 192.168.30.101
test.zlor.fi A 192.168.30.102
test.zlor.fi A 192.168.30.103
test.zlor.fi A 192.168.30.104
Pros: Load balanced, automatic failover Cons: Requires DNS server support for multiple A records
Option C: Single Master Node (No Failover)
For simple setups without redundancy:
test.zlor.fi A 192.168.30.101
Pros: Simple, works with any DNS server Cons: No failover if that node is down (not recommended for HA clusters)
Step 2: Configure Cluster Nodes for External DNS
K3s nodes need to be able to resolve external DNS queries. Update the DNS resolver on all nodes:
Option A: Ansible Playbook (Recommended)
Create a new playbook dns-config.yml:
---
- name: Configure external DNS resolver
  hosts: all
  become: true
  tasks:
    - name: Update /etc/resolv.conf with custom DNS
      copy:
        content: |
          nameserver 8.8.8.8
          nameserver 8.8.4.4
          nameserver 192.168.1.1
        dest: /etc/resolv.conf
        owner: root
        group: root
        mode: '0644'
      notify: Restart systemd-resolved

    - name: Make resolv.conf immutable
      file:
        path: /etc/resolv.conf
        attributes: '+i'
        state: file

    - name: Configure systemd-resolved for external DNS
      copy:
        content: |
          [Resolve]
          DNS=8.8.8.8 8.8.4.4 192.168.1.1
          FallbackDNS=8.8.8.8
          DNSSECNegativeTrustAnchors=zlor.fi
        dest: /etc/systemd/resolved.conf
        owner: root
        group: root
        mode: '0644'
      notify: Restart systemd-resolved

  handlers:
    - name: Restart systemd-resolved
      systemd:
        name: systemd-resolved
        state: restarted
Apply the playbook:
ansible-playbook dns-config.yml
Option B: Manual Configuration on Each Node
SSH into each node and update DNS:
ssh pi@192.168.30.101
sudo nano /etc/systemd/resolved.conf
Add or modify:
[Resolve]
DNS=8.8.8.8 8.8.4.4 192.168.1.1
FallbackDNS=8.8.8.8
DNSSECNegativeTrustAnchors=zlor.fi
Save and restart:
sudo systemctl restart systemd-resolved
Verify DNS is working:
nslookup test.zlor.fi
dig test.zlor.fi
Step 3: Update Ingress Configuration
Your nginx-test deployment has already been updated to include test.zlor.fi. Verify the ingress:
kubectl get ingress nginx-test -o yaml
You should see:
spec:
  rules:
    - host: test.zlor.fi
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx-test
                port:
                  number: 80
Step 4: Test External Domain Access
Once DNS is configured, test access from your local machine:
# Test DNS resolution
nslookup test.zlor.fi
# Test HTTP access
curl http://test.zlor.fi
# With verbose output
curl -v http://test.zlor.fi
# Test from all cluster IPs
for ip in 192.168.30.{101..104}; do
echo "Testing $ip:"
curl -H "Host: test.zlor.fi" http://$ip
done
Troubleshooting DNS
DNS Resolution Failing
Check if systemd-resolved is running:
systemctl status systemd-resolved
Test DNS from a node:
ssh pi@192.168.30.101
nslookup test.zlor.fi
dig test.zlor.fi @8.8.8.8
Ingress Not Responding
Check if Traefik is running:
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik
Check ingress status:
kubectl get ingress
kubectl describe ingress nginx-test
Request Timing Out
Verify network connectivity:
# From your machine
ping 192.168.30.101
ping 192.168.30.102
# From a cluster node
ssh pi@192.168.30.101
ping test.zlor.fi
curl -v http://test.zlor.fi
Adding More Domains
To add additional domains (e.g., api.zlor.fi, admin.zlor.fi):
- Add DNS A records for each domain pointing to your cluster nodes
- Update the ingress YAML with new rules:
spec:
  rules:
    - host: test.zlor.fi
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx-test
                port:
                  number: 80
    - host: api.zlor.fi
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 8080
- Apply the updated manifest:
kubectl apply -f manifests/nginx-test-deployment.yaml
Uninstall
To completely remove k3s from all nodes:
# Create an uninstall playbook or run manually on each node
ansible master -m shell -a "/usr/local/bin/k3s-uninstall.sh" --become
ansible worker -m shell -a "/usr/local/bin/k3s-agent-uninstall.sh" --become
To uninstall compute-blade-agent:
# Uninstall from all worker nodes
ansible worker -m shell -a "bash /usr/local/bin/k3s-uninstall-compute-blade-agent.sh" --become
License
MIT