- Introduce Traefik ACME configuration using Cloudflare DNS-01 challenge
- Deploy Vaultwarden password manager with IP allowlist protection
- Add middleware for security headers, compression, and rate limiting
- Update IngressRoute resources to use new ACME resolver
- Add troubleshooting steps for certificate and TLS issues
- Include test application deployment and verification commands
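For illustration, an IngressRoute that requests its certificate through an ACME resolver might look like the sketch below. The resource names, namespace, hostname, service port, middleware name, and the resolver name `cloudflare` are all hypothetical, and the `traefik.io/v1alpha1` API group assumes a recent Traefik release (older releases use `traefik.containo.us/v1alpha1`).

```yaml
# Sketch only: every name below is a placeholder, not taken from the repository.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: vaultwarden
  namespace: vaultwarden
spec:
  entryPoints:
    - websecure                      # assumed HTTPS entry point name
  routes:
    - match: Host(`vault.example.com`)
      kind: Rule
      middlewares:
        - name: security-headers     # hypothetical Middleware resource
      services:
        - name: vaultwarden
          port: 80
  tls:
    certResolver: cloudflare         # must match the resolver defined in Traefik's static config
```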
- Obtain the token from the primary master via slurp and base64 decode
- Derive k3s_url from the primary master's ansible_host
- Use the decoded content as k3s_token
- Update the success message quoting
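A minimal sketch of how the token lookup described above could be written as Ansible tasks. It assumes the primary master is the first host in the `master` group and that the API server listens on port 6443; the registered variable name `k3s_token_b64` is hypothetical. The token path is the K3s default on server nodes.

```yaml
# Sketch only: assumes groups['master'][0] is the primary master.
- name: Read the join token from the primary master
  ansible.builtin.slurp:
    src: /var/lib/rancher/k3s/server/node-token
  delegate_to: "{{ groups['master'][0] }}"
  run_once: true
  become: true                     # the token file is readable by root only
  register: k3s_token_b64          # hypothetical variable name

- name: Derive k3s_url and k3s_token
  ansible.builtin.set_fact:
    k3s_url: "https://{{ hostvars[groups['master'][0]].ansible_host }}:6443"
    # slurp returns base64-encoded content; decode it and strip the trailing newline
    k3s_token: "{{ k3s_token_b64.content | b64decode | trim }}"
```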
- Switch compute-blade-agent deployment from workers to all nodes
(control-plane and workers)
- Use /usr/bin/compute-blade-agent instead of /usr/local/bin
- Update verification scripts to reference /usr/bin/compute-blade-agent
- Update docs to refer to all nodes across Deployment Guide, Checklist,
and Getting Started
- Change site.yml to install on all hosts instead of just workers
- Align example commands to the all-nodes workflow
- Refactor MikroTik API parsing to robustly extract name and poe-out and
add debug logs
- Update boot config on CM4: ensure [cm4] exists, enable_uart=1, and
apply uart5 overlay
- Introduce a Rust-based MikroTik PoE control CLI and library
- Include a MikroTik RouterOS API client with basic commands
- Expose on/off/status via scriptRun and PoE status query
- Provide .env.example, README, and a minimal project layout for the
mikrotik_cli crate
This change transforms the cluster from single-master to a fully redundant
3-node control-plane setup for high availability.
Changes:
- Updated inventory/hosts.ini to promote cm4-02 and cm4-03 to master group
* Added k3s_server_init flag to distinguish primary (true) from joining (false) masters
* Reduced worker nodes from 3 to 1 (cm4-04)
* Added clear comments explaining the HA setup
- Modified roles/k3s-server/tasks/main.yml for multi-master support
* Separated primary master initialization from additional master joining
* Primary master (k3s_server_init=true) initializes cluster and generates token
* Additional masters (k3s_server_init=false) join using primary's token
* Proper sequencing ensures cluster stability during joining
* Common tasks (kubeconfig setup) run on all masters
- Updated site.yml for proper master deployment sequencing
* Primary master deploys first and initializes cluster
* Additional masters deploy serially (one at a time) for stability
* Serial deployment prevents etcd consensus issues during joining
* Workers join only after all masters are ready
* Test deployments run on primary master only
- Added comprehensive "High Availability - Multi-Master Cluster" section to README
* Explains benefits of multi-master setup
* Documents how to promote nodes to master
* Includes monitoring and failover procedures
* Shows how to recover from failed masters
* Explains demoting masters back to workers
Benefits:
✓ No single point of failure in control-plane
✓ Automatic etcd clustering across 3 nodes
✓ Masters can be updated one at a time with zero control-plane downtime
✓ Faster cluster recovery from node failures
✓ Better performance distribution for API server
✓ Works seamlessly with MikroTik VIP for external access
Deployment Flow:
1. cm4-01 initializes cluster (becomes primary master)
2. cm4-02 joins as control-plane node
3. cm4-03 joins as control-plane node
4. cm4-04 joins as worker node
5. The three control-plane nodes form the etcd cluster with quorum (the worker does not run etcd)
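The deployment flow above could be sketched as three plays. This is an illustration under assumptions, not the actual site.yml: it selects the primary by position in the `master` group rather than by the k3s_server_init flag, and the `k3s-agent` role name is assumed.

```yaml
# Sketch only: host selection by index is an assumption; the real playbook
# may use the k3s_server_init flag instead.
- name: Initialize the cluster on the primary master
  hosts: "master[0]"
  become: true
  roles:
    - k3s-server

- name: Join the remaining masters one at a time
  hosts: "master[1:]"
  serial: 1                # one host per batch, so etcd membership changes one node at a time
  become: true
  roles:
    - k3s-server

- name: Join workers after all masters are ready
  hosts: worker
  become: true
  roles:
    - k3s-agent            # assumed role name
```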
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Deleted vip-setup.yml playbook (Keepalived no longer needed)
- Updated MIKROTIK-VIP-SETUP-CUSTOM.md with corrected MikroTik syntax:
* Fixed path notation: use spaces not slashes (/ip firewall nat not /ip/firewall/nat/)
* Fixed action parameter: use dst-nat not dstnat
* Added web interface alternative for NAT rule configuration
* Added important syntax notes section
- Removed Keepalived documentation from README.md
- Kept MIKROTIK-VIP-SETUP.md as general reference guide
- Updated DNS and external access section to reference MikroTik VIP only
This simplifies the project by removing software-based VIP complexity since
the hardware-based MikroTik VIP provides better performance with no node overhead.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Fix HTTP NAT rule to be single line without breaks
- Fix HTTPS NAT rule to be single line without breaks
- Ensure all commands are copy-paste ready without syntax errors
- Address user-reported 'syntax error (line 1 column 94)' caused by running a command split across lines
All NAT rules now properly formatted as single-line commands suitable
for copy-paste into MikroTik SSH terminal.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Create vip-setup.yml Ansible playbook for Keepalived-based VIP configuration
- Automatic failover between cluster nodes using VRRP protocol
- Health checks for API server availability
- Single IP address can be used in DNS instead of multiple A records
- Master node holds VIP by default, workers act as backups
Update README.md with comprehensive VIP documentation:
- Add three DNS options (single record, multiple records, VIP)
- Detailed VIP installation and verification steps
- Monitoring and failover testing procedures
- Troubleshooting guide for common VIP issues
- Instructions for disabling VIP if needed
Benefits:
- Single DNS A record pointing to VIP (192.168.30.100)
- Automatic failover with no manual intervention
- Load balancing capability across all nodes
- Transparent to applications
Fix markdown linting issues:
- Add proper blank lines around lists and code blocks
- Use consistent ordered list numbering (all 1.)
- Remove duplicate/extra blank lines
- Ensure proper spacing around headings
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Update nginx-test ingress to include test.zlor.fi domain
- Add comprehensive DNS configuration section to README with:
* DNS A record setup (single and multi-record options)
* Ansible playbook for automated DNS resolver configuration
* Manual DNS configuration instructions
* Ingress verification steps
* Testing procedures and troubleshooting guide
* Instructions for adding additional domains
- Fix markdown linting issues (blank lines, language identifiers, list prefixes)
DNS configuration now supports:
- External domain resolution (test.zlor.fi)
- systemd-resolved integration
- Load balancing across cluster nodes
- Multiple domain support
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Remove 'when: not k3s_binary.stat.exists' condition from k3s-server and
k3s-agent installation tasks to allow in-place upgrades of K3s versions
- Update task names to reflect both install and upgrade functionality
- Add change detection using stdout inspection for better Ansible reporting
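As a rough sketch of the install-or-upgrade task with stdout-based change detection: the variable name `k3s_version` and the exact string matched in the installer output are assumptions, so check them against the installer's real output before relying on this.

```yaml
# Sketch only: no `when:` guard, so the installer runs on every play and
# can upgrade an existing installation in place.
- name: Install or upgrade K3s server
  ansible.builtin.shell: |
    set -o pipefail
    curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="{{ k3s_version }}" sh -
  args:
    executable: /bin/bash
  register: k3s_install
  # Assumption: the installer prints this phrase when nothing had to change.
  changed_when: "'No change detected' not in k3s_install.stdout"
```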
Add InfluxDB v2 native dashboard alongside Grafana dashboard:
- Create influxdb/rpi-cluster-dashboard-v2.json for InfluxDB 2.8 compatibility
- Update Grafana dashboard datasource UID from 'influx' to 'influxdb'
- Remove unused disk usage and network traffic panels per user request
Update worker node discovery in compute-blade-agent verification script:
- Fix pattern matching to work with cm4-* node naming convention
- Add support for pi-worker and cb-0* patterns as fallbacks
- Now correctly parses [worker] section from inventory
Update inventory version documentation:
- Add comment explaining how to use 'latest' for auto-updates
- Set version to v1.35.0+k3s1 (updated from v1.34.2+k3s1)
- Add guidance on version format for users
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>