16 Commits

Author SHA1 Message Date
14d4f2528d Add automatic TLS via Let's Encrypt with Cloudflare DNS-01, and deploy Vaultwarden
- Introduce Traefik ACME configuration using Cloudflare DNS-01 challenge
- Deploy Vaultwarden password manager with IP allowlist protection
- Add middleware for security headers, compression, and rate limiting
- Update IngressRoute resources to use new ACME resolver
- Add troubleshooting steps for certificate and TLS issues
- Include test application deployment and verification commands
2026-03-25 11:21:01 +01:00
6eca87bfa9 Add default-backend and enable compute-blade-agent 2026-01-12 12:53:47 +01:00
fe5883f519 Read k3s master token from primary master
- Obtain the token from the primary master via slurp and base64 decode
- Derive k3s_url from the primary master's ansible_host
- Use the decoded content as k3s_token
- Update the success message quoting
2026-01-12 12:08:57 +01:00
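Note on fe5883f519: the token flow described above (slurp, base64 decode, URL derived from the primary master's address) can be sketched in Python. This is only an illustration, not the playbook's actual tasks. The function names are hypothetical, the sample token and address are made up, and port 6443 is assumed because it is the K3s default. Ansible's slurp module returns the file body base64-encoded in a "content" field, which is the part being modelled here.

```python
import base64


def decode_slurped_token(slurp_result: dict) -> str:
    """Decode the base64 'content' field of a slurp-style result into a token string."""
    raw = base64.b64decode(slurp_result["content"])
    # Token files usually end with a newline; strip it so the join does not fail.
    return raw.decode("utf-8").strip()


def build_k3s_url(primary_host: str, port: int = 6443) -> str:
    """Build the server URL from the primary master's address (port is an assumption)."""
    return f"https://{primary_host}:{port}"


if __name__ == "__main__":
    # Made-up slurp result standing in for the registered Ansible variable.
    fake_slurp = {
        "content": base64.b64encode(b"example-token-value\n").decode("ascii"),
        "encoding": "base64",
    }
    print(decode_slurped_token(fake_slurp))
    print(build_k3s_url("192.0.2.10"))
```

Stripping the trailing newline matters because the decoded file content is passed on verbatim as k3s_token.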
813ee0c252 Add uninstall-k3s.yml and fix server token flow 2026-01-12 12:04:26 +01:00
f3754c01d7 Add Prometheus Operator role and templates 2026-01-12 09:27:57 +01:00
fd7c9239b5 Update docs and roles for agent on all nodes
- Switch compute-blade-agent deployment from workers to all nodes
  (control-plane and workers)
- Use /usr/bin/compute-blade-agent instead of /usr/local/bin
- Update verification scripts to reference /usr/bin/compute-blade-agent
- Update docs to refer to all nodes across Deployment Guide, Checklist,
  and Getting Started
- Change site.yml to install on all hosts instead of just workers
- Align example commands to the all-nodes workflow
2026-01-12 08:54:41 +01:00
2beb6aadfe Improve MikroTik API parsing and enable CM4 UART
- Refactor MikroTik API parsing to robustly extract name and poe-out and
  add debug logs
- Update boot config on CM4: ensure [cm4] exists, enable_uart=1, and
  apply uart5 overlay
2026-01-12 08:12:48 +01:00
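Note on 2beb6aadfe: the CM4 boot config change can be pictured as an idempotent text edit. The sketch below is a hypothetical Python illustration, not the role's real implementation. It assumes the overlay is written as the line "dtoverlay=uart5" and that a section runs from its "[cm4]" header to the next "[...]" header or the end of the file.

```python
def ensure_cm4_uart(config_text: str) -> str:
    """Return config text with a [cm4] section containing the UART settings."""
    wanted = ["enable_uart=1", "dtoverlay=uart5"]  # exact lines are an assumption
    lines = config_text.splitlines()

    # Locate the [cm4] header, appending one if it does not exist yet.
    start = next((i for i, l in enumerate(lines) if l.strip() == "[cm4]"), None)
    if start is None:
        lines.append("[cm4]")
        start = len(lines) - 1

    # The section ends at the next header line or at the end of the file.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].strip().startswith("["):
            end = i
            break

    # Add only the settings that are not already present in the section.
    present = {l.strip() for l in lines[start + 1:end]}
    lines[end:end] = [w for w in wanted if w not in present]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ensure_cm4_uart("[all]\narm_64bit=1\n"), end="")
```

Running the function on its own output leaves the text unchanged, which is the property a repeated playbook run needs.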
a2cf2a86d2 Message 2026-01-09 16:02:30 +01:00
61d7a5bf78 Configure multi-master HA cluster with 3 control-plane nodes
This change transforms the cluster from single-master to a fully redundant
3-node control-plane setup for high availability.

Changes:
- Updated inventory/hosts.ini to promote cm4-02 and cm4-03 to master group
  * Added k3s_server_init flag to distinguish primary (true) from joining (false) masters
  * Reduced worker nodes from 3 to 1 (cm4-04)
  * Added clear comments explaining the HA setup

- Modified roles/k3s-server/tasks/main.yml for multi-master support
  * Separated primary master initialization from additional master joining
  * Primary master (k3s_server_init=true) initializes cluster and generates token
  * Additional masters (k3s_server_init=false) join using primary's token
  * Proper sequencing ensures cluster stability during joining
  * Common tasks (kubeconfig setup) run on all masters

- Updated site.yml for proper master deployment sequencing
  * Primary master deploys first and initializes cluster
  * Additional masters deploy serially (one at a time) for stability
  * Serial deployment prevents etcd consensus issues during joining
  * Workers join only after all masters are ready
  * Test deployments run on primary master only

- Added comprehensive "High Availability - Multi-Master Cluster" section to README
  * Explains benefits of multi-master setup
  * Documents how to promote nodes to master
  * Includes monitoring and failover procedures
  * Shows how to recover from failed masters
  * Explains demoting masters back to workers

Benefits:
✓ No single point of failure in control-plane
✓ Automatic etcd clustering across 3 nodes
✓ Masters can be updated one at a time with zero control-plane downtime
✓ Faster cluster recovery from node failures
✓ Better performance distribution for API server
✓ Works seamlessly with MikroTik VIP for external access

Deployment Flow:
1. cm4-01 initializes cluster (becomes primary master)
2. cm4-02 joins as control-plane node
3. cm4-03 joins as control-plane node
4. cm4-04 joins as worker node
5. The three control-plane nodes form the etcd cluster with proper quorum

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-08 17:21:30 +01:00
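Note on 61d7a5bf78: the deployment flow and the quorum reasoning above can be sketched in Python. This is a minimal illustration under stated assumptions: the Node class and function names are hypothetical, and the real ordering is enforced by site.yml rather than by code like this. The quorum formula (majority of members) is the standard rule for etcd.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str                  # "master" or "worker"
    server_init: bool = False  # mirrors the k3s_server_init inventory flag


def deployment_order(nodes: list[Node]) -> list[str]:
    """Primary master first, then joining masters, then workers."""
    primary = [n for n in nodes if n.role == "master" and n.server_init]
    joining = [n for n in nodes if n.role == "master" and not n.server_init]
    workers = [n for n in nodes if n.role == "worker"]
    return [n.name for n in primary + joining + workers]


def etcd_quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with the given member count."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - etcd_quorum(members)


if __name__ == "__main__":
    cluster = [
        Node("cm4-04", "worker"),
        Node("cm4-02", "master"),
        Node("cm4-01", "master", server_init=True),
        Node("cm4-03", "master"),
    ]
    print(deployment_order(cluster))
    print(etcd_quorum(3), tolerated_failures(3))
```

With three control-plane nodes the quorum is two, so the cluster survives the loss of one master; this is also why joining masters one at a time keeps a majority available throughout.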
eb800cd4e3 Fix K3s upgrade support and add monitoring dashboards
- Remove 'when: not k3s_binary.stat.exists' condition from k3s-server and
  k3s-agent installation tasks to allow in-place upgrades of K3s versions
- Update task names to reflect both install and upgrade functionality
- Add change detection using stdout inspection for better Ansible reporting

Add InfluxDB v2 native dashboard alongside Grafana dashboard:
- Create influxdb/rpi-cluster-dashboard-v2.json for InfluxDB 2.8 compatibility
- Update Grafana dashboard datasource UID from 'influx' to 'influxdb'
- Remove unused disk usage and network traffic panels per user request

Update worker node discovery in compute-blade-agent verification script:
- Fix pattern matching to work with cm4-* node naming convention
- Add support for pi-worker and cb-0* patterns as fallbacks
- Now correctly parses [worker] section from inventory

Update inventory version documentation:
- Add comment explaining how to use 'latest' for auto-updates
- Set version to v1.35.0+k3s1 (updated from v1.34.2+k3s1)
- Add guidance on version format for users

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-08 16:28:26 +01:00
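Note on eb800cd4e3: the worker discovery fix (read the [worker] section, match cm4-* with pi-worker and cb-0* as fallbacks) could look roughly like the Python sketch below. The verification script itself is not shown in this log, so the function name, the glob form "pi-worker*", and the sample inventory are all assumptions made for illustration.

```python
from fnmatch import fnmatch

# Naming patterns taken from the commit message; "pi-worker*" is an assumed glob form.
WORKER_PATTERNS = ("cm4-*", "pi-worker*", "cb-0*")


def workers_from_inventory(text: str) -> list[str]:
    """Return host names listed under [worker] that match a known pattern."""
    hosts = []
    in_worker = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("["):
            in_worker = line == "[worker]"
            continue
        if in_worker:
            host = line.split()[0]  # first field is the host, the rest are variables
            if any(fnmatch(host, pattern) for pattern in WORKER_PATTERNS):
                hosts.append(host)
    return hosts


if __name__ == "__main__":
    sample = """\
[master]
cm4-01 ansible_host=192.0.2.11 k3s_server_init=true

[worker]
cm4-04 ansible_host=192.0.2.14
"""
    print(workers_from_inventory(sample))
```

Tracking the current section header keeps hosts from [master] out of the result even though they share the cm4-* prefix.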
ddf7dd93b5 adding metrics to influx via telegraf 2025-12-18 21:17:17 +01:00
fe7d03ce9a adding compute blade specific code 2025-11-24 10:25:03 +01:00
a81cb20228 adding packages to install 2025-10-22 11:34:15 +02:00
eacf3cb5de fix inconsistencies 2025-10-22 10:25:38 +02:00
11aab36289 adding ingress 2025-10-22 08:43:06 +02:00
f311e2ac00 initial commit 2025-10-22 08:20:53 +02:00