34 Commits

Author SHA1 Message Date
e1392b1584 Move test deployment manifests to the nginx-test namespace
- Update ConfigMap `nginx-test-html` to namespace `nginx-test`
- Move Deployment `nginx-test` to `nginx-test` namespace
- Move Service `nginx-test` to `nginx-test` namespace
- Move IngressRoute `nginx-test` to `nginx-test` namespace
2026-03-25 11:54:07 +01:00
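The namespace move above amounts to setting `metadata.namespace` on each manifest. A minimal sketch for the ConfigMap (resource names are taken from the commit; the HTML payload is an assumption):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-test-html
  namespace: nginx-test   # previously in the default namespace
data:
  index.html: "<h1>nginx-test</h1>"   # assumption: placeholder page content
```

The Deployment, Service, and IngressRoute get the same one-line `namespace: nginx-test` addition.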
14d4f2528d Add automatic TLS via Let's Encrypt Cloudflare DNS-01 and Vaultwarden
- Introduce Traefik ACME configuration using Cloudflare DNS-01 challenge
- Deploy Vaultwarden password manager with IP allowlist protection
- Add middleware for security headers, compression, and rate limiting
- Update IngressRoute resources to use new ACME resolver
- Add troubleshooting steps for certificate and TLS issues
- Include test application deployment and verification commands
2026-03-25 11:21:01 +01:00
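A sketch of the Traefik static configuration this commit likely introduces (the resolver name, contact email, and storage path are assumptions; `cloudflare` is Traefik's standard DNS-01 provider key and expects a Cloudflare API token in the environment):

```yaml
certificatesResolvers:
  letsencrypt:                  # assumption: resolver name referenced by IngressRoutes
    acme:
      email: admin@example.com  # assumption: ACME contact address
      storage: /data/acme.json  # assumption: persisted certificate store
      dnsChallenge:
        provider: cloudflare    # DNS-01 via the Cloudflare API
```

IngressRoute resources then reference the resolver via `tls.certResolver: letsencrypt`.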
ff78968b74 Add .env setup instructions to README 2026-02-27 07:30:37 +01:00
c59f44231a removing rust cli client 2026-02-27 07:25:34 +01:00
3e93903ea5 Update inventory and remove multi-ssh script
- Add cm5-01 as a worker node in inventory
- Update k3s_version to v1.35.1+k3s1
- Remove the multi-ssh.sh helper script
2026-02-15 16:19:10 +01:00
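A sketch of `inventory/hosts.ini` after this change (host addresses are assumptions; `cm5-01` and the version pin come from the commit):

```ini
# Sketch only: ansible_host addresses are assumptions
[master]
cm4-01 ansible_host=192.168.30.11

[worker]
cm4-04 ansible_host=192.168.30.14
cm5-01 ansible_host=192.168.30.15

[all:vars]
k3s_version=v1.35.1+k3s1
```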
2237a6fb95 Revamp docs and add Traefik Basic Auth guide 2026-01-14 14:00:13 +01:00
6eca87bfa9 Add default-backend and enable compute-blade-agent 2026-01-12 12:53:47 +01:00
fe5883f519 Read k3s master token from primary master
- Obtain the token from the primary master via slurp and base64 decode
- Derive k3s_url from the primary master's ansible_host
- Use the decoded content as k3s_token
- Update the success message quoting
2026-01-12 12:08:57 +01:00
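The token flow described above can be sketched as an Ansible task pair (the `master` group name is an assumption; the token path is the k3s default):

```yaml
- name: Read the node token from the primary master
  ansible.builtin.slurp:
    src: /var/lib/rancher/k3s/server/node-token   # default k3s token location
  delegate_to: "{{ groups['master'][0] }}"
  register: token_file

- name: Derive join credentials from the primary master
  ansible.builtin.set_fact:
    k3s_token: "{{ token_file.content | b64decode | trim }}"   # slurp returns base64
    k3s_url: "https://{{ hostvars[groups['master'][0]].ansible_host }}:6443"
```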
813ee0c252 Add uninstall-k3s.yml and fix server token flow 2026-01-12 12:04:26 +01:00
f3754c01d7 Add Prometheus Operator role and templates 2026-01-12 09:27:57 +01:00
fd7c9239b5 Update docs and roles for agent on all nodes
- Switch compute-blade-agent deployment from workers to all nodes
  (control-plane and workers)
- Use /usr/bin/compute-blade-agent instead of /usr/local/bin
- Update verification scripts to reference /usr/bin/compute-blade-agent
- Update docs to refer to all nodes across Deployment Guide, Checklist,
  and Getting Started
- Change site.yml to install on all hosts instead of just workers
- Align example commands to the all-nodes workflow
2026-01-12 08:54:41 +01:00
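The `site.yml` change above reduces to widening the play's host pattern; a sketch (the role name is an assumption based on the repo's naming):

```yaml
- name: Deploy compute-blade-agent on every node
  hosts: all            # previously: worker
  become: true
  roles:
    - compute-blade-agent   # assumption: role name
```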
2beb6aadfe Improve MikroTik API parsing and enable CM4 UART
- Refactor MikroTik API parsing to robustly extract name and poe-out and
  add debug logs
- Update boot config on CM4: ensure [cm4] exists, enable_uart=1, and
  apply uart5 overlay
2026-01-12 08:12:48 +01:00
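The boot-config portion of this commit corresponds to a `config.txt` fragment like the following (section name, setting, and overlay are all named in the commit):

```
[cm4]
enable_uart=1
dtoverlay=uart5
```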
02aba68541 Add MikroTik PoE control Rust CLI
- Introduces a Rust-based MikroTik PoE control CLI and library
- Includes a MikroTik RouterOS API client with basic commands
- Exposes on/off/status via scriptRun and PoE status query
- Provides .env.example, README, and a minimal project layout for the
  mikrotik_cli crate
2026-01-10 22:12:25 +01:00
4e0a3cf0cb updating documentation 2026-01-09 16:07:17 +01:00
a2cf2a86d2 Message 2026-01-09 16:02:30 +01:00
61d7a5bf78 Configure multi-master HA cluster with 3 control-plane nodes
This change transforms the cluster from single-master to a fully redundant
3-node control-plane setup for high availability.

Changes:
- Updated inventory/hosts.ini to promote cm4-02 and cm4-03 to master group
  * Added k3s_server_init flag to distinguish primary (true) from joining (false) masters
  * Reduced worker nodes from 3 to 1 (cm4-04)
  * Added clear comments explaining the HA setup

- Modified roles/k3s-server/tasks/main.yml for multi-master support
  * Separated primary master initialization from additional master joining
  * Primary master (k3s_server_init=true) initializes cluster and generates token
  * Additional masters (k3s_server_init=false) join using primary's token
  * Proper sequencing ensures cluster stability during joining
  * Common tasks (kubeconfig setup) run on all masters

- Updated site.yml for proper master deployment sequencing
  * Primary master deploys first and initializes cluster
  * Additional masters deploy serially (one at a time) for stability
  * Serial deployment prevents etcd consensus issues during joining
  * Workers join only after all masters are ready
  * Test deployments run on primary master only

- Added comprehensive "High Availability - Multi-Master Cluster" section to README
  * Explains benefits of multi-master setup
  * Documents how to promote nodes to master
  * Includes monitoring and failover procedures
  * Shows how to recover from failed masters
  * Explains demoting masters back to workers

Benefits:
✓ No single point of failure in control-plane
✓ Automatic etcd clustering across 3 nodes
✓ Rolling master updates with zero downtime
✓ Faster cluster recovery from node failures
✓ Better performance distribution for API server
✓ Works seamlessly with MikroTik VIP for external access

Deployment Flow:
1. cm4-01 initializes cluster (becomes primary master)
2. cm4-02 joins as control-plane node
3. cm4-03 joins as control-plane node
4. cm4-04 joins as worker node
5. All nodes join etcd cluster with proper quorum

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-08 17:21:30 +01:00
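The deployment sequencing described above maps naturally onto `site.yml` plays; a sketch (role names are assumptions, and with `serial: 1` over the master group, the primary — listed first in inventory with `k3s_server_init=true` — initializes before the others join):

```yaml
- name: Deploy control-plane nodes one at a time
  hosts: master
  become: true
  serial: 1             # prevents etcd consensus issues while masters join
  roles:
    - k3s-server        # assumption: role branches on k3s_server_init internally

- name: Join worker nodes after all masters are ready
  hosts: worker
  become: true
  roles:
    - k3s-agent         # assumption: role name
```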
b6887cedc0 Message 2026-01-08 17:06:48 +01:00
6d4d73d2e8 keep only one VIP solution 2026-01-08 17:03:09 +01:00
b78249e8a1 Remove Keepalived VIP setup - using MikroTik hardware VIP instead
- Deleted vip-setup.yml playbook (Keepalived no longer needed)
- Updated MIKROTIK-VIP-SETUP-CUSTOM.md with corrected MikroTik syntax:
  * Fixed path notation: use spaces not slashes (/ip firewall nat not /ip/firewall/nat/)
  * Fixed action parameter: use dst-nat not dstnat
  * Added web interface alternative for NAT rule configuration
  * Added important syntax notes section
- Removed Keepalived documentation from README.md
- Kept MIKROTIK-VIP-SETUP.md as general reference guide
- Updated DNS and external access section to reference MikroTik VIP only

This simplifies the project by removing software-based VIP complexity since
the hardware-based MikroTik VIP provides better performance with no node overhead.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-08 17:02:19 +01:00
ed0e0d1af1 Fix MikroTik NAT rule syntax - remove line breaks in commands
- Fix HTTP NAT rule to be single line without breaks
- Fix HTTPS NAT rule to be single line without breaks
- Ensure all commands are copy-paste ready without syntax errors
- User reported: 'syntax error (line 1 column 94)' when running split command

All NAT rules now properly formatted as single-line commands suitable
for copy-paste into MikroTik SSH terminal.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-08 16:52:27 +01:00
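For reference, a copy-paste-ready single-line dst-nat rule in the corrected syntax (the VIP `192.168.30.100` and the `br-uplink` interface come from adjacent commits; treat the exact rule as an illustrative assumption):

```
/ip firewall nat add chain=dstnat action=dst-nat protocol=tcp dst-port=80 in-interface=br-uplink to-addresses=192.168.30.100 to-ports=80
```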
f8d70c8b1f Add customized MikroTik VIP setup for user's specific configuration
Create MIKROTIK-VIP-SETUP-CUSTOM.md tailored to user's setup:
- br-lab bridge (192.168.30.0/24) for K3s cluster
- br-uplink bridge (192.168.1.0/24) for external/uplink
- 4 CM4 nodes connected to br-lab

Configuration includes:
- VIP address on br-lab bridge (192.168.30.100/24)
- NAT rules for HTTP (port 80) and HTTPS (port 443)
- Static routes for cluster network connectivity
- Firewall rules to allow traffic to VIP
- Health check script for automatic failover
- Complete testing and verification procedures
- Troubleshooting guide specific to this setup

Ready-to-copy command sequences for:
- Initial setup (one command at a time)
- All commands together
- Complete removal if needed

Includes:
- Verification checklist
- Detailed troubleshooting steps
- Health check script with 30-second interval monitoring
- NAT rule automatic updates on master failure

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-08 16:49:20 +01:00
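The VIP itself is a single RouterOS address assignment (address and bridge name are stated in the commit; the comment is cosmetic):

```
/ip address add address=192.168.30.100/24 interface=br-lab comment="K3s VIP"
```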
079bb4ba77 Add MikroTik VIP setup guide as primary HA solution
Create MIKROTIK-VIP-SETUP.md with comprehensive guide:
- MikroTik Virtual IP configuration (web interface and CLI)
- NAT rule setup for traffic routing
- Health check script for automatic failover
- Comparison with Keepalived approach
- Troubleshooting guide
- Failover testing procedures

Update README.md DNS configuration section:
- Add MikroTik VIP as Option C1 (recommended for MikroTik users)
- Keep Keepalived as Option C2 (for non-MikroTik setups)
- Link to MIKROTIK-VIP-SETUP.md for detailed instructions
- Clear recommendation based on hardware

Benefits of MikroTik VIP over Keepalived:
- Hardware-based failover (more reliable)
- No additional software on cluster nodes
- Simpler setup (5 minutes vs 10 minutes)
- Better performance

Fix markdown linting issues:
- Add proper blank lines around lists
- Use headings instead of emphasis
- Maintain consistent formatting

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-08 16:46:21 +01:00
6049509c5d Add Virtual IP (VIP) solution for single DNS record with failover
Create vip-setup.yml Ansible playbook for Keepalived-based VIP configuration
- Automatic failover between cluster nodes using VRRP protocol
- Health checks for API server availability
- Single IP address can be used in DNS instead of multiple A records
- Master node holds VIP by default, workers act as backups

Update README.md with comprehensive VIP documentation:
- Add three DNS options (single record, multiple records, VIP)
- Detailed VIP installation and verification steps
- Monitoring and failover testing procedures
- Troubleshooting guide for common VIP issues
- Instructions for disabling VIP if needed

Benefits:
- Single DNS A record pointing to VIP (192.168.30.100)
- Automatic failover with no manual intervention
- Load balancing capability across all nodes
- Transparent to applications

Fix markdown linting issues:
- Add proper blank lines around lists and code blocks
- Use consistent ordered list numbering (all 1.)
- Remove duplicate/extra blank lines
- Ensure proper spacing around headings

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-08 16:44:06 +01:00
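A minimal `keepalived.conf` sketch for the VRRP setup this playbook configured (the interface name, router ID, and priority are assumptions; the VIP matches the commit):

```
vrrp_instance K3S_VIP {
    state MASTER
    interface eth0              # assumption: node NIC name
    virtual_router_id 51        # assumption: any value shared by all peers
    priority 150                # highest priority holds the VIP
    advert_int 1
    virtual_ipaddress {
        192.168.30.100/24
    }
}
```

Backup nodes run the same block with `state BACKUP` and a lower priority.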
0434039b80 Add external DNS configuration guide and update ingress for test.zlor.fi
- Update nginx-test ingress to include test.zlor.fi domain
- Add comprehensive DNS configuration section to README with:
  - DNS A record setup (single and multi-record options)
  - Ansible playbook for automated DNS resolver configuration
  - Manual DNS configuration instructions
  - Ingress verification steps
  - Testing procedures and troubleshooting guide
  - Instructions for adding additional domains
- Fix markdown linting issues (blank lines, language identifiers, list prefixes)

DNS configuration now supports:
- External domain resolution (test.zlor.fi)
- systemd-resolved integration
- Load balancing across cluster nodes
- Multiple domain support

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-08 16:40:00 +01:00
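One way the manual-configuration fallback could look as an Ansible task (the node address is an assumption; the domain comes from the commit):

```yaml
- name: Resolve test.zlor.fi to a cluster node (manual fallback sketch)
  become: true
  ansible.builtin.lineinfile:
    path: /etc/hosts
    line: "192.168.30.11 test.zlor.fi"   # assumption: any node or VIP address
```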
eb800cd4e3 Fix K3s upgrade support and add monitoring dashboards
- Remove 'when: not k3s_binary.stat.exists' condition from k3s-server and
  k3s-agent installation tasks to allow in-place upgrades of K3s versions
- Update task names to reflect both install and upgrade functionality
- Add change detection using stdout inspection for better Ansible reporting

Add InfluxDB v2 native dashboard alongside Grafana dashboard:
- Create influxdb/rpi-cluster-dashboard-v2.json for InfluxDB 2.8 compatibility
- Update Grafana dashboard datasource UID from 'influx' to 'influxdb'
- Remove unused disk usage and network traffic panels per user request

Update worker node discovery in compute-blade-agent verification script:
- Fix pattern matching to work with cm4-* node naming convention
- Add support for pi-worker and cb-0* patterns as fallbacks
- Now correctly parses [worker] section from inventory

Update inventory version documentation:
- Add comment explaining how to use 'latest' for auto-updates
- Set version to v1.35.0+k3s1 (updated from v1.34.2+k3s1)
- Add guidance on version format for users

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-08 16:28:26 +01:00
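The upgrade fix above amounts to dropping the `stat` guard and reporting change from the installer's output; a sketch (the `changed_when` marker string is an assumption about the installer's log text):

```yaml
- name: Install or upgrade K3s server   # no longer skipped when the binary exists
  ansible.builtin.shell: curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION={{ k3s_version }} sh -
  register: k3s_install
  changed_when: "'Installing' in k3s_install.stdout"   # assumption: installer output marker
```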
ddf7dd93b5 adding metrics to influx via telegraf 2025-12-18 21:17:17 +01:00
7568a1db92 renaming cluster nodes 2025-11-26 20:32:21 +01:00
8e9f0caf6c fixing linter errors in markdown 2025-11-24 10:30:46 +01:00
fe7d03ce9a adding compute blade specific code 2025-11-24 10:25:03 +01:00
a81cb20228 adding packages to install 2025-10-22 11:34:15 +02:00
eacf3cb5de fix inconsistencies 2025-10-22 10:25:38 +02:00
11aab36289 adding ingress 2025-10-22 08:43:06 +02:00
eb018de309 adding reboot ansible 2025-10-22 08:31:28 +02:00
f311e2ac00 initial commit 2025-10-22 08:20:53 +02:00