updating documentation

2026-01-08 17:48:49 +01:00
parent a2cf2a86d2
commit 4e0a3cf0cb
4 changed files with 177 additions and 113 deletions
+48 -33
@@ -68,36 +68,36 @@ bash scripts/verify-compute-blade-agent.sh
- [ ] Service status shows "Running" - [ ] Service status shows "Running"
- [ ] Config file exists at `/etc/compute-blade-agent/config.yaml` - [ ] Config file exists at `/etc/compute-blade-agent/config.yaml`
### 3. Manual Verification on a Worker ### 3. Manual Verification on a Master Node
```bash ```bash
ssh pi@192.168.30.102 # Connect to any master (cm4-01, cm4-02, or cm4-03)
sudo systemctl status compute-blade-agent ssh pi@192.168.30.101
kubectl get nodes
``` ```
- [ ] Service is active (running) - [ ] All 3 masters show as "Ready"
- [ ] Service is enabled (will start on boot) - [ ] Worker node (cm4-04) shows as "Ready"
### 4. Check Logs ### 4. Check Etcd Quorum
```bash ```bash
ssh pi@192.168.30.102 ssh pi@192.168.30.101
sudo journalctl -u compute-blade-agent -n 50 sudo /var/lib/rancher/k3s/data/*/bin/etcdctl member list
``` ```
- [ ] No error messages - [ ] All 3 etcd members show as active
- [ ] Service started successfully - [ ] Cluster has quorum (2/3 minimum for failover)
- [ ] Hardware detection messages present (if applicable)
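The "2/3" figure in the quorum check follows from etcd's majority rule: a cluster of n voting members needs floor(n/2) + 1 of them available. A minimal sketch of that arithmetic, where the function names `quorum` and `tolerated_failures` are illustrative only and not part of k3s or etcd:

```bash
#!/usr/bin/env bash
# Majority quorum for an etcd cluster with the given number of voting members.
quorum() {
  local members="$1"
  echo $(( members / 2 + 1 ))
}

# How many members can fail while the cluster still has quorum.
tolerated_failures() {
  local members="$1"
  echo $(( members - (members / 2 + 1) ))
}

for n in 1 2 3; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

For the 3-master layout this gives a quorum of 2 and one tolerated failure. A 2-member cluster also needs 2, so it tolerates none.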
### 5. Verify Installation ### 5. Verify Kubeconfig
```bash ```bash
ssh pi@192.168.30.102 export KUBECONFIG=$(pwd)/kubeconfig
/usr/local/bin/compute-blade-agent --version kubectl config get-contexts
``` ```
- [ ] Binary responds with version information - [ ] Shows contexts: cm4-01, cm4-02, cm4-03, and default
- [ ] bladectl CLI tool is available - [ ] All contexts point to correct control-plane nodes
## Optional: Kubernetes Monitoring Setup ## Optional: Kubernetes Monitoring Setup
@@ -159,15 +159,20 @@ enable_compute_blade_agent=true # or false
### Per-Node Configuration ### Per-Node Configuration
To enable/disable specific nodes, edit `inventory/hosts.ini`: Note: cm4-02 and cm4-03 are now **master nodes**, not workers. To enable/disable compute-blade-agent on specific nodes:
```ini ```ini
[master]
cm4-01 ansible_host=192.168.30.101 ansible_user=pi k3s_server_init=true enable_compute_blade_agent=false
cm4-02 ansible_host=192.168.30.102 ansible_user=pi k3s_server_init=false enable_compute_blade_agent=false
cm4-03 ansible_host=192.168.30.103 ansible_user=pi k3s_server_init=false enable_compute_blade_agent=false
[worker] [worker]
cm4-02 ansible_host=... enable_compute_blade_agent=false cm4-04 ansible_host=192.168.30.104 ansible_user=pi enable_compute_blade_agent=true
cm4-03 ansible_host=... enable_compute_blade_agent=true
``` ```
- [ ] Per-node settings configured as needed - [ ] Per-node settings configured as needed
- [ ] Master nodes typically don't need compute-blade-agent
- [ ] Saved inventory file - [ ] Saved inventory file
- [ ] Re-run playbook if changes made - [ ] Re-run playbook if changes made
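Before re-running the playbook, it can help to confirm which hosts have the agent enabled. The sketch below assumes the per-host `enable_compute_blade_agent=true` flag sits on the host line, as in the inventory shown above. The function name `agent_enabled_hosts` is hypothetical, and group-level variables are deliberately ignored:

```bash
#!/usr/bin/env bash
# Print the inventory hosts whose line carries enable_compute_blade_agent=true.
# Only per-host key=value pairs are inspected; [group:vars] entries are ignored
# because the flag must appear after the host name (field 2 or later).
agent_enabled_hosts() {
  local inventory="$1"
  awk '
    /^\[/       { next }   # skip group headers such as [master]
    $1 ~ /^[#;]/ { next }  # skip comment lines
    {
      for (i = 2; i <= NF; i++)
        if ($i == "enable_compute_blade_agent=true") print $1
    }
  ' "$inventory"
}

# Run against the default inventory path only if it exists.
if [ -f inventory/hosts.ini ]; then
  agent_enabled_hosts inventory/hosts.ini
fi
```

With the inventory from this section, only `cm4-04` would be listed.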
@@ -214,26 +219,36 @@ ansible worker -m shell -a "systemctl status compute-blade-agent" --become
- [ ] All workers show active status - [ ] All workers show active status
## HA Cluster Maintenance
### Testing Failover
Your 3-node HA cluster can handle one master going down (maintains 2/3 quorum):
```bash
# Reboot one master while monitoring cluster
ssh pi@192.168.30.101
sudo reboot
# From another terminal, watch cluster status
watch kubectl get nodes
```
- [ ] Cluster remains operational with 2/3 masters
- [ ] Pods continue running
- [ ] Can still kubectl from cm4-02 or cm4-03 context
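To get a compact record of the failover test rather than a full-screen `watch`, the node list can be reduced to a count of Ready nodes. This is a sketch under the assumption that `kubectl get nodes --no-headers` prints the status in the second column. The helper names `count_ready` and `watch_ready` are made up for illustration:

```bash
#!/usr/bin/env bash
# Count nodes whose STATUS column is exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin; nodes reported as
# "NotReady" or "Ready,SchedulingDisabled" are not counted.
count_ready() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Print a timestamped Ready count every <interval> seconds until interrupted.
watch_ready() {
  local interval="${1:-5}"
  while true; do
    printf '%s ready=%s\n' "$(date +%T)" \
      "$(kubectl get nodes --no-headers 2>/dev/null | count_ready)"
    sleep "$interval"
  done
}
```

Running `watch_ready 5` from a machine whose kubeconfig points at cm4-02 or cm4-03 should show the count drop from 4 to 3 while cm4-01 reboots, then return to 4.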
## Uninstall (if needed) ## Uninstall (if needed)
### Uninstall from Single Node ### Uninstall K3s from All Nodes
```bash ```bash
ssh pi@<worker-ip> ansible all -m shell -a "bash /usr/local/bin/k3s-uninstall.sh" --become
sudo bash /usr/local/bin/k3s-uninstall-compute-blade-agent.sh ansible worker -m shell -a "bash /usr/local/bin/k3s-agent-uninstall.sh" --become
``` ```
- [ ] Uninstall script executed - [ ] All K3s services stopped
- [ ] Service removed - [ ] Cluster data cleaned up
- [ ] Configuration cleaned up
### Uninstall from All Workers
```bash
ansible worker -m shell -a "bash /usr/local/bin/k3s-uninstall-compute-blade-agent.sh" --become
```
- [ ] All workers uninstalled
### Disable in Future Deployments ### Disable in Future Deployments
+33 -19
@@ -18,9 +18,9 @@ cat inventory/hosts.ini
Verify: Verify:
- Master node IP is correct (cm4-01) - Master nodes are correct (cm4-01, cm4-02, cm4-03)
- Worker node IPs are correct (cm4-02, cm4-03, cm4-04) - Worker node IP is correct (cm4-04)
- `enable_compute_blade_agent=true` is set - `enable_compute_blade_agent=true` is set (optional for masters)
### Step 2: Test Connectivity ### Step 2: Test Connectivity
@@ -46,17 +46,22 @@ This will:
**Total time**: ~30-45 minutes **Total time**: ~30-45 minutes
### Step 4: Verify ### Step 4: Verify Cluster
```bash ```bash
bash scripts/verify-compute-blade-agent.sh export KUBECONFIG=$(pwd)/kubeconfig
kubectl get nodes
``` ```
All workers should show: You should see all 4 nodes ready (3 masters + 1 worker):
- ✓ Network: Reachable ```bash
- ✓ Service Status: Running NAME STATUS ROLES AGE VERSION
- ✓ Binary: Installed cm4-01 Ready control-plane,etcd,master 5m v1.35.0+k3s1
cm4-02 Ready control-plane,etcd 3m v1.35.0+k3s1
cm4-03 Ready control-plane,etcd 3m v1.35.0+k3s1
cm4-04 Ready <none> 3m v1.35.0+k3s1
```
## Configuration ## Configuration
@@ -215,22 +220,31 @@ sudo systemctl status compute-blade-agent
## Common Tasks ## Common Tasks
### Restart Agent on All Workers ### Check Cluster Status
```bash ```bash
ansible worker -m shell -a "sudo systemctl restart compute-blade-agent" --become export KUBECONFIG=$(pwd)/kubeconfig
kubectl get nodes
kubectl get pods --all-namespaces
``` ```
### View Agent Logs on All Workers ### Access Any Master Node
```bash ```bash
ansible worker -m shell -a "sudo journalctl -u compute-blade-agent -n 20" --become # Access cm4-01
ssh pi@192.168.30.101
# Or access cm4-02 (backup master)
ssh pi@192.168.30.102
# Or access cm4-03 (backup master)
ssh pi@192.168.30.103
``` ```
### Deploy Only to Specific Nodes ### Deploy Only to Specific Nodes
```bash ```bash
ansible-playbook site.yml --tags compute-blade-agent --limit cm4-02,cm4-03 ansible-playbook site.yml --tags compute-blade-agent --limit cm4-04
``` ```
### Disable Agent for Next Deployment ### Disable Agent for Next Deployment
@@ -257,12 +271,12 @@ ansible worker -m shell -a "bash /usr/local/bin/k3s-uninstall-compute-blade-agen
ansible all -m shell -a "bash /usr/local/bin/k3s-uninstall.sh" --become ansible all -m shell -a "bash /usr/local/bin/k3s-uninstall.sh" --become
``` ```
## Support ## Documentation
- **Quick Reference**: `cat COMPUTE_BLADE_AGENT.md` - **README.md** - Full guide with all configuration options
- **Checklist**: `cat DEPLOYMENT_CHECKLIST.md` - **DEPLOYMENT_CHECKLIST.md** - Step-by-step checklist
- **Full Guide**: `cat README.md` - **COMPUTE_BLADE_AGENT.md** - Quick reference for agent deployment
- **GitHub**: [compute-blade-agent](https://github.com/compute-blade-community/compute-blade-agent) - **MIKROTIK-VIP-SETUP-CUSTOM.md** - Virtual IP failover configuration
## File Locations ## File Locations
+51 -17
@@ -8,16 +8,18 @@ Customized setup guide for your MikroTik RouterOS configuration.
Uplink Network: 192.168.1.0/24 (br-uplink - WAN/External) Uplink Network: 192.168.1.0/24 (br-uplink - WAN/External)
LAB Network: 192.168.30.0/24 (br-lab - K3s Cluster) LAB Network: 192.168.30.0/24 (br-lab - K3s Cluster)
K3s Nodes: K3s Nodes (3-node HA Cluster):
cm4-01: 192.168.30.101 (Master) cm4-01: 192.168.30.101 (Master/Control-Plane)
cm4-02: 192.168.30.102 (Worker) cm4-02: 192.168.30.102 (Master/Control-Plane)
cm4-03: 192.168.30.103 (Worker) cm4-03: 192.168.30.103 (Master/Control-Plane)
cm4-04: 192.168.30.104 (Worker) cm4-04: 192.168.30.104 (Worker)
Virtual IP to Create: Virtual IP to Create:
192.168.30.100/24 (on br-lab bridge) 192.168.30.100/24 (on br-lab bridge - HAProxy or MikroTik failover)
``` ```
**⚠️ Important Note**: The basic NAT rules below will route to cm4-01 only. To achieve true failover in your 3-node HA cluster, activate the health check script (Step 8) so traffic automatically routes to another master if cm4-01 goes down.
## Step 1: Add Virtual IP Address on MikroTik ## Step 1: Add Virtual IP Address on MikroTik
Since your K3s nodes are on the `br-lab` bridge, add the VIP there: Since your K3s nodes are on the `br-lab` bridge, add the VIP there:
@@ -183,9 +185,9 @@ curl http://test.zlor.fi
curl -k https://test.zlor.fi curl -k https://test.zlor.fi
``` ```
## Step 8: Optional - Add Health Check Script ## Step 8: Add Health Check Script (Recommended for HA)
For automatic failover, create a health check script that monitors the master node and updates NAT rules if it goes down. **For automatic failover with your 3-node HA cluster**, create a health check script that monitors the master node and updates NAT rules if it goes down. This ensures traffic automatically routes to cm4-02 or cm4-03 if cm4-01 fails.
### Create Health Check Script ### Create Health Check Script
@@ -237,6 +239,8 @@ For automatic failover, create a health check script that monitors the master no
comment="Monitor K3s cluster and update VIP routes" comment="Monitor K3s cluster and update VIP routes"
``` ```
**Status**: This scheduler will run every 30 seconds and automatically switch the VIP NAT rules to an available master if cm4-01 becomes unreachable.
### View Health Check Logs ### View Health Check Logs
```mikrotik ```mikrotik
@@ -247,14 +251,33 @@ For automatic failover, create a health check script that monitors the master no
## Verification Checklist ## Verification Checklist
- [ ] VIP address (192.168.30.100) added to br-lab - [ ] VIP address (192.168.30.100) added to br-lab
- [ ] NAT rules for port 80 and 443 created - [ ] NAT rules for port 80 and 443 created (routed to cm4-01)
- [ ] Firewall rules allow traffic to VIP - [ ] Firewall rules allow traffic to VIP
- [ ] Ping 192.168.30.100 succeeds - [ ] Ping 192.168.30.100 succeeds
- [ ] curl http://192.168.30.100 returns nginx page - [ ] curl http://192.168.30.100 returns nginx page
- [ ] DNS A record added: test.zlor.fi → 192.168.30.100 - [ ] DNS A record added: test.zlor.fi → 192.168.30.100
- [ ] curl http://test.zlor.fi works - [ ] curl http://test.zlor.fi works
- [ ] Health check script created (optional) - [ ] **Health check script created** (recommended for HA failover)
- [ ] Health check scheduled (optional) - [ ] **Health check scheduled** (recommended for HA failover)
- [ ] Failover tested with the health check scheduler running
## Testing Failover (HA Cluster)
If you've enabled the health check script, you can test automatic failover:
```bash
# From your machine, start monitoring
watch -n 5 'curl -v http://192.168.30.100 2>&1 | grep "200 OK\|Connected"'
# In another terminal, SSH to cm4-01 and reboot it
ssh pi@192.168.30.101
sudo reboot
# Watch the curl output - after ~30 seconds, it should reconnect
# This means the health check script switched traffic to cm4-02 or cm4-03
```
**Expected result**: Traffic stays online during the reboot, apart from a switchover window of roughly 30 seconds.
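To measure the switchover window rather than eyeball it, the probe can log only the moments when the VIP changes between reachable and unreachable. A minimal sketch, assuming a plain HTTP request to the VIP is a good enough health signal. The function name `monitor` and its arguments are illustrative:

```bash
#!/usr/bin/env bash
# Log each UP/DOWN transition of a probe command with a timestamp.
# Usage: monitor <max_checks> <interval_seconds> <probe command...>
monitor() {
  local max_checks="$1"
  local interval="$2"
  shift 2
  local state=""
  local new=""
  local i=0
  while [ "$i" -lt "$max_checks" ]; do
    # The probe's exit status decides the state; its output is discarded.
    if "$@" >/dev/null 2>&1; then new=UP; else new=DOWN; fi
    if [ "$new" != "$state" ]; then
      printf '%s %s\n' "$(date +%T)" "$new"
      state="$new"
    fi
    sleep "$interval"
    i=$(( i + 1 ))
  done
}
```

For example, `monitor 300 1 curl -fsS --max-time 2 http://192.168.30.100` probes once per second for about five minutes. The gap between the `DOWN` and the following `UP` timestamp approximates the switchover time.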
## Troubleshooting ## Troubleshooting
@@ -368,16 +391,27 @@ Your VIP is now configured on MikroTik:
``` ```
External Traffic External Traffic
192.168.30.100:80 (VIP on br-lab) 192.168.30.100:80/443 (VIP on br-lab)
NAT Rule Routes to 192.168.30.101:80 NAT Rule Routes to 192.168.30.101:80/443 (cm4-01 Master)
K3s Master Node (cm4-01) If Health Check Enabled:
- Routes to cm4-02 if cm4-01 down (every 30 seconds check)
- Routes to cm4-03 if both cm4-01 and cm4-02 down
If Master Down → Failover to Worker Ingress → K3s Service → Pods
(Optional with health check script)
``` ```
DNS: `test.zlor.fi → 192.168.30.100` **DNS**: `test.zlor.fi → 192.168.30.100`
Single IP for your entire cluster with automatic failover! ✅ **Status**:
- ✅ Single IP for entire cluster
- ✅ Automatic failover (with health check script)
- ✅ 3-node HA masters provide etcd quorum
**Next Steps**:
1. Enable health check script (Step 8) for automatic failover
2. Test failover by rebooting cm4-01 and monitoring connectivity
3. Your cluster now has true high availability!
+45 -44
@@ -42,19 +42,19 @@ Edit `inventory/hosts.ini` and add your Raspberry Pi nodes:
```ini ```ini
[master] [master]
pi-master ansible_host=192.168.30.100 ansible_user=pi cm4-01 ansible_host=192.168.30.101 ansible_user=pi k3s_server_init=true
cm4-02 ansible_host=192.168.30.102 ansible_user=pi k3s_server_init=false
cm4-03 ansible_host=192.168.30.103 ansible_user=pi k3s_server_init=false
[worker] [worker]
pi-worker-1 ansible_host=192.168.30.102 ansible_user=pi cm4-04 ansible_host=192.168.30.104 ansible_user=pi
pi-worker-2 ansible_host=192.168.30.103 ansible_user=pi
pi-worker-3 ansible_host=192.168.30.104 ansible_user=pi
``` ```
### 2. Configure Variables ### 2. Configure Variables
In `inventory/hosts.ini`, you can customize: In `inventory/hosts.ini`, you can customize:
- `k3s_version`: K3s version to install (default: v1.34.2+k3s1) - `k3s_version`: K3s version to install (default: v1.35.0+k3s1)
- `extra_server_args`: Additional arguments for k3s server - `extra_server_args`: Additional arguments for k3s server
- `extra_agent_args`: Additional arguments for k3s agent - `extra_agent_args`: Additional arguments for k3s agent
- `extra_packages`: List of additional packages to install on all nodes - `extra_packages`: List of additional packages to install on all nodes
@@ -304,20 +304,21 @@ kubectl get nodes
You should see all your nodes in Ready state: You should see all your nodes in Ready state:
```bash ```bash
NAME STATUS ROLES AGE VERSION NAME STATUS ROLES AGE VERSION
pi-master Ready control-plane,master 5m v1.34.2+k3s1 cm4-01 Ready control-plane,etcd,master 5m v1.35.0+k3s1
pi-worker-1 Ready <none> 3m v1.34.2+k3s1 cm4-02 Ready control-plane,etcd 3m v1.35.0+k3s1
pi-worker-2 Ready <none> 3m v1.34.2+k3s1 cm4-03 Ready control-plane,etcd 3m v1.35.0+k3s1
cm4-04 Ready <none> 3m v1.35.0+k3s1
``` ```
## Accessing the Cluster ## Accessing the Cluster
### From Master Node ### From Master Node
SSH into the master node and use kubectl: SSH into a master node and use kubectl:
```bash ```bash
ssh pi@pi-master ssh pi@192.168.30.101
kubectl get nodes kubectl get nodes
``` ```
@@ -461,8 +462,11 @@ nginx-test-7d8f4c9b6d-xr5wp 1/1 Running 0 1m pi-worker-2
Add your master node IP to /etc/hosts: Add your master node IP to /etc/hosts:
```bash ```bash
# Replace 192.168.30.101 with your master node IP # Replace with any master or worker node IP
192.168.30.101 nginx-test.local nginx.pi.local 192.168.30.101 nginx-test.local nginx.pi.local
192.168.30.102 nginx-test.local nginx.pi.local
192.168.30.103 nginx-test.local nginx.pi.local
192.168.30.104 nginx-test.local nginx.pi.local
``` ```
Then access via browser: Then access via browser:
@@ -473,8 +477,9 @@ Then access via browser:
Or test with curl: Or test with curl:
```bash ```bash
# Replace with your master node IP # Test with any cluster node IP (master or worker)
curl -H "Host: nginx-test.local" http://192.168.30.101 curl -H "Host: nginx-test.local" http://192.168.30.101
curl -H "Host: nginx-test.local" http://192.168.30.102
``` ```
### Scale the Deployment ### Scale the Deployment
@@ -624,7 +629,7 @@ ansible-playbook site.yml --tags k3s-server --limit <failed-master>
### Demoting a Master to Worker ### Demoting a Master to Worker
To remove a master from control-plane and make it a worker: To remove a master from control-plane and make it a worker (note: this reduces HA from 3-node to 2-node):
1. Edit `inventory/hosts.ini`: 1. Edit `inventory/hosts.ini`:
@@ -638,6 +643,8 @@ To remove a master from control-plane and make it a worker:
cm4-04 ansible_host=192.168.30.104 ansible_user=pi cm4-04 ansible_host=192.168.30.104 ansible_user=pi
``` ```
**Warning**: This reduces your cluster to 2 master nodes. A 2-member etcd cluster still needs both members for quorum, so it cannot tolerate the failure of either master.
2. Drain the node: 2. Drain the node:
```bash ```bash
@@ -690,7 +697,7 @@ To update to a specific k3s version:
```ini ```ini
[k3s_cluster:vars] [k3s_cluster:vars]
k3s_version=v1.35.0+k3s1 k3s_version=v1.36.0+k3s1
``` ```
1. Run the k3s playbook to update all nodes: 1. Run the k3s playbook to update all nodes:
@@ -711,7 +718,7 @@ For more control, you can manually update k3s on individual nodes:
ssh pi@<node-ip> ssh pi@<node-ip>
# Download and install specific version # Download and install specific version
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.35.0+k3s1 sh - curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.36.0+k3s1 sh -
# Restart k3s # Restart k3s
sudo systemctl restart k3s # On master sudo systemctl restart k3s # On master
@@ -775,7 +782,7 @@ If an update causes issues, you can rollback to a previous version:
```bash ```bash
# Update inventory with previous version # Update inventory with previous version
# [k3s_cluster:vars] # [k3s_cluster:vars]
# k3s_version=v1.34.2+k3s1 # k3s_version=v1.35.0+k3s1
# Re-run the playbook # Re-run the playbook
ansible-playbook site.yml --tags k3s-server,k3s-agent ansible-playbook site.yml --tags k3s-server,k3s-agent
@@ -814,7 +821,7 @@ ansible-playbook reboot.yml --limit master
### Reboot a Specific Node ### Reboot a Specific Node
```bash ```bash
ansible-playbook reboot.yml --limit pi-worker-1 ansible-playbook reboot.yml --limit cm4-04
``` ```
## Troubleshooting ## Troubleshooting
@@ -1001,26 +1008,33 @@ ansible-playbook site.yml --tags compute-blade-agent
## External DNS Configuration ## External DNS Configuration
To use external domains (like `test.zlor.fi`) with your k3s cluster ingress, you need to configure DNS and update your nodes. To use external domains (like `test.zlor.fi`) with your k3s cluster ingress, you need to configure DNS. Your cluster uses a Virtual IP (192.168.30.100) via MikroTik for high availability.
### Step 1: Configure DNS Server Records ### Step 1: Configure DNS Server Records
On your DNS server, add **A records** pointing to your k3s cluster nodes: On your DNS server, add **A records** pointing to your k3s cluster nodes:
#### Option A: Single Record (Master Node Only) - Simplest #### Option A: Virtual IP (VIP) via MikroTik - Recommended for HA
If your DNS only allows one A record: Use your MikroTik router's Virtual IP (192.168.30.100) for high availability:
```dns ```dns
test.zlor.fi A 192.168.30.101 test.zlor.fi A 192.168.30.100
``` ```
**Pros:** Simple, works with any DNS server **Pros:**
**Cons:** No failover if master node is down
#### Option B: Multiple Records (Load Balanced) - Best Redundancy - Single IP for entire cluster
- Hardware-based failover (more reliable)
- Better performance
- No additional software needed
- Automatically routes to available masters
If your DNS supports multiple A records: See [MIKROTIK-VIP-SETUP-CUSTOM.md](MIKROTIK-VIP-SETUP-CUSTOM.md) for detailed setup instructions.
#### Option B: Multiple Records (Load Balanced)
If your DNS supports multiple A records, point to all cluster nodes:
```dns ```dns
test.zlor.fi A 192.168.30.101 test.zlor.fi A 192.168.30.101
@@ -1029,32 +1043,19 @@ test.zlor.fi A 192.168.30.103
test.zlor.fi A 192.168.30.104 test.zlor.fi A 192.168.30.104
``` ```
DNS clients will distribute requests across all nodes (round-robin).
**Pros:** Load balanced, automatic failover **Pros:** Load balanced, automatic failover
**Cons:** Requires DNS server support for multiple A records **Cons:** Requires DNS server support for multiple A records
#### Option C: Virtual IP (VIP) - Best of Both Worlds #### Option C: Single Master Node (No Failover)
If your DNS only allows one A record but you want redundancy: For simple setups without redundancy:
```dns ```dns
test.zlor.fi A 192.168.30.100 test.zlor.fi A 192.168.30.101
``` ```
Set up a virtual IP that automatically handles failover. You have two sub-options: **Pros:** Simple, works with any DNS server
**Cons:** No failover if that node is down (not recommended for HA clusters)
##### Option C: MikroTik VIP (Recommended)
Configure VIP directly on your MikroTik router. See [MIKROTIK-VIP-SETUP.md](MIKROTIK-VIP-SETUP.md) for customized setup instructions for your network topology.
Pros:
- Simple setup (5 minutes)
- No additional software on cluster nodes
- Hardware-based failover (more reliable)
- Better performance
- Reduced CPU overhead on nodes
### Step 2: Configure Cluster Nodes for External DNS ### Step 2: Configure Cluster Nodes for External DNS