From eb800cd4e334950b24e5a43d3a327c4015f67235 Mon Sep 17 00:00:00 2001 From: Michael Skrynski Date: Thu, 8 Jan 2026 16:28:26 +0100 Subject: [PATCH] Fix K3s upgrade support and add monitoring dashboards - Remove 'when: not k3s_binary.stat.exists' condition from k3s-server and k3s-agent installation tasks to allow in-place upgrades of K3s versions - Update task names to reflect both install and upgrade functionality - Add change detection using stdout inspection for better Ansible reporting Add InfluxDB v2 native dashboard alongside Grafana dashboard: - Create influxdb/rpi-cluster-dashboard-v2.json for InfluxDB 2.8 compatibility - Update Grafana dashboard datasource UID from 'influx' to 'influxdb' - Remove unused disk usage and network traffic panels per user request Update worker node discovery in compute-blade-agent verification script: - Fix pattern matching to work with cm4-* node naming convention - Add support for pi-worker and cb-0* patterns as fallbacks - Now correctly parses [worker] section from inventory Update inventory version documentation: - Add comment explaining how to use 'latest' for auto-updates - Set version to v1.35.0+k3s1 (updated from v1.34.2+k3s1) - Add guidance on version format for users Co-Authored-By: Claude Haiku 4.5 --- README.md | 185 +++++++++++- influxdb/rpi-cluster-dashboard-v2.json | 238 ++++++++++++++++ influxdb/rpi-cluster-dashboard.json | 375 +++++++++++++++++++++++++ inventory/hosts.ini | 3 +- roles/k3s-agent/tasks/main.yml | 6 +- roles/k3s-server/tasks/main.yml | 6 +- scripts/verify-compute-blade-agent.sh | 4 +- 7 files changed, 793 insertions(+), 24 deletions(-) create mode 100644 influxdb/rpi-cluster-dashboard-v2.json create mode 100644 influxdb/rpi-cluster-dashboard.json diff --git a/README.md b/README.md index cc35c12..76a0e9d 100644 --- a/README.md +++ b/README.md @@ -159,40 +159,73 @@ sudo journalctl -u telegraf -f Once configured, metrics will appear in your InfluxDB instance under the `rpi-cluster` bucket with tags for 
each node hostname and node type (master/worker). -### Grafana Dashboard for Telegraf Metrics +### Monitoring Dashboards -A pre-built Grafana dashboard is included to visualize all collected metrics. The dashboard displays: +Two pre-built dashboards are available for visualizing your cluster metrics: + +#### Grafana Dashboard + +A comprehensive Grafana dashboard with interactive visualizations: - CPU usage across all nodes - Memory usage (percentage) - CPU temperature (Raspberry Pi specific) - System load averages -- Disk usage -- Network traffic -**Import the Dashboard:** +**Import to Grafana:** 1. Open Grafana and go to **Dashboards** → **New** → **Import** 2. Upload the dashboard file: `grafana/rpi-cluster-dashboard.json` -3. Select your InfluxDB datasource (must be named `influx`) +3. Your InfluxDB datasource (named `influxdb`) will be automatically selected 4. Click **Import** -**Datasource Requirements:** - -The dashboard expects your InfluxDB datasource in Grafana to be named exactly `influx`. If your datasource has a different name, either: - -- Rename your datasource in Grafana settings, or -- Edit the dashboard JSON and replace all `"uid": "influx"` references with your datasource name - -**Customize the Dashboard:** +**Customize the Grafana Dashboard:** You can modify the dashboard after import to: - Adjust time ranges (default: last 6 hours) - Add alerts for high CPU/temperature/memory -- Add more panels for network metrics +- Add more panels for additional metrics - Create node-specific views using Grafana variables +#### InfluxDB Dashboard + +A native InfluxDB 2.x dashboard with built-in gauges and time series: + +- CPU usage gauge (average) +- Memory usage gauge (average) +- CPU usage time series (6-hour view) +- Memory usage time series (6-hour view) +- CPU temperature trend +- System load trend + +**Import to InfluxDB 2.8:** + +**Via UI (Recommended):** + +1. Open InfluxDB UI at `http://your-influxdb-host:8086` +2. 
Go to **Dashboards** (left sidebar) +3. Click **Create Dashboard** → **From a Template** +4. Click **Paste JSON** +5. Copy and paste the contents of `influxdb/rpi-cluster-dashboard-v2.json` +6. Click **Create Dashboard** + +**Via CLI:** + +```bash +influx dashboard import \ + --org family \ + --file influxdb/rpi-cluster-dashboard-v2.json +``` + +**Benefits of InfluxDB Dashboard:** + +- Native integration - no external datasource configuration needed +- Built-in alert support +- Real-time data without polling delays +- Direct access to raw data and queries +- InfluxDB 2.8 compatible + ### Deploy K3s Cluster ```bash @@ -469,6 +502,128 @@ kubectl delete -f manifests/nginx-test-deployment.yaml ## Maintenance +### Updating the Cluster + +K3s is upgraded by re-running the official install script with the desired version, which the `k3s-server` and `k3s-agent` roles do on every playbook run. There are several ways to update your cluster: + +#### Option 1: Automatic Updates (Recommended) + +The playbook can install the newest release on each run instead of a pinned version. To enable this on all nodes: + +1. Add the following to your inventory `hosts.ini`: + +```ini +[k3s_cluster:vars] +k3s_version=latest +``` + +1. Re-run the k3s installation playbook: + +```bash +ansible-playbook site.yml --tags k3s-server,k3s-agent +``` + +Each later playbook run then installs the newest release available at that time (typically a patch release). K3s does not upgrade itself between runs. + +#### Option 2: Manual Update to Specific Version + +To update to a specific k3s version: + +1. Update the `k3s_version` variable in `inventory/hosts.ini`: + +```ini +[k3s_cluster:vars] +k3s_version=v1.35.0+k3s1 +``` + +1. Run the k3s playbook to update all nodes: + +```bash +# Update master first (required to generate token for agents) +ansible-playbook site.yml --tags k3s-server,k3s-agent +``` + +**Important:** Always update master nodes before workers. Workers need the token from the master to rejoin the cluster. 
+ +#### Option 3: Update via K3s Release Script + +For more control, you can manually update k3s on individual nodes: + +```bash +# SSH into a node +ssh pi@ + +# Download and install specific version +curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.35.0+k3s1 sh - + +# Restart k3s +sudo systemctl restart k3s # On master +sudo systemctl restart k3s-agent # On workers +``` + +#### Checking Current K3s Version + +To see the current k3s version running on your cluster: + +```bash +kubectl version +# or +kubectl get nodes -o wide +``` + +To check versions on specific nodes: + +```bash +ssh pi@ +k3s --version + +# Or via Ansible +ansible all -m shell -a "k3s --version" --become +``` + +#### Update Telegraf + +To update Telegraf metrics collection to the latest version: + +```bash +# Update Telegraf on all nodes +ansible-playbook telegraf.yml + +# Update only specific nodes +ansible-playbook telegraf.yml --limit worker +``` + +#### Post-Update Verification + +After updating, verify your cluster is healthy: + +```bash +# Check all nodes are ready +kubectl get nodes + +# Check pod status +kubectl get pods --all-namespaces + +# Check cluster info +kubectl cluster-info + +# View recent events +kubectl get events --all-namespaces --sort-by='.lastTimestamp' +``` + +#### Rollback (if needed) + +If an update causes issues, you can roll back to a previous version: + +```bash +# Update inventory with previous version +# [k3s_cluster:vars] +# k3s_version=v1.34.2+k3s1 + +# Re-run the playbook +ansible-playbook site.yml --tags k3s-server,k3s-agent +``` + ### Rebooting Cluster Nodes A dedicated playbook is provided to safely reboot all cluster nodes: diff --git a/influxdb/rpi-cluster-dashboard-v2.json b/influxdb/rpi-cluster-dashboard-v2.json new file mode 100644 index 0000000..9aab8cf --- /dev/null +++ b/influxdb/rpi-cluster-dashboard-v2.json @@ -0,0 +1,238 @@ +{ + "name": "Raspberry Pi K3s Cluster Metrics", + "description": "System monitoring dashboard for Raspberry Pi K3s 
cluster with Telegraf metrics", + "cells": [ + { + "x": 0, + "y": 0, + "w": 6, + "h": 4, + "kind": "Gauge", + "name": "CPU Usage - Average", + "properties": { + "queries": [ + { + "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -15m)\n |> filter(fn: (r) => r[\"_measurement\"] == \"cpu\")\n |> filter(fn: (r) => r[\"_field\"] == \"usage_user\")\n |> mean()", + "editMode": "advanced" + } + ], + "colors": [ + { + "id": "0", + "type": "background", + "hex": "#00C9FF", + "value": 0 + }, + { + "id": "1", + "type": "background", + "hex": "#FFB94E", + "value": 50 + }, + { + "id": "2", + "type": "background", + "hex": "#FF3D3D", + "value": 80 + } + ], + "prefix": "", + "suffix": "%", + "decimalPlaces": 1, + "note": "" + } + }, + { + "x": 6, + "y": 0, + "w": 6, + "h": 4, + "kind": "Gauge", + "name": "Memory Usage - Average", + "properties": { + "queries": [ + { + "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -15m)\n |> filter(fn: (r) => r[\"_measurement\"] == \"mem\")\n |> filter(fn: (r) => r[\"_field\"] == \"used_percent\")\n |> mean()", + "editMode": "advanced" + } + ], + "colors": [ + { + "id": "0", + "type": "background", + "hex": "#00C9FF", + "value": 0 + }, + { + "id": "1", + "type": "background", + "hex": "#FFB94E", + "value": 60 + }, + { + "id": "2", + "type": "background", + "hex": "#FF3D3D", + "value": 85 + } + ], + "prefix": "", + "suffix": "%", + "decimalPlaces": 1, + "note": "" + } + }, + { + "x": 0, + "y": 4, + "w": 12, + "h": 4, + "kind": "TimeSeries", + "name": "CPU Usage - All Nodes", + "properties": { + "queries": [ + { + "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"cpu\")\n |> filter(fn: (r) => r[\"_field\"] == \"usage_user\")\n |> aggregateWindow(every: 1m, fn: mean)", + "editMode": "advanced" + } + ], + "colors": [], + "axes": { + "x": { + "bounds": [], + "label": "", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + }, + "y": { + "bounds": [], + 
"label": "CPU Usage (%)", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + } + }, + "type": "xy", + "geom": "line", + "note": "" + } + }, + { + "x": 0, + "y": 8, + "w": 12, + "h": 4, + "kind": "TimeSeries", + "name": "Memory Usage - All Nodes", + "properties": { + "queries": [ + { + "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"mem\")\n |> filter(fn: (r) => r[\"_field\"] == \"used_percent\")\n |> aggregateWindow(every: 1m, fn: mean)", + "editMode": "advanced" + } + ], + "colors": [], + "axes": { + "x": { + "bounds": [], + "label": "", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + }, + "y": { + "bounds": [], + "label": "Memory (%)", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + } + }, + "type": "xy", + "geom": "line", + "note": "" + } + }, + { + "x": 0, + "y": 12, + "w": 12, + "h": 4, + "kind": "TimeSeries", + "name": "CPU Temperature - All Nodes", + "properties": { + "queries": [ + { + "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"cpu_temp_thermal\")\n |> filter(fn: (r) => r[\"_field\"] == \"value\")\n |> aggregateWindow(every: 1m, fn: mean)", + "editMode": "advanced" + } + ], + "colors": [], + "axes": { + "x": { + "bounds": [], + "label": "", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + }, + "y": { + "bounds": [], + "label": "Temperature (°C)", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + } + }, + "type": "xy", + "geom": "line", + "note": "" + } + }, + { + "x": 0, + "y": 16, + "w": 12, + "h": 4, + "kind": "TimeSeries", + "name": "System Load - All Nodes", + "properties": { + "queries": [ + { + "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"system\")\n |> filter(fn: (r) => r[\"_field\"] == \"load1\")\n |> aggregateWindow(every: 1m, fn: mean)", + "editMode": 
"advanced" + } + ], + "colors": [], + "axes": { + "x": { + "bounds": [], + "label": "", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + }, + "y": { + "bounds": [], + "label": "Load Average (1m)", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + } + }, + "type": "xy", + "geom": "line", + "note": "" + } + } + ] +} diff --git a/influxdb/rpi-cluster-dashboard.json b/influxdb/rpi-cluster-dashboard.json new file mode 100644 index 0000000..67453fb --- /dev/null +++ b/influxdb/rpi-cluster-dashboard.json @@ -0,0 +1,375 @@ +{ + "name": "Raspberry Pi K3s Cluster Metrics", + "description": "System monitoring dashboard for Raspberry Pi K3s cluster with Telegraf metrics", + "org": "family", + "cells": [ + { + "x": 0, + "y": 0, + "w": 6, + "h": 4, + "kind": "Gauge", + "name": "CPU Usage - Average", + "properties": { + "shape": "chronograf-v2", + "queries": [ + { + "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -1h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"cpu\")\n |> filter(fn: (r) => r[\"_field\"] == \"usage_user\")\n |> mean()", + "editMode": "advanced", + "name": "", + "builderConfig": { + "buckets": [], + "tags": [], + "functions": [], + "filters": [] + } + } + ], + "colors": [ + { + "id": "base", + "type": "text", + "hex": "#ffffff", + "name": "Crayola", + "value": 0 + }, + { + "id": "0", + "type": "background", + "hex": "#31C0F6", + "name": "Crayola", + "value": 0 + }, + { + "id": "1", + "type": "background", + "hex": "#A500A5", + "name": "Crayola", + "value": 50 + }, + { + "id": "2", + "type": "background", + "hex": "#FF0000", + "name": "Crayola", + "value": 80 + } + ], + "prefix": "", + "suffix": "%", + "decimalPlaces": 2, + "gaugeColors": [ + { + "name": "green", + "type": "min", + "value": 0 + }, + { + "name": "yellow", + "type": "max", + "value": 50 + }, + { + "name": "red", + "type": "max", + "value": 100 + } + ] + } + }, + { + "x": 6, + "y": 0, + "w": 6, + "h": 4, + "kind": "Gauge", + "name": "Memory Usage - 
Average", + "properties": { + "shape": "chronograf-v2", + "queries": [ + { + "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -1h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"mem\")\n |> filter(fn: (r) => r[\"_field\"] == \"used_percent\")\n |> mean()", + "editMode": "advanced", + "name": "", + "builderConfig": { + "buckets": [], + "tags": [], + "functions": [], + "filters": [] + } + } + ], + "colors": [ + { + "id": "base", + "type": "text", + "hex": "#ffffff", + "name": "Crayola", + "value": 0 + }, + { + "id": "0", + "type": "background", + "hex": "#31C0F6", + "name": "Crayola", + "value": 0 + }, + { + "id": "1", + "type": "background", + "hex": "#A500A5", + "name": "Crayola", + "value": 50 + }, + { + "id": "2", + "type": "background", + "hex": "#FF0000", + "name": "Crayola", + "value": 80 + } + ], + "prefix": "", + "suffix": "%", + "decimalPlaces": 1, + "gaugeColors": [ + { + "name": "green", + "type": "min", + "value": 0 + }, + { + "name": "yellow", + "type": "max", + "value": 60 + }, + { + "name": "red", + "type": "max", + "value": 100 + } + ] + } + }, + { + "x": 0, + "y": 4, + "w": 12, + "h": 4, + "kind": "TimeSeries", + "name": "CPU Usage - All Nodes", + "properties": { + "shape": "chronograf-v2", + "queries": [ + { + "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"cpu\")\n |> filter(fn: (r) => r[\"_field\"] == \"usage_user\")\n |> aggregateWindow(every: 1m, fn: mean)", + "editMode": "advanced", + "name": "", + "builderConfig": { + "buckets": [], + "tags": [], + "functions": [], + "filters": [] + } + } + ], + "colors": [], + "axes": { + "x": { + "bounds": [], + "label": "", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + }, + "y": { + "bounds": [], + "label": "CPU Usage (%)", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + }, + "y2": { + "bounds": [], + "label": "", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + } + }, + 
"type": "xy", + "geom": "line", + "colorizeRows": false, + "legend": {} + } + }, + { + "x": 0, + "y": 8, + "w": 12, + "h": 4, + "kind": "TimeSeries", + "name": "Memory Usage - All Nodes", + "properties": { + "shape": "chronograf-v2", + "queries": [ + { + "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"mem\")\n |> filter(fn: (r) => r[\"_field\"] == \"used_percent\")\n |> aggregateWindow(every: 1m, fn: mean)", + "editMode": "advanced", + "name": "", + "builderConfig": { + "buckets": [], + "tags": [], + "functions": [], + "filters": [] + } + } + ], + "colors": [], + "axes": { + "x": { + "bounds": [], + "label": "", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + }, + "y": { + "bounds": [], + "label": "Memory (%)", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + }, + "y2": { + "bounds": [], + "label": "", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + } + }, + "type": "xy", + "geom": "line", + "colorizeRows": false, + "legend": {} + } + }, + { + "x": 0, + "y": 12, + "w": 12, + "h": 4, + "kind": "TimeSeries", + "name": "CPU Temperature - All Nodes", + "properties": { + "shape": "chronograf-v2", + "queries": [ + { + "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"cpu_temp_thermal\")\n |> filter(fn: (r) => r[\"_field\"] == \"value\")\n |> aggregateWindow(every: 1m, fn: mean)", + "editMode": "advanced", + "name": "", + "builderConfig": { + "buckets": [], + "tags": [], + "functions": [], + "filters": [] + } + } + ], + "colors": [], + "axes": { + "x": { + "bounds": [], + "label": "", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + }, + "y": { + "bounds": [], + "label": "Temperature (°C)", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + }, + "y2": { + "bounds": [], + "label": "", + "prefix": "", + "suffix": "", + "base": "10", + "scale": 
"linear" + } + }, + "type": "xy", + "geom": "line", + "colorizeRows": false, + "legend": {} + } + }, + { + "x": 0, + "y": 16, + "w": 12, + "h": 4, + "kind": "TimeSeries", + "name": "System Load - All Nodes", + "properties": { + "shape": "chronograf-v2", + "queries": [ + { + "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"system\")\n |> filter(fn: (r) => r[\"_field\"] == \"load1\")\n |> aggregateWindow(every: 1m, fn: mean)", + "editMode": "advanced", + "name": "", + "builderConfig": { + "buckets": [], + "tags": [], + "functions": [], + "filters": [] + } + } + ], + "colors": [], + "axes": { + "x": { + "bounds": [], + "label": "", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + }, + "y": { + "bounds": [], + "label": "Load Average (1m)", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + }, + "y2": { + "bounds": [], + "label": "", + "prefix": "", + "suffix": "", + "base": "10", + "scale": "linear" + } + }, + "type": "xy", + "geom": "line", + "colorizeRows": false, + "legend": {} + } + } + ] +} diff --git a/inventory/hosts.ini b/inventory/hosts.ini index 5f65909..d4175cc 100644 --- a/inventory/hosts.ini +++ b/inventory/hosts.ini @@ -17,7 +17,8 @@ worker [k3s_cluster:vars] # K3s version to install -k3s_version=v1.34.2+k3s1 +# Use 'latest' for auto-updates, or specify a version like 'v1.29.0+k3s1' +k3s_version=v1.35.0+k3s1 # Network settings ansible_user=pi diff --git a/roles/k3s-agent/tasks/main.yml b/roles/k3s-agent/tasks/main.yml index 082af88..4c09181 100644 --- a/roles/k3s-agent/tasks/main.yml +++ b/roles/k3s-agent/tasks/main.yml @@ -14,16 +14,16 @@ url: https://get.k3s.io dest: /tmp/k3s-install.sh mode: '0755' - when: not k3s_binary.stat.exists -- name: Install k3s agent +- name: Install or upgrade k3s agent shell: | INSTALL_K3S_VERSION="{{ k3s_version }}" \ K3S_URL="{{ k3s_url }}" \ K3S_TOKEN="{{ k3s_token }}" \ INSTALL_K3S_EXEC="agent {{ extra_agent_args }}" \ sh 
/tmp/k3s-install.sh - when: not k3s_binary.stat.exists + register: k3s_install_result + changed_when: "'installed' in k3s_install_result.stdout or 'upgraded' in k3s_install_result.stdout" - name: Wait for k3s agent to be ready wait_for: diff --git a/roles/k3s-server/tasks/main.yml b/roles/k3s-server/tasks/main.yml index c8dc12d..2c293e3 100644 --- a/roles/k3s-server/tasks/main.yml +++ b/roles/k3s-server/tasks/main.yml @@ -9,14 +9,14 @@ url: https://get.k3s.io dest: /tmp/k3s-install.sh mode: '0755' - when: not k3s_binary.stat.exists -- name: Install k3s server +- name: Install or upgrade k3s server shell: | INSTALL_K3S_VERSION="{{ k3s_version }}" \ INSTALL_K3S_EXEC="server {{ extra_server_args }}" \ sh /tmp/k3s-install.sh - when: not k3s_binary.stat.exists + register: k3s_install_result + changed_when: "'installed' in k3s_install_result.stdout or 'upgraded' in k3s_install_result.stdout" - name: Wait for k3s to be ready wait_for: diff --git a/scripts/verify-compute-blade-agent.sh b/scripts/verify-compute-blade-agent.sh index a05a952..1627954 100755 --- a/scripts/verify-compute-blade-agent.sh +++ b/scripts/verify-compute-blade-agent.sh @@ -16,12 +16,12 @@ BLUE='\033[0;34m' NC='\033[0m' # No Color echo -e "${BLUE}╔════════════════════════════════════════════════════════════════╗${NC}" -echo -e "${BLUE}║ Compute Blade Agent Verification Script ║${NC}" +echo -e "${BLUE}║ Compute Blade Agent Verification Script ║${NC}" echo -e "${BLUE}╚════════════════════════════════════════════════════════════════╝${NC}\n" # Parse worker nodes from inventory echo -e "${YELLOW}Parsing worker nodes from inventory...${NC}" -WORKERS=$(grep -E "^cb-0[2-9]|^pi-worker" "$INVENTORY" | awk '{print $1}') +WORKERS=$(grep -E "^\[worker\]" -A 100 "$INVENTORY" | grep -E "^cm4-|^pi-worker|^cb-0" | grep -v "^\[" | awk '{print $1}') if [ -z "$WORKERS" ]; then echo -e "${RED}No worker nodes found in inventory${NC}"