Fix K3s upgrade support and add monitoring dashboards
- Remove 'when: not k3s_binary.stat.exists' condition from k3s-server and k3s-agent installation tasks to allow in-place upgrades of K3s versions
- Update task names to reflect both install and upgrade functionality
- Add change detection using stdout inspection for better Ansible reporting

Add InfluxDB v2 native dashboard alongside Grafana dashboard:
- Create influxdb/rpi-cluster-dashboard-v2.json for InfluxDB 2.8 compatibility
- Update Grafana dashboard datasource UID from 'influx' to 'influxdb'
- Remove unused disk usage and network traffic panels per user request

Update worker node discovery in compute-blade-agent verification script:
- Fix pattern matching to work with cm4-* node naming convention
- Add support for pi-worker and cb-0* patterns as fallbacks
- Now correctly parses [worker] section from inventory

Update inventory version documentation:
- Add comment explaining how to use 'latest' for auto-updates
- Set version to v1.35.0+k3s1 (updated from v1.34.2+k3s1)
- Add guidance on version format for users

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
185
README.md
@@ -159,40 +159,73 @@ sudo journalctl -u telegraf -f
 
 Once configured, metrics will appear in your InfluxDB instance under the `rpi-cluster` bucket with tags for each node hostname and node type (master/worker).
 
-### Grafana Dashboard for Telegraf Metrics
+### Monitoring Dashboards
 
-A pre-built Grafana dashboard is included to visualize all collected metrics. The dashboard displays:
+Two pre-built dashboards are available for visualizing your cluster metrics:
+
+#### Grafana Dashboard
+
+A comprehensive Grafana dashboard with interactive visualizations:
 
 - CPU usage across all nodes
 - Memory usage (percentage)
 - CPU temperature (Raspberry Pi specific)
 - System load averages
-- Disk usage
-- Network traffic
 
-**Import the Dashboard:**
+**Import to Grafana:**
 
 1. Open Grafana and go to **Dashboards** → **New** → **Import**
 2. Upload the dashboard file: `grafana/rpi-cluster-dashboard.json`
-3. Select your InfluxDB datasource (must be named `influx`)
+3. Your InfluxDB datasource (named `influxdb`) will be automatically selected
 4. Click **Import**
 
-**Datasource Requirements:**
-
-The dashboard expects your InfluxDB datasource in Grafana to be named exactly `influx`. If your datasource has a different name, either:
-
-- Rename your datasource in Grafana settings, or
-- Edit the dashboard JSON and replace all `"uid": "influx"` references with your datasource name
-
-**Customize the Dashboard:**
+**Customize the Grafana Dashboard:**
 
 You can modify the dashboard after import to:
 
 - Adjust time ranges (default: last 6 hours)
 - Add alerts for high CPU/temperature/memory
-- Add more panels for network metrics
+- Add more panels for additional metrics
 - Create node-specific views using Grafana variables
 
+#### InfluxDB Dashboard
+
+A native InfluxDB 2.x dashboard with built-in gauges and time series:
+
+- CPU usage gauge (average)
+- Memory usage gauge (average)
+- CPU usage time series (6-hour view)
+- Memory usage time series (6-hour view)
+- CPU temperature trend
+- System load trend
+
+**Import to InfluxDB 2.8:**
+
+**Via UI (Recommended):**
+
+1. Open InfluxDB UI at `http://your-influxdb-host:8086`
+2. Go to **Dashboards** (left sidebar)
+3. Click **Create Dashboard** → **From a Template**
+4. Click **Paste JSON**
+5. Copy and paste the contents of `influxdb/rpi-cluster-dashboard-v2.json`
+6. Click **Create Dashboard**
+
+**Via CLI:**
+
+```bash
+influx dashboard import \
+  --org family \
+  --file influxdb/rpi-cluster-dashboard-v2.json
+```
+
+**Benefits of InfluxDB Dashboard:**
+
+- Native integration - no external datasource configuration needed
+- Built-in alert support
+- Real-time data without polling delays
+- Direct access to raw data and queries
+- InfluxDB 2.8 compatible
+
 ### Deploy K3s Cluster
 
 ```bash
@@ -469,6 +502,128 @@ kubectl delete -f manifests/nginx-test-deployment.yaml
 
 ## Maintenance
 
+### Updating the Cluster
+
+K3s upgrades are applied by re-running the install script through the Ansible playbooks. There are several ways to update your cluster:
+
+#### Option 1: Automatic Updates (Recommended)
+
+K3s can automatically update itself. To enable automatic updates on all nodes:
+
+1. Add the following to your inventory `hosts.ini`:
+
+```ini
+[k3s_cluster:vars]
+k3s_version=latest
+```
+
+1. Re-run the k3s installation playbook:
+
+```bash
+ansible-playbook site.yml --tags k3s-server,k3s-agent
+```
+
+K3s will then automatically apply updates when new versions are available (typically patched versions).
+
+#### Option 2: Manual Update to Specific Version
+
+To update to a specific k3s version:
+
+1. Update the `k3s_version` variable in `inventory/hosts.ini`:
+
+```ini
+[k3s_cluster:vars]
+k3s_version=v1.35.0+k3s1
+```
+
+1. Run the k3s playbook to update all nodes:
+
+```bash
+# Update master first (required to generate token for agents)
+ansible-playbook site.yml --tags k3s-server,k3s-agent
+```
+
+**Important:** Always update master nodes before workers. Workers need the token from the master to rejoin the cluster.
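The ordering rule above can be made explicit by running the two tags as separate phases instead of one combined invocation. This is a minimal sketch, assuming the `k3s-server` and `k3s-agent` tags in `site.yml` can be run independently. The `upgrade_plan` helper is hypothetical and only prints the commands so they can be reviewed first:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the upgrade commands in the required order,
# servers (masters) first, then agents (workers).
upgrade_plan() {
  local playbook="${1:-site.yml}"
  local tag
  for tag in k3s-server k3s-agent; do
    printf 'ansible-playbook %s --tags %s\n' "$playbook" "$tag"
  done
}

# Print the plan for review.
upgrade_plan site.yml
```

Piping the output to `sh -e` would run the phases in order and stop before the agent phase if the server phase fails.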
+
+#### Option 3: Update via K3s Release Script
+
+For more control, you can manually update k3s on individual nodes:
+
+```bash
+# SSH into a node
+ssh pi@<node-ip>
+
+# Download and install specific version
+curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.35.0+k3s1 sh -
+
+# Restart k3s
+sudo systemctl restart k3s # On master
+sudo systemctl restart k3s-agent # On workers
+```
+
+#### Checking Current K3s Version
+
+To see the current k3s version running on your cluster:
+
+```bash
+kubectl version
+# or
+kubectl get nodes -o wide
+```
+
+To check versions on specific nodes:
+
+```bash
+ssh pi@<node-ip>
+k3s --version
+
+# Or via Ansible
+ansible all -m shell -a "k3s --version" --become
+```
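During a rolling upgrade it can help to see at a glance whether every node reports the same version. The sketch below assumes the default column layout of `kubectl get nodes --no-headers` (NAME, STATUS, ROLES, AGE, VERSION). The `distinct_versions` helper and the sample node list are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print each distinct kubelet version once (column 5 is VERSION).
distinct_versions() {
  awk 'NF >= 5 { print $5 }' | sort -u
}

# Sample input for illustration only; node names and ages are made up.
sample_nodes='cm4-01 Ready control-plane 12d v1.35.0+k3s1
cm4-02 Ready <none> 12d v1.35.0+k3s1
cm4-03 Ready <none> 12d v1.34.2+k3s1'

versions="$(printf '%s\n' "$sample_nodes" | distinct_versions)"
count="$(printf '%s\n' "$versions" | wc -l | tr -d ' ')"
if [ "$count" -gt 1 ]; then
  echo "Mixed versions detected:"
  printf '%s\n' "$versions"
else
  echo "All nodes run $versions"
fi
```

Against a live cluster the same helper would be fed with `kubectl get nodes --no-headers | distinct_versions`.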
+
+#### Update Telegraf
+
+To update Telegraf metrics collection to the latest version:
+
+```bash
+# Update Telegraf on all nodes
+ansible-playbook telegraf.yml
+
+# Update only specific nodes
+ansible-playbook telegraf.yml --limit worker
+```
+
+#### Post-Update Verification
+
+After updating, verify your cluster is healthy:
+
+```bash
+# Check all nodes are ready
+kubectl get nodes
+
+# Check pod status
+kubectl get pods --all-namespaces
+
+# Check cluster info
+kubectl cluster-info
+
+# View recent events
+kubectl get events --all-namespaces --sort-by='.lastTimestamp'
+```
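The first check above can be turned into a pass/fail test for scripted runs. This is a sketch only, assuming the STATUS column of `kubectl get nodes --no-headers` is the second field. The `not_ready_nodes` helper and the sample data are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the names of nodes whose STATUS is not exactly "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are reported as well.
not_ready_nodes() {
  awk 'NF && $2 != "Ready" { print $1 }'
}

# Sample input for illustration only; node names and ages are made up.
sample='cm4-01 Ready control-plane 12d v1.35.0+k3s1
cm4-02 NotReady <none> 12d v1.35.0+k3s1
cm4-03 Ready,SchedulingDisabled <none> 12d v1.35.0+k3s1'

bad="$(printf '%s\n' "$sample" | not_ready_nodes)"
if [ -n "$bad" ]; then
  echo "Nodes needing attention:"
  printf '%s\n' "$bad"
else
  echo "All nodes Ready"
fi
```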
+
+#### Rollback (if needed)
+
+If an update causes issues, you can rollback to a previous version:
+
+```bash
+# Update inventory with previous version
+# [k3s_cluster:vars]
+# k3s_version=v1.34.2+k3s1
+
+# Re-run the playbook
+ansible-playbook site.yml --tags k3s-server,k3s-agent
+```
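The inventory edit in the rollback steps can also be scripted. This is a minimal sketch, assuming the inventory contains a single `k3s_version=` line at the start of a line. The `set_k3s_version` helper is hypothetical, and the demonstration runs on a throwaway file rather than the real `inventory/hosts.ini`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite the k3s_version line of an inventory file.
# The result is staged in a temporary file so a failed sed never truncates
# the inventory.
set_k3s_version() {
  local inventory="$1" version="$2" tmp
  tmp="$(mktemp)" || return 1
  if sed "s|^k3s_version=.*|k3s_version=${version}|" "$inventory" > "$tmp"; then
    cat "$tmp" > "$inventory"   # keeps the original file's permissions
  fi
  rm -f "$tmp"
}

# Demonstration on a throwaway copy.
demo="$(mktemp)"
printf '[k3s_cluster:vars]\nk3s_version=v1.35.0+k3s1\nansible_user=pi\n' > "$demo"
set_k3s_version "$demo" "v1.34.2+k3s1"
grep '^k3s_version=' "$demo"
```

After the edit, the playbook is re-run as shown in the rollback steps.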
+
 ### Rebooting Cluster Nodes
 
 A dedicated playbook is provided to safely reboot all cluster nodes:
238
influxdb/rpi-cluster-dashboard-v2.json
Normal file
@@ -0,0 +1,238 @@
+{
+  "name": "Raspberry Pi K3s Cluster Metrics",
+  "description": "System monitoring dashboard for Raspberry Pi K3s cluster with Telegraf metrics",
+  "cells": [
+    {
+      "x": 0,
+      "y": 0,
+      "w": 6,
+      "h": 4,
+      "kind": "Gauge",
+      "name": "CPU Usage - Average",
+      "properties": {
+        "queries": [
+          {
+            "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -15m)\n |> filter(fn: (r) => r[\"_measurement\"] == \"cpu\")\n |> filter(fn: (r) => r[\"_field\"] == \"usage_user\")\n |> mean()",
+            "editMode": "advanced"
+          }
+        ],
+        "colors": [
+          {
+            "id": "0",
+            "type": "background",
+            "hex": "#00C9FF",
+            "value": 0
+          },
+          {
+            "id": "1",
+            "type": "background",
+            "hex": "#FFB94E",
+            "value": 50
+          },
+          {
+            "id": "2",
+            "type": "background",
+            "hex": "#FF3D3D",
+            "value": 80
+          }
+        ],
+        "prefix": "",
+        "suffix": "%",
+        "decimalPlaces": 1,
+        "note": ""
+      }
+    },
+    {
+      "x": 6,
+      "y": 0,
+      "w": 6,
+      "h": 4,
+      "kind": "Gauge",
+      "name": "Memory Usage - Average",
+      "properties": {
+        "queries": [
+          {
+            "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -15m)\n |> filter(fn: (r) => r[\"_measurement\"] == \"mem\")\n |> filter(fn: (r) => r[\"_field\"] == \"used_percent\")\n |> mean()",
+            "editMode": "advanced"
+          }
+        ],
+        "colors": [
+          {
+            "id": "0",
+            "type": "background",
+            "hex": "#00C9FF",
+            "value": 0
+          },
+          {
+            "id": "1",
+            "type": "background",
+            "hex": "#FFB94E",
+            "value": 60
+          },
+          {
+            "id": "2",
+            "type": "background",
+            "hex": "#FF3D3D",
+            "value": 85
+          }
+        ],
+        "prefix": "",
+        "suffix": "%",
+        "decimalPlaces": 1,
+        "note": ""
+      }
+    },
+    {
+      "x": 0,
+      "y": 4,
+      "w": 12,
+      "h": 4,
+      "kind": "TimeSeries",
+      "name": "CPU Usage - All Nodes",
+      "properties": {
+        "queries": [
+          {
+            "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"cpu\")\n |> filter(fn: (r) => r[\"_field\"] == \"usage_user\")\n |> aggregateWindow(every: 1m, fn: mean)",
+            "editMode": "advanced"
+          }
+        ],
+        "colors": [],
+        "axes": {
+          "x": {
+            "bounds": [],
+            "label": "",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          },
+          "y": {
+            "bounds": [],
+            "label": "CPU Usage (%)",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          }
+        },
+        "type": "xy",
+        "geom": "line",
+        "note": ""
+      }
+    },
+    {
+      "x": 0,
+      "y": 8,
+      "w": 12,
+      "h": 4,
+      "kind": "TimeSeries",
+      "name": "Memory Usage - All Nodes",
+      "properties": {
+        "queries": [
+          {
+            "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"mem\")\n |> filter(fn: (r) => r[\"_field\"] == \"used_percent\")\n |> aggregateWindow(every: 1m, fn: mean)",
+            "editMode": "advanced"
+          }
+        ],
+        "colors": [],
+        "axes": {
+          "x": {
+            "bounds": [],
+            "label": "",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          },
+          "y": {
+            "bounds": [],
+            "label": "Memory (%)",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          }
+        },
+        "type": "xy",
+        "geom": "line",
+        "note": ""
+      }
+    },
+    {
+      "x": 0,
+      "y": 12,
+      "w": 12,
+      "h": 4,
+      "kind": "TimeSeries",
+      "name": "CPU Temperature - All Nodes",
+      "properties": {
+        "queries": [
+          {
+            "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"cpu_temp_thermal\")\n |> filter(fn: (r) => r[\"_field\"] == \"value\")\n |> aggregateWindow(every: 1m, fn: mean)",
+            "editMode": "advanced"
+          }
+        ],
+        "colors": [],
+        "axes": {
+          "x": {
+            "bounds": [],
+            "label": "",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          },
+          "y": {
+            "bounds": [],
+            "label": "Temperature (°C)",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          }
+        },
+        "type": "xy",
+        "geom": "line",
+        "note": ""
+      }
+    },
+    {
+      "x": 0,
+      "y": 16,
+      "w": 12,
+      "h": 4,
+      "kind": "TimeSeries",
+      "name": "System Load - All Nodes",
+      "properties": {
+        "queries": [
+          {
+            "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"system\")\n |> filter(fn: (r) => r[\"_field\"] == \"load1\")\n |> aggregateWindow(every: 1m, fn: mean)",
+            "editMode": "advanced"
+          }
+        ],
+        "colors": [],
+        "axes": {
+          "x": {
+            "bounds": [],
+            "label": "",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          },
+          "y": {
+            "bounds": [],
+            "label": "Load Average (1m)",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          }
+        },
+        "type": "xy",
+        "geom": "line",
+        "note": ""
+      }
+    }
+  ]
+}
375
influxdb/rpi-cluster-dashboard.json
Normal file
@@ -0,0 +1,375 @@
+{
+  "name": "Raspberry Pi K3s Cluster Metrics",
+  "description": "System monitoring dashboard for Raspberry Pi K3s cluster with Telegraf metrics",
+  "org": "family",
+  "cells": [
+    {
+      "x": 0,
+      "y": 0,
+      "w": 6,
+      "h": 4,
+      "kind": "Gauge",
+      "name": "CPU Usage - Average",
+      "properties": {
+        "shape": "chronograf-v2",
+        "queries": [
+          {
+            "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -1h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"cpu\")\n |> filter(fn: (r) => r[\"_field\"] == \"usage_user\")\n |> mean()",
+            "editMode": "advanced",
+            "name": "",
+            "builderConfig": {
+              "buckets": [],
+              "tags": [],
+              "functions": [],
+              "filters": []
+            }
+          }
+        ],
+        "colors": [
+          {
+            "id": "base",
+            "type": "text",
+            "hex": "#ffffff",
+            "name": "Crayola",
+            "value": 0
+          },
+          {
+            "id": "0",
+            "type": "background",
+            "hex": "#31C0F6",
+            "name": "Crayola",
+            "value": 0
+          },
+          {
+            "id": "1",
+            "type": "background",
+            "hex": "#A500A5",
+            "name": "Crayola",
+            "value": 50
+          },
+          {
+            "id": "2",
+            "type": "background",
+            "hex": "#FF0000",
+            "name": "Crayola",
+            "value": 80
+          }
+        ],
+        "prefix": "",
+        "suffix": "%",
+        "decimalPlaces": 2,
+        "gaugeColors": [
+          {
+            "name": "green",
+            "type": "min",
+            "value": 0
+          },
+          {
+            "name": "yellow",
+            "type": "max",
+            "value": 50
+          },
+          {
+            "name": "red",
+            "type": "max",
+            "value": 100
+          }
+        ]
+      }
+    },
+    {
+      "x": 6,
+      "y": 0,
+      "w": 6,
+      "h": 4,
+      "kind": "Gauge",
+      "name": "Memory Usage - Average",
+      "properties": {
+        "shape": "chronograf-v2",
+        "queries": [
+          {
+            "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -1h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"mem\")\n |> filter(fn: (r) => r[\"_field\"] == \"used_percent\")\n |> mean()",
+            "editMode": "advanced",
+            "name": "",
+            "builderConfig": {
+              "buckets": [],
+              "tags": [],
+              "functions": [],
+              "filters": []
+            }
+          }
+        ],
+        "colors": [
+          {
+            "id": "base",
+            "type": "text",
+            "hex": "#ffffff",
+            "name": "Crayola",
+            "value": 0
+          },
+          {
+            "id": "0",
+            "type": "background",
+            "hex": "#31C0F6",
+            "name": "Crayola",
+            "value": 0
+          },
+          {
+            "id": "1",
+            "type": "background",
+            "hex": "#A500A5",
+            "name": "Crayola",
+            "value": 50
+          },
+          {
+            "id": "2",
+            "type": "background",
+            "hex": "#FF0000",
+            "name": "Crayola",
+            "value": 80
+          }
+        ],
+        "prefix": "",
+        "suffix": "%",
+        "decimalPlaces": 1,
+        "gaugeColors": [
+          {
+            "name": "green",
+            "type": "min",
+            "value": 0
+          },
+          {
+            "name": "yellow",
+            "type": "max",
+            "value": 60
+          },
+          {
+            "name": "red",
+            "type": "max",
+            "value": 100
+          }
+        ]
+      }
+    },
+    {
+      "x": 0,
+      "y": 4,
+      "w": 12,
+      "h": 4,
+      "kind": "TimeSeries",
+      "name": "CPU Usage - All Nodes",
+      "properties": {
+        "shape": "chronograf-v2",
+        "queries": [
+          {
+            "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"cpu\")\n |> filter(fn: (r) => r[\"_field\"] == \"usage_user\")\n |> aggregateWindow(every: 1m, fn: mean)",
+            "editMode": "advanced",
+            "name": "",
+            "builderConfig": {
+              "buckets": [],
+              "tags": [],
+              "functions": [],
+              "filters": []
+            }
+          }
+        ],
+        "colors": [],
+        "axes": {
+          "x": {
+            "bounds": [],
+            "label": "",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          },
+          "y": {
+            "bounds": [],
+            "label": "CPU Usage (%)",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          },
+          "y2": {
+            "bounds": [],
+            "label": "",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          }
+        },
+        "type": "xy",
+        "geom": "line",
+        "colorizeRows": false,
+        "legend": {}
+      }
+    },
+    {
+      "x": 0,
+      "y": 8,
+      "w": 12,
+      "h": 4,
+      "kind": "TimeSeries",
+      "name": "Memory Usage - All Nodes",
+      "properties": {
+        "shape": "chronograf-v2",
+        "queries": [
+          {
+            "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"mem\")\n |> filter(fn: (r) => r[\"_field\"] == \"used_percent\")\n |> aggregateWindow(every: 1m, fn: mean)",
+            "editMode": "advanced",
+            "name": "",
+            "builderConfig": {
+              "buckets": [],
+              "tags": [],
+              "functions": [],
+              "filters": []
+            }
+          }
+        ],
+        "colors": [],
+        "axes": {
+          "x": {
+            "bounds": [],
+            "label": "",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          },
+          "y": {
+            "bounds": [],
+            "label": "Memory (%)",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          },
+          "y2": {
+            "bounds": [],
+            "label": "",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          }
+        },
+        "type": "xy",
+        "geom": "line",
+        "colorizeRows": false,
+        "legend": {}
+      }
+    },
+    {
+      "x": 0,
+      "y": 12,
+      "w": 12,
+      "h": 4,
+      "kind": "TimeSeries",
+      "name": "CPU Temperature - All Nodes",
+      "properties": {
+        "shape": "chronograf-v2",
+        "queries": [
+          {
+            "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"cpu_temp_thermal\")\n |> filter(fn: (r) => r[\"_field\"] == \"value\")\n |> aggregateWindow(every: 1m, fn: mean)",
+            "editMode": "advanced",
+            "name": "",
+            "builderConfig": {
+              "buckets": [],
+              "tags": [],
+              "functions": [],
+              "filters": []
+            }
+          }
+        ],
+        "colors": [],
+        "axes": {
+          "x": {
+            "bounds": [],
+            "label": "",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          },
+          "y": {
+            "bounds": [],
+            "label": "Temperature (°C)",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          },
+          "y2": {
+            "bounds": [],
+            "label": "",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          }
+        },
+        "type": "xy",
+        "geom": "line",
+        "colorizeRows": false,
+        "legend": {}
+      }
+    },
+    {
+      "x": 0,
+      "y": 16,
+      "w": 12,
+      "h": 4,
+      "kind": "TimeSeries",
+      "name": "System Load - All Nodes",
+      "properties": {
+        "shape": "chronograf-v2",
+        "queries": [
+          {
+            "text": "from(bucket: \"rpi-cluster\")\n |> range(start: -6h)\n |> filter(fn: (r) => r[\"_measurement\"] == \"system\")\n |> filter(fn: (r) => r[\"_field\"] == \"load1\")\n |> aggregateWindow(every: 1m, fn: mean)",
+            "editMode": "advanced",
+            "name": "",
+            "builderConfig": {
+              "buckets": [],
+              "tags": [],
+              "functions": [],
+              "filters": []
+            }
+          }
+        ],
+        "colors": [],
+        "axes": {
+          "x": {
+            "bounds": [],
+            "label": "",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          },
+          "y": {
+            "bounds": [],
+            "label": "Load Average (1m)",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          },
+          "y2": {
+            "bounds": [],
+            "label": "",
+            "prefix": "",
+            "suffix": "",
+            "base": "10",
+            "scale": "linear"
+          }
+        },
+        "type": "xy",
+        "geom": "line",
+        "colorizeRows": false,
+        "legend": {}
+      }
+    }
+  ]
+}
@@ -17,7 +17,8 @@ worker
 
 [k3s_cluster:vars]
 # K3s version to install
-k3s_version=v1.34.2+k3s1
+# Use 'latest' for auto-updates, or specify a version like 'v1.29.0+k3s1'
+k3s_version=v1.35.0+k3s1
 
 # Network settings
 ansible_user=pi
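The `k3s_version` comment in the inventory describes two accepted shapes: the literal `latest` or a release tag such as `v1.35.0+k3s1`. A small pre-flight check can catch typos before a playbook run. This is a sketch only: the pattern is inferred from the examples in the inventory rather than from a K3s specification, and the `is_valid_k3s_version` helper is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical check: accept 'latest' or a release tag shaped like v1.35.0+k3s1.
# The pattern is an assumption based on the inventory examples.
is_valid_k3s_version() {
  case "$1" in
    latest) return 0 ;;
  esac
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+\+k3s[0-9]+$'
}

for candidate in latest v1.35.0+k3s1 1.35.0 v1.35.0; do
  if is_valid_k3s_version "$candidate"; then
    echo "ok:      $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```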
@@ -14,16 +14,16 @@
     url: https://get.k3s.io
     dest: /tmp/k3s-install.sh
     mode: '0755'
-  when: not k3s_binary.stat.exists
 
-- name: Install k3s agent
+- name: Install or upgrade k3s agent
   shell: |
     INSTALL_K3S_VERSION="{{ k3s_version }}" \
     K3S_URL="{{ k3s_url }}" \
     K3S_TOKEN="{{ k3s_token }}" \
     INSTALL_K3S_EXEC="agent {{ extra_agent_args }}" \
     sh /tmp/k3s-install.sh
-  when: not k3s_binary.stat.exists
+  register: k3s_install_result
+  changed_when: "'installed' in k3s_install_result.stdout or 'upgraded' in k3s_install_result.stdout"
 
 - name: Wait for k3s agent to be ready
   wait_for:
@@ -9,14 +9,14 @@
     url: https://get.k3s.io
     dest: /tmp/k3s-install.sh
     mode: '0755'
-  when: not k3s_binary.stat.exists
 
-- name: Install k3s server
+- name: Install or upgrade k3s server
   shell: |
     INSTALL_K3S_VERSION="{{ k3s_version }}" \
     INSTALL_K3S_EXEC="server {{ extra_server_args }}" \
     sh /tmp/k3s-install.sh
-  when: not k3s_binary.stat.exists
+  register: k3s_install_result
+  changed_when: "'installed' in k3s_install_result.stdout or 'upgraded' in k3s_install_result.stdout"
 
 - name: Wait for k3s to be ready
   wait_for:
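The `changed_when` expression in the install tasks reports a change when the installer's stdout contains `installed` or `upgraded`. This diff does not show what the install script actually prints, so an alternative signal is sketched here: compare the version reported by `k3s --version` before and after the install. This is a different technique from the one in the commit. It assumes the first line of `k3s --version` has the form `k3s version <tag> (<commit>)`, and the helper names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the release tag from `k3s --version` output,
# whose first line is assumed to look like "k3s version v1.35.0+k3s1 (<commit>)".
parse_k3s_version() {
  awk 'NR == 1 && $1 == "k3s" && $2 == "version" { print $3 }'
}

# Report "changed" only when the version differs before and after the install.
report_change() {
  local before="$1" after="$2"
  if [ "$before" = "$after" ]; then
    echo "unchanged"
  else
    echo "changed"
  fi
}

# Sample outputs for illustration; the commit hashes are placeholders.
before="$(printf 'k3s version v1.34.2+k3s1 (0000000)\ngo version go1.x\n' | parse_k3s_version)"
after="$(printf 'k3s version v1.35.0+k3s1 (1111111)\ngo version go1.x\n' | parse_k3s_version)"
report_change "$before" "$after"
```

In a playbook the same idea would mean registering the version in a task before the install and again after it, then comparing the two values.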
@@ -16,12 +16,12 @@ BLUE='\033[0;34m'
 NC='\033[0m' # No Color
 
 echo -e "${BLUE}╔════════════════════════════════════════════════════════════════╗${NC}"
-echo -e "${BLUE}║ Compute Blade Agent Verification Script ║${NC}"
+echo -e "${BLUE}║ Compute Blade Agent Verification Script ║${NC}"
 echo -e "${BLUE}╚════════════════════════════════════════════════════════════════╝${NC}\n"
 
 # Parse worker nodes from inventory
 echo -e "${YELLOW}Parsing worker nodes from inventory...${NC}"
-WORKERS=$(grep -E "^cb-0[2-9]|^pi-worker" "$INVENTORY" | awk '{print $1}')
+WORKERS=$(grep -E "^\[worker\]" -A 100 "$INVENTORY" | grep -E "^cm4-|^pi-worker|^cb-0" | grep -v "^\[" | awk '{print $1}')
 
 if [ -z "$WORKERS" ]; then
     echo -e "${RED}No worker nodes found in inventory${NC}"
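The updated `WORKERS` pipeline takes up to 100 lines after `[worker]` and then filters by host-name prefix, so a host with a matching prefix in a later section could still be picked up. One alternative, sketched here under the assumption of an INI-style inventory, reads only the lines between `[worker]` and the next section header. The `workers_from_inventory` helper and the sample inventory are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the host names listed under [worker] in an
# INI-style Ansible inventory, stopping at the next section header.
workers_from_inventory() {
  awk '
    /^\[/ { in_worker = ($0 ~ /^\[worker\]/); next }   # track the current section
    in_worker && NF && $1 !~ /^[#;]/ { print $1 }      # skip blanks and comments
  ' "$1"
}

# Demonstration on a throwaway inventory; host names and addresses are made up.
inv="$(mktemp)"
cat > "$inv" <<'EOF'
[master]
cm4-01 ansible_host=192.0.2.10

[worker]
cm4-02 ansible_host=192.0.2.11
# cm4-99 is commented out
cm4-03 ansible_host=192.0.2.12

[k3s_cluster:children]
master
worker
EOF
workers_from_inventory "$inv"
rm -f "$inv"
```

Because the section boundary does the filtering, this variant does not depend on a node naming convention.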