Add uninstall-k3s.yml and fix server token flow

2026-01-12 12:04:26 +01:00
parent f3754c01d7
commit 813ee0c252
6 changed files with 465 additions and 30 deletions


@@ -1072,7 +1072,7 @@ Create a new playbook `dns-config.yml`:
 ---
 - name: Configure external DNS resolver
   hosts: all
-  become: yes
+  become: true
   tasks:
     - name: Update /etc/resolv.conf with custom DNS
       copy:

UNINSTALL.md Normal file

@@ -0,0 +1,291 @@
# K3s Uninstall and Fresh Installation Guide
## Overview
This guide walks you through completely removing K3s from all nodes in your cluster and performing a clean, fresh installation.
## WARNING ⚠️
**This will:**
- Stop all K3s services
- Delete all Kubernetes data (etcd, volumes, configurations)
- Remove all running containers and pods
- Delete all kubeconfig files
- Clean up systemd service files
**This will NOT:**
- Delete the node's operating system
- Delete non-K3s application data outside of K3s directories
- Delete SSH keys or user home directories
**Data Loss:** Any persistent data stored in Kubernetes volumes will be lost. Make backups if needed.
## Prerequisites
- SSH access to all nodes
- Ansible installed and configured
- Inventory file properly configured
- All nodes running and accessible
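Before running anything destructive, it is worth confirming that Ansible can actually reach every node. A quick sketch, assuming the inventory lives at `inventory/hosts.ini` as in the ad-hoc commands later in this guide:

```shell
# Confirm Ansible connectivity to every node in the inventory.
# Note: the ping module verifies SSH access and a usable Python
# interpreter on the target, not ICMP reachability.
ansible all -i inventory/hosts.ini -m ping
```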
## Step 1: Backup Important Data (Optional but Recommended)
If you have critical data in the cluster, back it up first:
```bash
# Backup Prometheus data if installed
kubectl --kubeconfig=./kubeconfig exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -- tar czf /tmp/prometheus-backup.tar.gz -C /prometheus .
# Extract from pod
kubectl --kubeconfig=./kubeconfig cp monitoring/prometheus-prometheus-kube-prometheus-prometheus-0:/tmp/prometheus-backup.tar.gz ./prometheus-backup.tar.gz
```
## Step 2: Uninstall K3s from All Nodes
Run the uninstall playbook:
```bash
# Uninstall K3s from all nodes
ansible-playbook uninstall-k3s.yml
```
This will:
1. Stop K3s services (k3s and k3s-agent)
2. Run K3s uninstall scripts (/usr/local/bin/k3s-uninstall.sh)
3. Kill any remaining K3s/containerd processes
4. Remove all K3s data directories:
- `/var/lib/rancher/k3s`
- `/etc/rancher/k3s`
- `/var/lib/cni`
- `/var/log/pods`
- `/var/log/containers`
- `/var/run/k3s`
- `/var/cache/k3s`
5. Remove K3s binaries and scripts
6. Remove systemd service files
7. Reload systemd daemon
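If you want to rehearse the run before touching the whole cluster, Ansible's check mode gives a rough preview against a single node (hypothetical node name below). Be aware that `shell` tasks, such as the uninstall-script steps, are skipped rather than simulated in check mode, so the preview is only partial:

```shell
# Dry-run the uninstall against one node first; nothing is changed.
# shell/command tasks report as skipped in check mode.
ansible-playbook uninstall-k3s.yml --check --limit cm4-04
```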
## Step 3: Verify Cleanup
Check that K3s is completely removed:
```bash
# Verify on a node (via SSH)
ssh pi@192.168.30.101 "ls /var/lib/rancher/ 2>&1 || echo 'Directory removed successfully'"
ssh pi@192.168.30.101 "systemctl list-unit-files | grep k3s || echo 'No k3s services found'"
ssh pi@192.168.30.101 "which k3s || echo 'k3s binary removed'"
# Verify kubeconfig is gone
ls -la ./kubeconfig && echo "WARNING: kubeconfig still exists" || echo "kubeconfig removed"
```
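Instead of SSHing into each node one at a time, the same checks can be fanned out with ad-hoc Ansible. A sketch, assuming the `inventory/hosts.ini` layout used elsewhere in this guide:

```shell
# Verify cleanup across all nodes at once
ansible all -i inventory/hosts.ini -m shell \
  -a "command -v k3s || echo 'k3s binary removed'"
ansible all -i inventory/hosts.ini -m shell \
  -a "ls /var/lib/rancher 2>/dev/null || echo 'data directory removed'"
```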
## Step 4: Fresh K3s Installation
Once all nodes are clean, perform a fresh installation:
```bash
# Full fresh install
ansible-playbook site.yml --tags k3s-server,k3s-agent
# Or install just masters
ansible-playbook site.yml --tags k3s-server
# Or install just workers
ansible-playbook site.yml --tags k3s-agent
```
## Step 5: Verify Fresh Installation
Check that K3s is running correctly:
```bash
# Check cluster nodes
kubectl --kubeconfig=./kubeconfig get nodes
# Output should show:
# NAME STATUS ROLES AGE VERSION
# cm4-01 Ready control-plane,etcd,master 0m v1.35.0+k3s1
# cm4-02 Ready control-plane,etcd 0m v1.35.0+k3s1
# cm4-03 Ready control-plane,etcd 0m v1.35.0+k3s1
# cm4-04 Ready <none> 0m v1.35.0+k3s1
# Check cluster info
kubectl --kubeconfig=./kubeconfig cluster-info
# Check system pods
kubectl --kubeconfig=./kubeconfig get pods -n kube-system
```
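Rather than polling `get nodes` by hand, `kubectl wait` can block until every node reports Ready, which is handy in scripts (adjust the timeout to taste):

```shell
# Block until all nodes are Ready, or fail after 5 minutes
kubectl --kubeconfig=./kubeconfig wait --for=condition=Ready node --all --timeout=300s
```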
## Step 6: Reinstall Optional Components
After fresh K3s installation, reinstall optional components:
```bash
# Install Prometheus Operator
ansible-playbook site.yml --tags prometheus-operator
# Configure Traefik
ansible-playbook site.yml --tags traefik-config
# Install compute-blade-agent
ansible-playbook site.yml --tags compute-blade-agent
```
Or reinstall everything:
```bash
# Full fresh installation with all components
ansible-playbook site.yml
```
## Selective Uninstall (Optional)
If you only want to remove specific components:
### Remove Prometheus Operator Only
```bash
kubectl --kubeconfig=./kubeconfig delete namespace monitoring
```
### Remove Traefik Configuration Only
```bash
kubectl --kubeconfig=./kubeconfig delete helmchart traefik -n kube-system
kubectl --kubeconfig=./kubeconfig delete helmchart traefik-crd -n kube-system
```
### Remove compute-blade-agent Only
```bash
kubectl --kubeconfig=./kubeconfig delete namespace compute-blade-agent
```
### Reset K3s But Keep the Installation
If you want to keep K3s but reset it:
```bash
# Stop K3s without uninstalling
ansible all -i inventory/hosts.ini -m systemd -a "name=k3s state=stopped" -b
# Manually delete specific data while keeping K3s binary:
ansible all -i inventory/hosts.ini -m shell -a "rm -rf /var/lib/rancher/k3s/server/db /var/lib/rancher/k3s/server/tls /var/lib/rancher/k3s/server/token" -b
# Restart
ansible all -i inventory/hosts.ini -m systemd -a "name=k3s state=started" -b
```
## Troubleshooting
### Uninstall Script Fails
If `/usr/local/bin/k3s-uninstall.sh` fails:
```bash
# Manually on the node:
ssh pi@192.168.30.101
# Kill processes
sudo pkill -9 k3s
sudo pkill -9 containerd
# Remove directories
sudo rm -rf /var/lib/rancher/k3s
sudo rm -rf /etc/rancher/k3s
sudo rm -rf /var/lib/cni
sudo rm -rf /var/run/k3s
# Remove binaries
sudo rm -f /usr/local/bin/k3s*
sudo rm -f /usr/local/bin/kubectl
sudo rm -f /usr/local/bin/crictl
# Remove services
sudo rm -f /etc/systemd/system/k3s*
sudo systemctl daemon-reload
```
### K3s Won't Start After Fresh Install
Clear caches and restart:
```bash
# On affected node
sudo systemctl stop k3s
sudo rm -rf /var/lib/rancher/k3s/agent/containerd
sudo systemctl start k3s
# Check logs
sudo journalctl -u k3s -n 50 -f
```
### Cluster Won't Form After Fresh Install
Reset etcd and rejoin masters:
```bash
# Stop all masters
ansible master -i inventory/hosts.ini -m systemd -a "name=k3s state=stopped" -b
# On primary master only, remove etcd
ssh pi@192.168.30.101 "sudo rm -rf /var/lib/rancher/k3s/server/db"
# Start primary master first
ansible cm4-01 -i inventory/hosts.ini -m systemd -a "name=k3s state=started" -b
# Wait for it to be ready (30 seconds)
sleep 30
# Start additional masters
ansible cm4-02,cm4-03 -i inventory/hosts.ini -m systemd -a "name=k3s state=started" -b
```
## Quick Reference Commands
```bash
# Full uninstall
ansible-playbook uninstall-k3s.yml
# Fresh install after uninstall
ansible-playbook site.yml
# Verify cluster
kubectl --kubeconfig=./kubeconfig get nodes
# View cluster info
kubectl --kubeconfig=./kubeconfig cluster-info
# Check K3s version
kubectl --kubeconfig=./kubeconfig version
```
## FAQ
**Q: Can I uninstall specific nodes only?**
A: Yes, use the `--limit` flag:
```bash
ansible-playbook uninstall-k3s.yml --limit cm4-04
```
**Q: How long does uninstall take?**
A: Usually 2-5 minutes depending on the amount of data to clean up.
**Q: Can I keep etcd data?**
A: Manually edit the uninstall playbook and remove the `/var/lib/rancher/k3s` deletion, but this is not recommended for a fresh install.
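Concretely, that means dropping one loop item from the "Remove K3s data directories" task in `uninstall-k3s.yml`. A sketch of the edited fragment:

```yaml
# Keep etcd/server state by excluding the data dir from deletion:
loop:
  # - "{{ k3s_data_dir }}"    # commented out to preserve /var/lib/rancher/k3s
  - "{{ k3s_etc_dir }}"
  - /var/lib/cni
  - /var/log/pods
  - /var/log/containers
  - /var/run/k3s
  - /var/cache/k3s
```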
**Q: What if nodes are offline?**
A: Skip them with `--limit` and clean them up manually after they come back online.
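For example, to uninstall everywhere except an offline node (hypothetical host name `cm4-04`), exclude it with a host pattern:

```shell
# Uninstall all nodes except the offline one.
# Quote the pattern so '!' is not interpreted by the shell.
ansible-playbook uninstall-k3s.yml --limit 'all:!cm4-04'
```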
**Q: Do I need to uninstall on workers before masters?**
A: Order doesn't matter. All nodes can be uninstalled simultaneously.
## References
- [K3s Uninstall Documentation](https://docs.k3s.io/installation/uninstall)
- [K3s Data Directory Structure](https://docs.k3s.io/storage)
- [Ansible Documentation](https://docs.ansible.com/)


@@ -1,16 +1,16 @@
 ---
 - name: Reboot k3s worker nodes
   hosts: worker
-  become: yes
+  become: true
   serial: 1
   tasks:
     - name: Display node being rebooted
       debug:
-        msg: "Rebooting worker node: {{ inventory_hostname }} ({{ ansible_host }})"
+        msg: 'Rebooting worker node: {{ inventory_hostname }} ({{ ansible_host }})'
     - name: Reboot worker node
       reboot:
-        msg: "Reboot initiated by Ansible"
+        msg: 'Reboot initiated by Ansible'
         connect_timeout: 5
         reboot_timeout: 600
         pre_reboot_delay: 0
@@ -33,20 +33,20 @@
     - name: Display node ready
       debug:
-        msg: "Worker node {{ inventory_hostname }} is back online and k3s-agent is running"
+        msg: 'Worker node {{ inventory_hostname }} is back online and k3s-agent is running'
 - name: Reboot k3s master nodes
   hosts: master
-  become: yes
+  become: true
   serial: 1
   tasks:
     - name: Display node being rebooted
       debug:
-        msg: "Rebooting master node: {{ inventory_hostname }} ({{ ansible_host }})"
+        msg: 'Rebooting master node: {{ inventory_hostname }} ({{ ansible_host }})'
     - name: Reboot master node
       reboot:
-        msg: "Reboot initiated by Ansible"
+        msg: 'Reboot initiated by Ansible'
         connect_timeout: 5
         reboot_timeout: 600
         pre_reboot_delay: 0
@@ -75,7 +75,7 @@
     - name: Display node ready
       debug:
-        msg: "Master node {{ inventory_hostname }} is back online and k3s is running"
+        msg: 'Master node {{ inventory_hostname }} is back online and k3s is running'
 - name: Verify cluster status
   hosts: master[0]


@@ -40,13 +40,7 @@
     - name: Store master node token
       set_fact:
-        k3s_node_token: "{{ node_token.content | b64decode | trim }}"
-    - name: Add node token to dummy host
-      add_host:
-        name: "k3s_token_holder"
-        token: "{{ k3s_node_token }}"
-      run_once: true
+        k3s_node_token: '{{ node_token.content | b64decode | trim }}'
       when: k3s_server_init | default(false) | bool
@@ -60,10 +54,17 @@
         delay: 10
         timeout: 300
+    - name: Read token from primary master
+      slurp:
+        src: /var/lib/rancher/k3s/server/node-token
+      register: primary_node_token
+      delegate_to: "{{ groups['master'][0] }}"
+      become: true
     - name: Get cluster credentials
       set_fact:
         k3s_url: "https://{{ hostvars[groups['master'][0]]['ansible_host'] }}:6443"
-        k3s_token: "{{ hostvars['k3s_token_holder']['token'] }}"
+        k3s_token: '{{ primary_node_token.content | b64decode | trim }}'
     - name: Install k3s on additional master
       shell: |
@@ -84,34 +85,34 @@
     # Common tasks for all master nodes
     - name: Create .kube directory for user
       file:
-        path: "/home/{{ ansible_user }}/.kube"
+        path: '/home/{{ ansible_user }}/.kube'
         state: directory
-        owner: "{{ ansible_user }}"
-        group: "{{ ansible_user }}"
+        owner: '{{ ansible_user }}'
+        group: '{{ ansible_user }}'
         mode: '0755'
     - name: Copy k3s kubeconfig to user home
       copy:
         src: /etc/rancher/k3s/k3s.yaml
-        dest: "/home/{{ ansible_user }}/.kube/config"
-        owner: "{{ ansible_user }}"
-        group: "{{ ansible_user }}"
+        dest: '/home/{{ ansible_user }}/.kube/config'
+        owner: '{{ ansible_user }}'
+        group: '{{ ansible_user }}'
         mode: '0600'
         remote_src: yes
     - name: Replace localhost with master IP in kubeconfig
       replace:
-        path: "/home/{{ ansible_user }}/.kube/config"
+        path: '/home/{{ ansible_user }}/.kube/config'
         regexp: '127.0.0.1'
-        replace: "{{ ansible_host }}"
+        replace: '{{ ansible_host }}'
     - name: Fetch kubeconfig from primary master only
       fetch:
-        src: "/home/{{ ansible_user }}/.kube/config"
-        dest: "{{ playbook_dir }}/kubeconfig"
+        src: '/home/{{ ansible_user }}/.kube/config'
+        dest: '{{ playbook_dir }}/kubeconfig'
         flat: yes
       when: k3s_server_init | default(false) | bool
     - name: Display success message
       debug:
-        msg: "K3s server installed successfully on {{ inventory_hostname }}"
+        msg: 'K3s server installed successfully on {{ inventory_hostname }}'


@@ -1,13 +1,13 @@
 ---
 - name: Deploy Telegraf to all nodes
   hosts: all
-  become: yes
+  become: true
   pre_tasks:
     - name: Parse .env file and set variables
       block:
         - name: Read .env file
           slurp:
-            src: "{{ playbook_dir }}/.env"
+            src: '{{ playbook_dir }}/.env'
           register: env_file
           delegate_to: localhost
           become: false

uninstall-k3s.yml Normal file

@@ -0,0 +1,143 @@
---
- name: Uninstall K3s from all nodes
  hosts: k3s_cluster
  become: true
  gather_facts: true
  vars:
    k3s_data_dir: /var/lib/rancher/k3s
    k3s_etc_dir: /etc/rancher/k3s
  tasks:
    - name: Stop K3s services
      block:
        - name: Stop k3s server service (masters)
          systemd:
            name: k3s
            state: stopped
            enabled: no
          ignore_errors: yes
          when: inventory_hostname in groups['master']
        - name: Stop k3s agent service (workers)
          systemd:
            name: k3s-agent
            state: stopped
            enabled: no
          ignore_errors: yes
          when: inventory_hostname in groups['worker']
    - name: Run K3s uninstall scripts
      block:
        - name: Run k3s uninstall script (masters)
          shell: |
            if [ -f /usr/local/bin/k3s-uninstall.sh ]; then
              /usr/local/bin/k3s-uninstall.sh
            fi
          ignore_errors: yes
          when: inventory_hostname in groups['master']
          register: master_uninstall
        - name: Run k3s agent uninstall script (workers)
          shell: |
            if [ -f /usr/local/bin/k3s-agent-uninstall.sh ]; then
              /usr/local/bin/k3s-agent-uninstall.sh
            fi
          ignore_errors: yes
          when: inventory_hostname in groups['worker']
          register: worker_uninstall
    - name: Kill any remaining K3s processes
      shell: |
        pkill -f "k3s"
        pkill -f "containerd"
      ignore_errors: yes
    - name: Remove K3s data directories
      file:
        path: "{{ item }}"
        state: absent
      loop:
        - "{{ k3s_data_dir }}"
        - "{{ k3s_etc_dir }}"
        - /var/lib/cni
        - /var/log/pods
        - /var/log/containers
        - /var/run/k3s
        - /var/cache/k3s
      ignore_errors: yes
    - name: Remove K3s binaries and scripts
      file:
        path: "{{ item }}"
        state: absent
      loop:
        - /usr/local/bin/k3s
        - /usr/local/bin/k3s-killall.sh
        - /usr/local/bin/k3s-uninstall.sh
        - /usr/local/bin/k3s-agent-uninstall.sh
        - /usr/local/bin/kubectl
        - /usr/local/bin/crictl
        - /usr/local/bin/ctr
      ignore_errors: yes
    - name: Remove K3s systemd files
      file:
        path: "{{ item }}"
        state: absent
      loop:
        - /etc/systemd/system/k3s.service
        - /etc/systemd/system/k3s-agent.service
        - /etc/systemd/system/k3s.service.env
        - /etc/systemd/system/k3s.service.d
        - /etc/systemd/system/k3s-agent.service.env
        - /etc/systemd/system/k3s-agent.service.d
      ignore_errors: yes
    - name: Reload systemd daemon
      systemd:
        daemon_reload: yes
      ignore_errors: yes
    - name: Remove kubeconfig from user home
      file:
        path: "/home/{{ ansible_user }}/.kube"
        state: absent
      ignore_errors: yes
    - name: Display uninstall status for {{ inventory_hostname }}
      debug:
        msg:
          - "K3s uninstall completed on {{ inventory_hostname }}"
          - "Data directories removed"
          - "Services stopped and disabled"
          - "Systemd files cleaned up"
- name: Clean up local kubeconfig
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Remove kubeconfig from playbook directory
      file:
        path: "{{ playbook_dir }}/kubeconfig"
        state: absent
      ignore_errors: yes
    - name: Display cleanup completion
      debug:
        msg:
          - ""
          - "╔═══════════════════════════════════════════════════════════╗"
          - "║                  K3s Uninstall Complete!                  ║"
          - "╚═══════════════════════════════════════════════════════════╝"
          - ""
          - "All nodes have been cleaned:"
          - "  ✓ K3s services stopped"
          - "  ✓ Uninstall scripts executed"
          - "  ✓ Data directories removed"
          - "  ✓ Binaries cleaned up"
          - "  ✓ Systemd files removed"
          - ""
          - "Ready for fresh K3s installation:"
          - "  ansible-playbook site.yml --tags k3s-server,k3s-agent"
          - ""