diff --git a/README.md b/README.md index bd07732..e066d75 100644 --- a/README.md +++ b/README.md @@ -1072,7 +1072,7 @@ Create a new playbook `dns-config.yml`: --- - name: Configure external DNS resolver hosts: all - become: yes + become: true tasks: - name: Update /etc/resolv.conf with custom DNS copy: diff --git a/UNINSTALL.md b/UNINSTALL.md new file mode 100644 index 0000000..331db52 --- /dev/null +++ b/UNINSTALL.md @@ -0,0 +1,291 @@ +# K3s Uninstall and Fresh Installation Guide + +## Overview + +This guide walks you through completely removing K3s from all nodes in your cluster and performing a clean, fresh installation. + +## WARNING ⚠️ + +**This will:** +- Stop all K3s services +- Delete all Kubernetes data (etcd, volumes, configurations) +- Remove all running containers and pods +- Delete all kubeconfig files +- Clean up systemd service files + +**This will NOT:** +- Delete the actual node's operating system +- Delete non-K3s application data outside of K3s directories +- Delete SSH keys or user home directories + +**Data Loss:** Any persistent data stored in Kubernetes volumes will be lost. Make backups if needed. + +## Prerequisites + +- SSH access to all nodes +- Ansible installed and configured +- Inventory file properly configured +- All nodes running and accessible + +## Step 1: Backup Important Data (Optional but Recommended) + +If you have critical data in the cluster, back it up first: + +```bash +# Backup Prometheus data if installed +kubectl --kubeconfig=./kubeconfig exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -- tar czf /tmp/prometheus-backup.tar.gz -C /prometheus . + +# Extract from pod +kubectl --kubeconfig=./kubeconfig cp monitoring/prometheus-prometheus-kube-prometheus-prometheus-0:/tmp/prometheus-backup.tar.gz ./prometheus-backup.tar.gz +``` + +## Step 2: Uninstall K3s from All Nodes + +Run the uninstall playbook: + +```bash +# Uninstall K3s from all nodes +ansible-playbook uninstall-k3s.yml +``` + +This will: +1. 
Stop K3s services (k3s and k3s-agent)
+2. Run K3s uninstall scripts (/usr/local/bin/k3s-uninstall.sh)
+3. Kill any remaining K3s/containerd processes
+4. Remove all K3s data directories:
+   - `/var/lib/rancher/k3s`
+   - `/etc/rancher/k3s`
+   - `/var/lib/cni`
+   - `/var/log/pods`
+   - `/var/log/containers`
+   - `/var/run/k3s`
+   - `/var/cache/k3s`
+5. Remove K3s binaries and scripts
+6. Remove systemd service files
+7. Reload systemd daemon
+
+## Step 3: Verify Cleanup
+
+Check that K3s is completely removed:
+
+```bash
+# Verify on a node (via SSH)
+ssh pi@192.168.30.101 "ls /var/lib/rancher/ 2>&1 || echo 'Directory removed successfully'"
+ssh pi@192.168.30.101 "systemctl list-unit-files | grep k3s || echo 'No k3s services found'"
+ssh pi@192.168.30.101 "which k3s || echo 'k3s binary removed'"
+
+# Verify the local kubeconfig is gone
+ls -la ./kubeconfig && echo "WARNING: kubeconfig still exists" || echo "kubeconfig removed"
+```
+
+## Step 4: Fresh K3s Installation
+
+Once all nodes are clean, perform a fresh installation:
+
+```bash
+# Full fresh install
+ansible-playbook site.yml --tags k3s-server,k3s-agent
+
+# Or install just masters
+ansible-playbook site.yml --tags k3s-server
+
+# Or install just workers
+ansible-playbook site.yml --tags k3s-agent
+```
+
+## Step 5: Verify Fresh Installation
+
+Check that K3s is running correctly:
+
+```bash
+# Check cluster nodes
+kubectl --kubeconfig=./kubeconfig get nodes
+
+# Output should show:
+# NAME     STATUS   ROLES                       AGE   VERSION
+# cm4-01   Ready    control-plane,etcd,master   2m    v1.35.0+k3s1
+# cm4-02   Ready    control-plane,etcd,master   2m    v1.35.0+k3s1
+# cm4-03   Ready    control-plane,etcd,master   2m    v1.35.0+k3s1
+# cm4-04   Ready    <none>                      2m    v1.35.0+k3s1
+
+# Check cluster info
+kubectl --kubeconfig=./kubeconfig cluster-info
+
+# Check system pods
+kubectl --kubeconfig=./kubeconfig get pods -n kube-system
+```
+
+## Step 6: Reinstall Optional Components
+
+After the fresh K3s installation, reinstall optional components:
+
+```bash
+# Install Prometheus Operator
+ansible-playbook site.yml --tags prometheus-operator
+
+# Configure Traefik
+ansible-playbook site.yml --tags traefik-config
+
+# Install compute-blade-agent
+ansible-playbook site.yml --tags compute-blade-agent
+```
+
+Or reinstall everything:
+
+```bash
+# Full fresh installation with all components
+ansible-playbook site.yml
+```
+
+## Selective Uninstall (Optional)
+
+If you only want to remove specific components:
+
+### Remove Prometheus Operator Only
+
+```bash
+kubectl --kubeconfig=./kubeconfig delete namespace monitoring
+```
+
+### Remove Traefik Configuration Only
+
+```bash
+kubectl --kubeconfig=./kubeconfig delete helmchart traefik -n kube-system
+kubectl --kubeconfig=./kubeconfig delete helmchart traefik-crd -n kube-system
+```
+
+### Remove compute-blade-agent Only
+
+```bash
+kubectl --kubeconfig=./kubeconfig delete namespace compute-blade-agent
+```
+
+### Reset K3s But Keep the Installation
+
+If you want to keep the K3s binaries but wipe the cluster state, target the `master` group (workers run the `k3s-agent` unit, not `k3s`, and hold no server state):
+
+```bash
+# Stop K3s on the masters without uninstalling
+ansible master -i inventory/hosts.ini -m systemd -a "name=k3s state=stopped" -b
+
+# Delete server state (database, TLS certs, token) while keeping the K3s binary:
+ansible master -i inventory/hosts.ini -m shell -a "rm -rf /var/lib/rancher/k3s/server/db /var/lib/rancher/k3s/server/tls /var/lib/rancher/k3s/server/token" -b
+
+# Restart
+ansible master -i inventory/hosts.ini -m systemd -a "name=k3s state=started" -b
+```
+
+## Troubleshooting
+
+### Uninstall Script Fails
+
+If `/usr/local/bin/k3s-uninstall.sh` fails, clean up manually on the node:
+
+```bash
+ssh pi@192.168.30.101
+
+# Kill processes
+sudo pkill -9 k3s
+sudo pkill -9 containerd
+
+# Remove directories
+sudo rm -rf /var/lib/rancher/k3s
+sudo rm -rf /etc/rancher/k3s
+sudo rm -rf /var/lib/cni
+sudo rm -rf /var/run/k3s
+
+# Remove binaries
+sudo rm -f /usr/local/bin/k3s*
+sudo rm -f /usr/local/bin/kubectl
+sudo rm -f /usr/local/bin/crictl
+
+# Remove services
+sudo rm -f /etc/systemd/system/k3s*
+sudo systemctl daemon-reload
+```
+
+### K3s Won't Start After Fresh Install
+
+Clear the containerd cache and restart:
+
+```bash
+# On the affected node
+sudo systemctl stop k3s
+sudo rm -rf /var/lib/rancher/k3s/agent/containerd
+sudo systemctl start k3s
+
+# Check logs
+sudo journalctl -u k3s -n 50 -f
+```
+
+### Cluster Won't Form After Fresh Install
+
+Reset etcd and rejoin the masters:
+
+```bash
+# Stop all masters
+ansible master -i inventory/hosts.ini -m systemd -a "name=k3s state=stopped" -b
+
+# On the primary master only, remove etcd
+ssh pi@192.168.30.101 "sudo rm -rf /var/lib/rancher/k3s/server/db"
+
+# Start the primary master first
+ansible cm4-01 -i inventory/hosts.ini -m systemd -a "name=k3s state=started" -b
+
+# Wait for it to be ready (30 seconds)
+sleep 30
+
+# Start the additional masters
+ansible cm4-02,cm4-03 -i inventory/hosts.ini -m systemd -a "name=k3s state=started" -b
+```
+
+## Quick Reference Commands
+
+```bash
+# Full uninstall
+ansible-playbook uninstall-k3s.yml
+
+# Fresh install after uninstall
+ansible-playbook site.yml
+
+# Verify cluster
+kubectl --kubeconfig=./kubeconfig get nodes
+
+# View cluster info
+kubectl --kubeconfig=./kubeconfig cluster-info
+
+# Check K3s version
+kubectl --kubeconfig=./kubeconfig version
+```
+
+## FAQ
+
+**Q: Can I uninstall specific nodes only?**
+
+A: Yes, use the `--limit` flag:
+```bash
+ansible-playbook uninstall-k3s.yml --limit cm4-04
+```
+
+**Q: How long does the uninstall take?**
+
+A: Usually 2-5 minutes, depending on the amount of data to clean up.
+
+**Q: Can I keep etcd data?**
+
+A: You can edit the uninstall playbook and remove the `/var/lib/rancher/k3s` deletion, but this is not recommended for a fresh install.
+
+**Q: What if nodes are offline?**
+
+A: Skip them with `--limit` and clean them up manually after they come back online.
+
+**Q: Do I need to uninstall on workers before masters?**
+
+A: No, order doesn't matter; all nodes can be uninstalled simultaneously.
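As a final check before reinstalling, the per-node verification commands from Step 3 can be consolidated into one helper. This is only a sketch: the `k3s_residue_check` function and its optional root argument (handy for testing against a scratch directory instead of `/`) are my own additions, not part of this repo.

```bash
#!/usr/bin/env bash
# Sketch: scan a filesystem root for K3s leftovers after uninstall.
# The function name and the optional root argument are assumptions
# for illustration; run it with no argument on a real node.
k3s_residue_check() {
  local root="${1:-/}" found=0 p
  local paths=(
    "var/lib/rancher/k3s"
    "etc/rancher/k3s"
    "var/lib/cni"
    "usr/local/bin/k3s"
    "etc/systemd/system/k3s.service"
    "etc/systemd/system/k3s-agent.service"
  )
  for p in "${paths[@]}"; do
    # Report anything the uninstall playbook should have removed
    if [ -e "${root%/}/${p}" ]; then
      echo "LEFTOVER: /${p}"
      found=1
    fi
  done
  [ "${found}" -eq 0 ] && echo "clean"
  return "${found}"
}
```

You could ship this to each node over SSH (for example `ssh pi@192.168.30.101 'bash -s' < check.sh`) after `uninstall-k3s.yml` completes; a nonzero exit code flags nodes that still need manual cleanup.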
+ +## References + +- [K3s Uninstall Documentation](https://docs.k3s.io/installation/uninstall) +- [K3s Data Directory Structure](https://docs.k3s.io/storage) +- [Ansible Documentation](https://docs.ansible.com/) \ No newline at end of file diff --git a/reboot.yml b/reboot.yml index ff2f909..5644568 100644 --- a/reboot.yml +++ b/reboot.yml @@ -1,16 +1,16 @@ --- - name: Reboot k3s worker nodes hosts: worker - become: yes + become: true serial: 1 tasks: - name: Display node being rebooted debug: - msg: "Rebooting worker node: {{ inventory_hostname }} ({{ ansible_host }})" + msg: 'Rebooting worker node: {{ inventory_hostname }} ({{ ansible_host }})' - name: Reboot worker node reboot: - msg: "Reboot initiated by Ansible" + msg: 'Reboot initiated by Ansible' connect_timeout: 5 reboot_timeout: 600 pre_reboot_delay: 0 @@ -33,20 +33,20 @@ - name: Display node ready debug: - msg: "Worker node {{ inventory_hostname }} is back online and k3s-agent is running" + msg: 'Worker node {{ inventory_hostname }} is back online and k3s-agent is running' - name: Reboot k3s master nodes hosts: master - become: yes + become: true serial: 1 tasks: - name: Display node being rebooted debug: - msg: "Rebooting master node: {{ inventory_hostname }} ({{ ansible_host }})" + msg: 'Rebooting master node: {{ inventory_hostname }} ({{ ansible_host }})' - name: Reboot master node reboot: - msg: "Reboot initiated by Ansible" + msg: 'Reboot initiated by Ansible' connect_timeout: 5 reboot_timeout: 600 pre_reboot_delay: 0 @@ -75,7 +75,7 @@ - name: Display node ready debug: - msg: "Master node {{ inventory_hostname }} is back online and k3s is running" + msg: 'Master node {{ inventory_hostname }} is back online and k3s is running' - name: Verify cluster status hosts: master[0] diff --git a/roles/k3s-server/tasks/main.yml b/roles/k3s-server/tasks/main.yml index e76bccf..d6144df 100644 --- a/roles/k3s-server/tasks/main.yml +++ b/roles/k3s-server/tasks/main.yml @@ -40,13 +40,7 @@ - name: Store master node 
token set_fact: - k3s_node_token: "{{ node_token.content | b64decode | trim }}" - - - name: Add node token to dummy host - add_host: - name: "k3s_token_holder" - token: "{{ k3s_node_token }}" - run_once: true + k3s_node_token: '{{ node_token.content | b64decode | trim }}' when: k3s_server_init | default(false) | bool @@ -60,10 +54,17 @@ delay: 10 timeout: 300 + - name: Read token from primary master + slurp: + src: /var/lib/rancher/k3s/server/node-token + register: primary_node_token + delegate_to: "{{ groups['master'][0] }}" + become: true + - name: Get cluster credentials set_fact: k3s_url: "https://{{ hostvars[groups['master'][0]]['ansible_host'] }}:6443" - k3s_token: "{{ hostvars['k3s_token_holder']['token'] }}" + k3s_token: '{{ primary_node_token.content | b64decode | trim }}' - name: Install k3s on additional master shell: | @@ -84,34 +85,34 @@ # Common tasks for all master nodes - name: Create .kube directory for user file: - path: "/home/{{ ansible_user }}/.kube" + path: '/home/{{ ansible_user }}/.kube' state: directory - owner: "{{ ansible_user }}" - group: "{{ ansible_user }}" + owner: '{{ ansible_user }}' + group: '{{ ansible_user }}' mode: '0755' - name: Copy k3s kubeconfig to user home copy: src: /etc/rancher/k3s/k3s.yaml - dest: "/home/{{ ansible_user }}/.kube/config" - owner: "{{ ansible_user }}" - group: "{{ ansible_user }}" + dest: '/home/{{ ansible_user }}/.kube/config' + owner: '{{ ansible_user }}' + group: '{{ ansible_user }}' mode: '0600' remote_src: yes - name: Replace localhost with master IP in kubeconfig replace: - path: "/home/{{ ansible_user }}/.kube/config" + path: '/home/{{ ansible_user }}/.kube/config' regexp: '127.0.0.1' - replace: "{{ ansible_host }}" + replace: '{{ ansible_host }}' - name: Fetch kubeconfig from primary master only fetch: - src: "/home/{{ ansible_user }}/.kube/config" - dest: "{{ playbook_dir }}/kubeconfig" + src: '/home/{{ ansible_user }}/.kube/config' + dest: '{{ playbook_dir }}/kubeconfig' flat: yes when: 
k3s_server_init | default(false) | bool - name: Display success message debug: - msg: "K3s server installed successfully on {{ inventory_hostname }}" + msg: 'K3s server installed successfully on {{ inventory_hostname }}' diff --git a/telegraf.yml b/telegraf.yml index 4fc1b85..a3a2c3d 100644 --- a/telegraf.yml +++ b/telegraf.yml @@ -1,13 +1,13 @@ --- - name: Deploy Telegraf to all nodes hosts: all - become: yes + become: true pre_tasks: - name: Parse .env file and set variables block: - name: Read .env file slurp: - src: "{{ playbook_dir }}/.env" + src: '{{ playbook_dir }}/.env' register: env_file delegate_to: localhost become: false diff --git a/uninstall-k3s.yml b/uninstall-k3s.yml new file mode 100644 index 0000000..025e66a --- /dev/null +++ b/uninstall-k3s.yml @@ -0,0 +1,143 @@ +--- +- name: Uninstall K3s from all nodes + hosts: k3s_cluster + become: true + gather_facts: true + + vars: + k3s_data_dir: /var/lib/rancher/k3s + k3s_etc_dir: /etc/rancher/k3s + + tasks: + - name: Stop K3s services + block: + - name: Stop k3s server service (masters) + systemd: + name: k3s + state: stopped + enabled: no + ignore_errors: yes + when: inventory_hostname in groups['master'] + + - name: Stop k3s agent service (workers) + systemd: + name: k3s-agent + state: stopped + enabled: no + ignore_errors: yes + when: inventory_hostname in groups['worker'] + + - name: Run K3s uninstall scripts + block: + - name: Run k3s uninstall script (masters) + shell: | + if [ -f /usr/local/bin/k3s-uninstall.sh ]; then + /usr/local/bin/k3s-uninstall.sh + fi + ignore_errors: yes + when: inventory_hostname in groups['master'] + register: master_uninstall + + - name: Run k3s agent uninstall script (workers) + shell: | + if [ -f /usr/local/bin/k3s-agent-uninstall.sh ]; then + /usr/local/bin/k3s-agent-uninstall.sh + fi + ignore_errors: yes + when: inventory_hostname in groups['worker'] + register: worker_uninstall + + - name: Kill any remaining K3s processes + shell: | + pkill -f "k3s" + pkill -f 
"containerd" + ignore_errors: yes + + - name: Remove K3s data directories + file: + path: "{{ item }}" + state: absent + loop: + - "{{ k3s_data_dir }}" + - "{{ k3s_etc_dir }}" + - /var/lib/cni + - /var/log/pods + - /var/log/containers + - /var/run/k3s + - /var/cache/k3s + ignore_errors: yes + + - name: Remove K3s binaries and scripts + file: + path: "{{ item }}" + state: absent + loop: + - /usr/local/bin/k3s + - /usr/local/bin/k3s-killall.sh + - /usr/local/bin/k3s-uninstall.sh + - /usr/local/bin/k3s-agent-uninstall.sh + - /usr/local/bin/kubectl + - /usr/local/bin/crictl + - /usr/local/bin/ctr + ignore_errors: yes + + - name: Remove K3s systemd files + file: + path: "{{ item }}" + state: absent + loop: + - /etc/systemd/system/k3s.service + - /etc/systemd/system/k3s-agent.service + - /etc/systemd/system/k3s.service.env + - /etc/systemd/system/k3s.service.d + - /etc/systemd/system/k3s-agent.service.env + - /etc/systemd/system/k3s-agent.service.d + ignore_errors: yes + + - name: Reload systemd daemon + systemd: + daemon_reload: yes + ignore_errors: yes + + - name: Remove kubeconfig from user home + file: + path: "/home/{{ ansible_user }}/.kube" + state: absent + ignore_errors: yes + + - name: Display uninstall status for {{ inventory_hostname }} + debug: + msg: + - "K3s uninstall completed on {{ inventory_hostname }}" + - "Data directories removed" + - "Services stopped and disabled" + - "Systemd files cleaned up" + +- name: Clean up local kubeconfig + hosts: localhost + gather_facts: false + tasks: + - name: Remove kubeconfig from playbook directory + file: + path: "{{ playbook_dir }}/kubeconfig" + state: absent + ignore_errors: yes + + - name: Display cleanup completion + debug: + msg: + - "" + - "╔═══════════════════════════════════════════════════════════╗" + - "║ K3s Uninstall Complete! 
║" + - "╚═══════════════════════════════════════════════════════════╝" + - "" + - "All nodes have been cleaned:" + - " ✓ K3s services stopped" + - " ✓ Uninstall scripts executed" + - " ✓ Data directories removed" + - " ✓ Binaries cleaned up" + - " ✓ Systemd files removed" + - "" + - "Ready for fresh K3s installation:" + - " ansible-playbook site.yml --tags k3s-server,k3s-agent" + - ""