Add automatic TLS via Let's Encrypt (Cloudflare DNS-01) and Vaultwarden

- Introduce Traefik ACME configuration using Cloudflare DNS-01 challenge
- Deploy Vaultwarden password manager with IP allowlist protection
- Add middleware for security headers, compression, and rate limiting
- Update IngressRoute resources to use new ACME resolver
- Add troubleshooting steps for certificate and TLS issues
- Include test application deployment and verification commands
2026-03-25 11:21:01 +01:00
parent ff78968b74
commit 14d4f2528d
9 changed files with 629 additions and 24 deletions

106
README.md

@@ -6,7 +6,7 @@
- **🔄 3-node HA control plane** with automatic failover
- **📊 Comprehensive monitoring** (Telegraf → InfluxDB → Grafana)
- **🌐 Traefik ingress** with SSL support
- **🌐 Traefik ingress** with automatic TLS via Let's Encrypt + Cloudflare DNS-01
- **🖥️ Compute Blade Agent** for hardware monitoring
- **📈 Prometheus metrics** with custom dashboards
- **🔧 One-command deployment** and maintenance
@@ -36,7 +36,9 @@ k3s-ansible/
│ ├── 📁 k3s-deploy-test/ # Test deployment
│ ├── 📁 compute-blade-agent/ # Hardware monitoring
│ ├── 📁 prometheus-operator/ # Monitoring stack
│ └── 📁 telegraf/ # Metrics collection
│ ├── 📁 telegraf/ # Metrics collection
│ ├── 📁 traefik-config/ # Traefik ACME/TLS configuration
│ └── 📁 vaultwarden/ # Vaultwarden password manager
├── 📁 grafana/ # Grafana dashboards
├── 📁 influxdb/ # InfluxDB dashboards
└── 📄 telegraf.yml # Metrics deployment
@@ -74,14 +76,24 @@ Create a `.env` file in the repository root with your credentials:
```bash
cat > .env << EOF
# InfluxDB / Telegraf metrics
INFLUXDB_HOST=192.168.10.10
INFLUXDB_PORT=8086
INFLUXDB_ORG=family
INFLUXDB_BUCKET=rpi-cluster
INFLUXDB_TOKEN=your-api-token-here
INFLUXDB_TOKEN=your-influxdb-api-token-here
# Traefik ACME / Let's Encrypt via Cloudflare DNS-01
ACME_EMAIL=you@yourdomain.com
CF_DNS_API_TOKEN=your-cloudflare-api-token-here
# Vaultwarden
ADMIN_TOKEN=your-vaultwarden-admin-token-here
EOF
```
**Cloudflare API Token requirements**: The token must have **Zone → DNS → Edit** permission scoped to the DNS zones you want to issue certificates for. Create one at Cloudflare dashboard → My Profile → API Tokens → Create Token → Edit zone DNS (template).
**⚠️ Security Note:** This file is ignored by Git (`.gitignore`) and should never be committed. Keep actual tokens secure and only on your local machine.
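The role tasks in this commit read these values from `.env` with a line-based regex, so a missing key or stray trailing whitespace tends to surface only later as a failed deployment. A minimal pre-flight sketch, assuming a hypothetical helper named `check_env_file` (it is not part of this repository), could catch both cases before running the playbooks:

```bash
# Sketch only: check_env_file is a hypothetical helper, not part of the repo.
# It checks that the keys added in this commit exist, are non-empty,
# and carry no trailing whitespace.
check_env_file() {
  local file="$1" key status=0
  for key in ACME_EMAIL CF_DNS_API_TOKEN ADMIN_TOKEN; do
    # The key must start a line and be followed by at least one character.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: ${key}"
      status=1
    # Trailing spaces, tabs, or carriage returns would end up inside the value.
    elif grep -Eq "^${key}=.*[[:space:]]+\$" "$file"; then
      echo "trailing whitespace: ${key}"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_env_file .env && ansible-playbook site.yml --tags traefik-config` would then skip the playbook when the file looks incomplete.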
### 4. Test Connectivity
@@ -107,6 +119,12 @@ ansible-playbook site.yml --tags prereq
# Deploy monitoring
ansible-playbook telegraf.yml
# Configure Traefik ACME/TLS only (on already-running cluster)
ansible-playbook site.yml --tags traefik-config
# Deploy Vaultwarden only
ansible-playbook site.yml --tags vaultwarden
# Deploy test application only
ansible-playbook site.yml --tags deploy-test
@@ -227,21 +245,65 @@ kubectl get nodes
## 🌐 Ingress & Networking
### Traefik Ingress Controller
**✅ Pre-installed** and ready to use!
**✅ Pre-installed** by K3s and configured for automatic TLS.
**How it works:**
- 🎯 Listens on ports 80 (HTTP) & 443 (HTTPS)
- 🔄 Routes traffic by hostname
- 📦 Multiple apps share same IP via different domains
- ⚡ Zero additional configuration needed
- Listens on ports 80 (HTTP) and 443 (HTTPS)
- Routes traffic by hostname to the correct service
- Multiple apps share the same IP via different domains
- HTTP traffic is automatically redirected to HTTPS
**Verify Traefik:**
```bash
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik
kubectl get svc -n kube-system traefik
kubectl get ingress
kubectl get ingress --all-namespaces
```
### TLS Certificates — Let's Encrypt via Cloudflare DNS-01
Certificates are issued automatically by **Traefik's built-in ACME client** using a **DNS-01 challenge** through the Cloudflare API. No cert-manager is required.
**How it works:**
1. When an Ingress or IngressRoute that references the `letsencrypt-cloudflare` certificate resolver is deployed, Traefik requests a certificate from Let's Encrypt.
2. Traefik creates a `_acme-challenge.<domain>` TXT record via the Cloudflare API to prove domain ownership.
3. Let's Encrypt validates the record and issues the certificate.
4. Traefik stores the certificate in `/data/acme.json` (on a PVC) and auto-renews it before expiry.
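When debugging step 2 by hand, it helps to know exactly which TXT record name to look for. The sketch below uses a hypothetical helper, `acme_challenge_name` (an illustration, not something Traefik or this repository provides), and relies on the ACME convention that wildcard names are validated against their base domain:

```bash
# Sketch only: acme_challenge_name is a hypothetical debugging helper.
# It prints the TXT record name used by a DNS-01 challenge for a domain.
acme_challenge_name() {
  # Wildcard names ("*.example.com") validate against the base domain.
  local domain="${1#\*.}"
  # Drop a trailing dot from a fully qualified name, if present.
  echo "_acme-challenge.${domain%.}"
}

# Example lookup (requires network access; the record usually exists only
# while a challenge is in progress):
#   dig +short TXT "$(acme_challenge_name safe.zlor.fi)" @1.1.1.1
```

An empty `dig` answer outside of an active issuance is expected, since the challenge record is normally removed once validation completes.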
**The `traefik-config` role** (`roles/traefik-config/`) provisions this by:
- Creating a `traefik-cloudflare-token` Kubernetes Secret in `kube-system` from `.env`
- Applying a `HelmChartConfig` resource that patches the K3s-bundled Traefik Helm release with the ACME resolver and Cloudflare provider configuration
**Deploy or re-apply the configuration:**
```bash
ansible-playbook site.yml --tags traefik-config
```
**Annotate an Ingress to use automatic TLS:**
```yaml
annotations:
  traefik.ingress.kubernetes.io/router.entrypoints: websecure
  traefik.ingress.kubernetes.io/router.tls: "true"
  traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt-cloudflare
```
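To show where these annotations sit in a complete object, the following sketch prints a full Ingress manifest. The function name `render_tls_ingress` and its arguments are hypothetical; the manifest shape follows the standard `networking.k8s.io/v1` Ingress and the resolver name used in this repository:

```bash
# Sketch only: render_tls_ingress is a hypothetical helper that prints an
# Ingress manifest carrying the three TLS annotations shown above.
# Arguments: <name> <host> <service> <port>
render_tls_ingress() {
  local name="$1" host="$2" service="$3" port="$4"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${name}
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt-cloudflare
spec:
  rules:
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${service}
                port:
                  number: ${port}
EOF
}
```

For example, `render_tls_ingress nginx-test test.zlor.fi nginx-test 80 | kubectl apply --dry-run=client -f -` validates the generated manifest without changing the cluster.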
**Check certificate status:**
```bash
# View ACME storage (cert state)
kubectl exec -n kube-system deploy/traefik -- cat /data/acme.json | jq '.["letsencrypt-cloudflare"].Certificates[].domain'
# Check Traefik logs for ACME activity
kubectl logs -n kube-system deploy/traefik | grep -i acme
```
**Switch to Let's Encrypt staging** (to avoid rate limits during testing):
Edit `roles/traefik-config/defaults/main.yml`:
```yaml
traefik_acme_server: https://acme-staging-v02.api.letsencrypt.org/directory
```
Then re-run `ansible-playbook site.yml --tags traefik-config`.
## 🧪 Test Your Cluster
### Automated Test Deployment
@@ -444,6 +506,32 @@ sudo journalctl -u k3s-agent -f
/usr/local/bin/k3s-agent-uninstall.sh
```
### TLS / Certificate Issues
**Certificate not issued (Traefik keeps serving its self-signed default certificate):**
```bash
# Check Traefik logs for ACME errors
kubectl logs -n kube-system deploy/traefik | grep -iE "acme|error|cloudflare"
# Verify the Cloudflare secret exists
kubectl get secret traefik-cloudflare-token -n kube-system
# Verify the HelmChartConfig was applied
kubectl get helmchartconfig traefik -n kube-system
```
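To tell from the outside which certificate is actually being served, the issuer line can be inspected. The sketch below is a hypothetical helper, `classify_issuer`, and rests on two assumptions: that Traefik's fallback certificate carries the name `TRAEFIK DEFAULT CERT`, and that Let's Encrypt staging issuers contain `(STAGING)` in their names:

```bash
# Sketch only: classify_issuer is a hypothetical helper. The matched strings
# are assumptions about how the default, staging, and production issuers
# are named.
classify_issuer() {
  case "$1" in
    *"TRAEFIK DEFAULT CERT"*) echo "traefik-default" ;;
    # Staging is checked before production because staging issuer names
    # also contain "Let's Encrypt".
    *"(STAGING)"*)            echo "letsencrypt-staging" ;;
    *"Let's Encrypt"*)        echo "letsencrypt-production" ;;
    *)                        echo "other" ;;
  esac
}

# Example (requires network access to the host):
#   issuer="$(openssl s_client -connect test.zlor.fi:443 -servername test.zlor.fi \
#     </dev/null 2>/dev/null | openssl x509 -noout -issuer)"
#   classify_issuer "$issuer"
```

A result of `traefik-default` suggests the resolver never produced a certificate for that host, which points back to the log and secret checks above.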
**Cloudflare API token errors:**
- Confirm the token has **Zone → DNS → Edit** permission for the relevant zone.
- Confirm the token is correctly set in `.env` (no trailing whitespace or newlines).
- Re-run `ansible-playbook site.yml --tags traefik-config` after correcting the token.
**Let's Encrypt rate limit hit:**
- Switch to the staging server in `roles/traefik-config/defaults/main.yml` (`traefik_acme_server`), re-run the role, and verify that issuance works. Then switch back to production and delete `acme.json` to force re-issuance:
```bash
kubectl exec -n kube-system deploy/traefik -- rm /data/acme.json
kubectl rollout restart deploy/traefik -n kube-system
```
### Common Issues
- 🔥 **Nodes not joining**: Check firewall (port 6443)
- 💾 **Memory issues**: Verify cgroup memory enabled


@@ -228,22 +228,20 @@ spec:
  selector:
    app: nginx-test
---
apiVersion: networking.k8s.io/v1
kind: Ingress
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: nginx-test
  namespace: default
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: test.zlor.fi
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx-test
                port:
                  number: 80
  entryPoints:
    - websecure
  routes:
    - match: Host(`test.zlor.fi`)
      kind: Rule
      priority: 100
      services:
        - name: nginx-test
          port: 80
  tls:
    certResolver: letsencrypt-cloudflare


@@ -0,0 +1,115 @@
# Traefik Middleware CRDs
# Kubernetes equivalent of the Traefik file-provider dynamic middleware config.
# Each middleware is a separate traefik.io/v1alpha1 Middleware object.
# Reference in IngressRoute: middlewares: [{ name: <name>, namespace: traefik-system }]
# Docs: https://doc.traefik.io/traefik/middlewares/overview/
---
apiVersion: v1
kind: Namespace
metadata:
  name: traefik-system

# ── Security Headers ──────────────────────────────────────────────────────────
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: security-headers
  namespace: traefik-system
spec:
  headers:
    frameDeny: true
    contentTypeNosniff: true
    browserXssFilter: true
    forceSTSHeader: true
    stsIncludeSubdomains: true
    stsPreload: true
    stsSeconds: 31536000
    # customFrameOptionsValue takes precedence over frameDeny, so the header sent is SAMEORIGIN.
    customFrameOptionsValue: "SAMEORIGIN"

# ── Compression ───────────────────────────────────────────────────────────────
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: compression
  namespace: traefik-system
spec:
  compress: {}

# ── Rate Limiting: Web / Chat / WebUI (lenient, allows WebSocket) ─────────────
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rate-limit-web
  namespace: traefik-system
spec:
  rateLimit:
    average: 100
    burst: 200
    period: 1s
    sourceCriterion:
      ipStrategy:
        depth: 1

# ── Rate Limiting: Auth endpoints (strict, brute-force protection) ────────────
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rate-limit-auth
  namespace: traefik-system
spec:
  rateLimit:
    average: 3
    burst: 6
    period: 1s
    sourceCriterion:
      ipStrategy:
        depth: 1

# ── IP Allow List: Dashboard ──────────────────────────────────────────────────
# NOTE: Chained ipAllowList middlewares are evaluated one after another, so a
# request must pass all of them. Do not chain multiple ipAllowList middlewares
# on the same IngressRoute unless that intersection is intended.
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: dashboard-allow-list
  namespace: traefik-system
spec:
  ipAllowList:
    sourceRange:
      - "127.0.0.1" # localhost
      - "192.168.1.9" # MikroTik router
      - "192.168.10.0/24" # Home network

# ── IP Allow List: Local network ──────────────────────────────────────────────
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: local-ip-allow-list
  namespace: traefik-system
spec:
  ipAllowList:
    sourceRange:
      - "192.168.1.9"
      - "192.168.3.1/28"
      - "192.168.10.0/24"

# ── IP Allow List: Local + IoT network ───────────────────────────────────────
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: local-iot-ip-allow-list
  namespace: traefik-system
spec:
  ipAllowList:
    sourceRange:
      - "192.168.1.9"
      - "192.168.3.1/28"
      - "192.168.10.0/24"
      - "192.168.20.0/24"


@@ -0,0 +1,177 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: vaultwarden

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vaultwarden-data
  namespace: vaultwarden
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 5Gi

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vaultwarden
  namespace: vaultwarden
  labels:
    app: vaultwarden
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vaultwarden
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: vaultwarden
    spec:
      containers:
        - name: vaultwarden
          image: vaultwarden/server:latest
          ports:
            - containerPort: 80
              name: http
          env:
            - name: DOMAIN
              value: "https://safe.zlor.fi"
            - name: SIGNUPS_ALLOWED
              value: "false"
            - name: EMERGENCY_ACCESS_ALLOWED
              value: "true"
            - name: EXTENDED_LOGGING
              value: "true"
            - name: HTTPS_ONLY
              value: "true"
            - name: WEB_VAULT_ENABLED
              value: "true"
            - name: LOG_FILE
              value: "/data/vaultwarden.log"
            - name: ADMIN_TOKEN
              valueFrom:
                secretKeyRef:
                  name: vaultwarden-secret
                  key: ADMIN_TOKEN
          volumeMounts:
            - name: data
              mountPath: /data
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 30
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 10
            periodSeconds: 10
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: vaultwarden-data

---
apiVersion: v1
kind: Service
metadata:
  name: vaultwarden
  namespace: vaultwarden
  labels:
    app: vaultwarden
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
      name: http
  selector:
    app: vaultwarden

---
# Middleware: restrict /admin to specific IP ranges
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: vault-admin-ip-whitelist
  namespace: vaultwarden
spec:
  ipAllowList:
    sourceRange:
      - "62.143.153.106"
      - "192.168.10.64"

---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: vaultwarden
  namespace: vaultwarden
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`safe.zlor.fi`) && PathPrefix(`/`)
      kind: Rule
      priority: 100
      middlewares:
        - name: security-headers
          namespace: traefik-system
        - name: compression
          namespace: traefik-system
        - name: rate-limit-web
          namespace: traefik-system
      services:
        - name: vaultwarden
          port: 80
  tls:
    certResolver: letsencrypt-cloudflare

---
# Separate IngressRoute for /admin with IP allowlist middleware
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: vaultwarden-admin
  namespace: vaultwarden
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`safe.zlor.fi`) && PathPrefix(`/admin`)
      kind: Rule
      priority: 200
      middlewares:
        - name: vault-admin-ip-whitelist
        - name: security-headers
          namespace: traefik-system
        - name: compression
          namespace: traefik-system
        - name: rate-limit-web
          namespace: traefik-system
      services:
        - name: vaultwarden
          port: 80
  tls:
    certResolver: letsencrypt-cloudflare


@@ -0,0 +1,22 @@
---
# Traefik ACME / Let's Encrypt configuration via Cloudflare DNS-01 challenge
# Secrets (acme_email, cloudflare_api_token) are read from .env at runtime.
# Name of the ACME certificate resolver. It must match the certresolver annotation
# on Ingress objects and the tls.certResolver field on IngressRoute objects (e.g. vaultwarden-deployment.yaml).
traefik_certresolver_name: letsencrypt-cloudflare
# Let's Encrypt ACME server.
# Use the staging URL while testing to avoid rate-limit hits:
# https://acme-staging-v02.api.letsencrypt.org/directory
traefik_acme_server: https://acme-v02.api.letsencrypt.org/directory
# Path inside the Traefik pod where ACME state (certs, account) is persisted.
traefik_acme_storage: /data/acme.json
# Traefik entrypoint names — must match annotations in ingress manifests.
traefik_entrypoint_web: web # HTTP (port 80)
traefik_entrypoint_websecure: websecure # HTTPS (port 443)
# Redirect all HTTP traffic to HTTPS.
traefik_redirect_http_to_https: true


@@ -0,0 +1,68 @@
---
- name: Read .env file
  slurp:
    src: '{{ playbook_dir }}/.env'
  register: env_file
  delegate_to: localhost
  become: false

- name: Set Cloudflare and ACME variables from .env
  set_fact:
    # Each key is anchored at the start of a line so that a longer variable name ending in the same text cannot match.
    cloudflare_api_token: "{{ (env_file.content | b64decode | regex_search('^CF_DNS_API_TOKEN=(.+)$', '\\1', multiline=True) | first) }}"
    acme_email: "{{ (env_file.content | b64decode | regex_search('^ACME_EMAIL=(.+)$', '\\1', multiline=True) | first) }}"
  no_log: true

- name: Create traefik-cloudflare-token secret
  # The token is passed through the quote filter so shell metacharacters in it cannot break the command.
  shell: |
    kubectl create secret generic traefik-cloudflare-token \
      --from-literal=CF_DNS_API_TOKEN={{ cloudflare_api_token | quote }} \
      --namespace kube-system \
      --dry-run=client -o yaml \
      --kubeconfig={{ playbook_dir }}/kubeconfig \
      | kubectl apply -f - --kubeconfig={{ playbook_dir }}/kubeconfig
  no_log: true
  delegate_to: localhost
  become: false
  changed_when: true

- name: Template Traefik HelmChartConfig
  template:
    src: traefik-helmchartconfig.j2
    dest: /tmp/traefik-helmchartconfig.yaml
  delegate_to: localhost
  become: false

- name: Apply Traefik HelmChartConfig
  shell: kubectl apply -f /tmp/traefik-helmchartconfig.yaml --kubeconfig={{ playbook_dir }}/kubeconfig
  register: helmchartconfig_result
  delegate_to: localhost
  become: false
  changed_when: "'configured' in helmchartconfig_result.stdout or 'created' in helmchartconfig_result.stdout"

- name: Remove temporary HelmChartConfig file
  file:
    path: /tmp/traefik-helmchartconfig.yaml
    state: absent
  delegate_to: localhost
  become: false

- name: Wait for Traefik rollout after config change
  shell: kubectl rollout status deployment/traefik -n kube-system --kubeconfig={{ playbook_dir }}/kubeconfig --timeout=120s
  delegate_to: localhost
  become: false
  changed_when: false
  retries: 3
  delay: 10

- name: Display Traefik configuration summary
  debug:
    msg:
      - 'Traefik ACME configuration applied'
      - 'Certificate resolver: {{ traefik_certresolver_name }}'
      - 'ACME server: {{ traefik_acme_server }}'
      - 'ACME storage: {{ traefik_acme_storage }}'
      - 'DNS challenge provider: cloudflare'
      - 'HTTP->HTTPS redirect: {{ traefik_redirect_http_to_https }}'
      - ''
      - 'Ingress objects using this resolver must set:'
      - ' traefik.ingress.kubernetes.io/router.tls.certresolver: {{ traefik_certresolver_name }}'


@@ -0,0 +1,63 @@
# Managed by Ansible. Do not edit manually.
# This HelmChartConfig patches the K3s-bundled Traefik Helm release.
# K3s's helm-controller merges the valuesContent below into the Traefik chart values.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    # ── Entrypoints ────────────────────────────────────────────────────────────
    ports:
      web:
        port: 8000
        exposedPort: 80
        protocol: TCP
{% if traefik_redirect_http_to_https %}
        redirectTo:
          port: {{ traefik_entrypoint_websecure }}
{% endif %}
      websecure:
        port: 8443
        exposedPort: 443
        protocol: TCP
        tls:
          enabled: true

    # ── ACME / Let's Encrypt via Cloudflare DNS-01 ─────────────────────────────
    additionalArguments:
      - "--certificatesresolvers.{{ traefik_certresolver_name }}.acme.email={{ acme_email }}"
      - "--certificatesresolvers.{{ traefik_certresolver_name }}.acme.storage={{ traefik_acme_storage }}"
      - "--certificatesresolvers.{{ traefik_certresolver_name }}.acme.caserver={{ traefik_acme_server }}"
      - "--certificatesresolvers.{{ traefik_certresolver_name }}.acme.dnschallenge=true"
      - "--certificatesresolvers.{{ traefik_certresolver_name }}.acme.dnschallenge.provider=cloudflare"
      - "--certificatesresolvers.{{ traefik_certresolver_name }}.acme.dnschallenge.resolvers=1.1.1.1:53,8.8.8.8:53"

    # ── Cloudflare API token injected as an environment variable ───────────────
    env:
      - name: CF_DNS_API_TOKEN
        valueFrom:
          secretKeyRef:
            name: traefik-cloudflare-token
            key: CF_DNS_API_TOKEN

    # ── Persist ACME certificate state across pod restarts ────────────────────
    persistence:
      enabled: true
      name: data
      accessMode: ReadWriteOnce
      size: 128Mi
      path: /data

    # ── Allow cross-namespace middleware references ───────────────────────────
    # Required for IngressRoute objects in one namespace (e.g. vaultwarden) to
    # reference Middleware objects in another namespace (e.g. traefik-system).
    providers:
      kubernetesCRD:
        allowCrossNamespace: true

    # ── Traefik dashboard IngressRoute (disabled; enable for internal use only) ─
    ingressRoute:
      dashboard:
        enabled: false


@@ -0,0 +1,54 @@
---
- name: Read .env file
  slurp:
    src: '{{ playbook_dir }}/.env'
  register: env_file
  delegate_to: localhost
  become: false

- name: Set Vaultwarden variables from .env
  set_fact:
    # The key is anchored at the start of a line so that a longer variable name ending in ADMIN_TOKEN cannot match.
    vaultwarden_admin_token: "{{ (env_file.content | b64decode | regex_search('^ADMIN_TOKEN=(.+)$', '\\1', multiline=True) | first) }}"
  no_log: true

- name: Create vaultwarden namespace
  shell: kubectl create namespace vaultwarden --kubeconfig={{ playbook_dir }}/kubeconfig 2>/dev/null || true
  delegate_to: localhost
  become: false
  changed_when: false

- name: Create vaultwarden-secret from .env
  # The token is passed through the quote filter so shell metacharacters in it cannot break the command.
  shell: |
    kubectl create secret generic vaultwarden-secret \
      --from-literal=ADMIN_TOKEN={{ vaultwarden_admin_token | quote }} \
      --namespace vaultwarden \
      --dry-run=client -o yaml \
      --kubeconfig={{ playbook_dir }}/kubeconfig \
      | kubectl apply -f - --kubeconfig={{ playbook_dir }}/kubeconfig
  no_log: true
  delegate_to: localhost
  become: false
  changed_when: true

- name: Apply vaultwarden manifest
  shell: kubectl apply -f {{ playbook_dir }}/manifests/vaultwarden-deployment.yaml --kubeconfig={{ playbook_dir }}/kubeconfig
  register: vaultwarden_apply
  delegate_to: localhost
  become: false
  changed_when: "'configured' in vaultwarden_apply.stdout or 'created' in vaultwarden_apply.stdout"

- name: Wait for vaultwarden rollout
  shell: kubectl rollout status deployment/vaultwarden -n vaultwarden --kubeconfig={{ playbook_dir }}/kubeconfig --timeout=120s
  delegate_to: localhost
  become: false
  changed_when: false
  retries: 3
  delay: 10

- name: Display vaultwarden deployment summary
  debug:
    msg:
      - 'Vaultwarden deployed successfully'
      - 'URL: https://safe.zlor.fi'
      - 'Admin panel: https://safe.zlor.fi/admin'
      - 'Admin panel is restricted by IP allowlist (vault-admin-ip-whitelist middleware)'


@@ -49,6 +49,26 @@
- compute-blade-agent
- blade-agent
- name: Configure Traefik (ACME / Let's Encrypt via Cloudflare DNS-01)
hosts: "{{ groups['master'][0] }}"
gather_facts: false
become: false
roles:
- role: traefik-config
tags:
- traefik-config
- traefik
- certs
- name: Deploy Vaultwarden
hosts: "{{ groups['master'][0] }}"
gather_facts: false
become: false
roles:
- role: vaultwarden
tags:
- vaultwarden
- name: Install Prometheus Operator
hosts: "{{ groups['master'][0] }}"
gather_facts: false