
The Kenough Kubernetes Cluster

The nodes #

I named them ken1 and ken2 because, despite being outdated mini PCs, they are kenough to be Kubernetes clusters.

Quick hardware overview:

The Cilium Mesh Experiment #

Cilium has a Cluster Mesh feature that lets you extend the Cilium service mesh across multiple clusters: https://cilium.io/use-cases/cluster-mesh/. I want to test this feature across a real physical network separation by running a Kubernetes cluster on each of the Kens and having apps communicate across the Cluster Mesh. A 1Gb link between two clusters of 4 cores should sorta kinda maybe be representative of a 100Gb link between two production clusters of 400 cores.
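For later reference, the Cilium CLI can wire the two clusters together once each is running Cilium with a unique cluster name and ID. A rough sketch (the ken1/ken2 kubectl context names are assumptions, not the actual setup):

# each cluster's Helm values need a distinct cluster.name and cluster.id
cilium clustermesh enable --context ken1
cilium clustermesh enable --context ken2
# connect them (run once; the connection is bidirectional)
cilium clustermesh connect --context ken1 --destination-context ken2
cilium clustermesh status --context ken1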

Talos Linux on Proxmox #

After some brief attempts with KinD and microk8s on Ubuntu I decided to look into this Talos Linux thing and was immediately very happy.

Reference to follow: Talos on Proxmox

Warning: dense setup notes after this point.

Setup I: Template VM #

Generate a Talos image with the guest-agent extension. Create a VM following the minimum specs:

| Role          | Memory | Cores | Disk |
|---------------|--------|-------|------|
| Control plane | 2GB    | 2     | 10GB |
| Worker        | 1GB    | 1     | 10GB |

Configuration
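As a sketch of what creating the template VM might look like on the Proxmox host (the VM ID, bridge, storage pool, and ISO filename are assumptions, not my actual configuration):

# 2GB / 2 cores / 10GB disk, booting the Talos ISO, with the guest agent enabled
qm create 9000 --name talos-template --memory 2048 --cores 2 \
  --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci --scsi0 local-lvm:10 \
  --ide2 local:iso/metal-amd64.iso,media=cdrom --boot order='ide2;scsi0' \
  --agent enabled=1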

Setup II: Cluster bootstrap #

Do a full clone of the template and boot it. Apply the config for the first control plane node, then set TALOSCONFIG to the generated config file and run talosctl config endpoint <IP> and talosctl config node <IP>.

Initialize etcd with talosctl bootstrap, then get your kubeconfig with talosctl kubeconfig . (writing it to the current directory).
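The whole sequence as a sketch (the _out directory and node IP are placeholders for whatever talosctl gen config produced):

export TALOSCONFIG=_out/talosconfig
talosctl apply-config --insecure -n <CONTROL_PLANE_IP> --file _out/controlplane.yaml
talosctl config endpoint <CONTROL_PLANE_IP>
talosctl config node <CONTROL_PLANE_IP>
talosctl bootstrap
talosctl kubeconfig .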

Setup III: Additional nodes #

Adding additional nodes follows the same pattern: clone the template, boot the VM, and apply the matching control plane or worker config (sketched below).

I created 3 control plane and 3 worker nodes total for the next step of setting up Cilium.
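Per node it's one more apply-config, roughly like this (IPs and paths are placeholders):

# control plane nodes get the controlplane config, workers get the worker config
talosctl apply-config --insecure -n <NEW_CONTROL_PLANE_IP> --file _out/controlplane.yaml
talosctl apply-config --insecure -n <NEW_WORKER_IP> --file _out/worker.yaml
# optionally register all control plane IPs as endpoints
talosctl config endpoint <CP1_IP> <CP2_IP> <CP3_IP>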

Cilium on Talos #

Reference: Deploying Cilium CNI on Talos

You need to change the machine config to add cni=none. I reset my VMs and generated a new base config:

talosctl gen config talos-proxmox-cluster https://$CONTROL_PLANE_IP:6443 \
  --output-dir _out_cilium \
  --install-image factory.talos.dev/installer/ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515:v1.7.0 \
  --config-patch @nocni_patch.yaml

nocni_patch.yaml:

cluster:
  network:
    cni:
      name: none
  # Disables kube-proxy
  proxy:
    disabled: true

Once the nodes are partially ready (pods will not become ready since there is no CNI yet), use Helm to install Cilium.
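The chart comes from Cilium's official Helm repository, so that needs to be added first:

helm repo add cilium https://helm.cilium.io/
helm repo update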

cilium values.yaml for full CNI+LB+Ingress
# helm install cilium cilium/cilium --version 1.15.6 --namespace kube-system -f cilium-values.yaml

ipam:
  mode: kubernetes

kubeProxyReplacement: true # not since 1.14?

l2announcements:
  enabled: true

externalIPs:
  enabled: false

securityContext:
  capabilities:
    ciliumAgent: ["CHOWN","KILL","NET_ADMIN","NET_RAW","IPC_LOCK","SYS_ADMIN","SYS_RESOURCE","DAC_OVERRIDE","FOWNER","SETGID","SETUID"]
    cleanCiliumState: ["NET_ADMIN","SYS_ADMIN","SYS_RESOURCE"]

cgroup:
  autoMount:
    enabled: false
  hostRoot: "/sys/fs/cgroup"


k8sServiceHost: localhost
k8sServicePort: 7445
# Cilium ships with a low rate limit by default that can result in strange issues when it gets rate limited
k8sClientRateLimit:
  qps: 50
  burst: 100

hubble:
  relay:
    enabled: true
  ui:
    enabled: true

hostFirewall:
  enabled: true

ingressController:
  enabled: true
  loadbalancerMode: shared
  default: true

rolloutCiliumPods: true

I installed the Cilium CLI and ran cilium status to confirm Cilium was healthy.

When using Cilium LB, some additional steps are needed before it will assign IPs to LoadBalancer and Ingress objects: you need an IP pool for LB-IPAM to allocate from, and an L2 announcement policy so those IPs are actually reachable on the LAN.
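A minimal sketch of both objects, assuming a spare 192.168.1.240/29 range on the home network and node interfaces matching ^eth[0-9]+ (both values are assumptions, adjust for your setup):

apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: home-pool
spec:
  blocks:
    - cidr: 192.168.1.240/29  # assumed spare range on the LAN
---
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: announce-lb
spec:
  interfaces:
    - ^eth[0-9]+              # assumed interface naming on the nodes
  loadBalancerIPs: true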

Useful commands #
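Some that come up constantly when poking at Talos and Cilium (node IPs are placeholders):

# live per-node dashboard: CPU, memory, services, logs
talosctl -n <NODE_IP> dashboard
# kernel and service logs
talosctl -n <NODE_IP> dmesg
talosctl -n <NODE_IP> logs kubelet
# etcd membership, from a control plane node
talosctl -n <CONTROL_PLANE_IP> etcd members
# Cilium health and full connectivity test
cilium status
cilium connectivity test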

Metrics and Grafana Cloud #

Talos metrics

Grafana Cloud has a very reasonable free tier that I signed up for. There's a dedicated Kubernetes section that generates a preconfigured Helm install to remote-write metrics to Grafana Cloud, with premade dashboards available.

Storage without busy looping half the CPU #

I wanted dynamically allocated PVs that are accessible from any node. They'd all use the same SSD for now, but the setup should scale to multiple disks / multiple nodes in the future.

The first few storage options I attempted did not work out:

Enter Piraeus Linstor

Here's what it looks like once you have some working pools with storage assigned to PVs:
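One way to see this from the CLI is the LINSTOR client inside the controller pod; a sketch, assuming the operator's default piraeus-datastore namespace and linstor-controller deployment name:

kubectl -n piraeus-datastore exec deploy/linstor-controller -- linstor node list
kubectl -n piraeus-datastore exec deploy/linstor-controller -- linstor storage-pool list
kubectl -n piraeus-datastore exec deploy/linstor-controller -- linstor resource list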

Piraeus Linstor on Talos Setup Overview #

Full post here:


Talos patch examples #

Patch collection
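Each patch can be applied either when generating the config or against a live node:

# at generation time
talosctl gen config <cluster-name> https://<CONTROL_PLANE_IP>:6443 --config-patch @patch.yaml
# on a running node
talosctl patch machineconfig -n <NODE_IP> --patch @patch.yaml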

Allow scheduling on control plane

cluster:
  allowSchedulingOnControlPlanes: true

DRBD patch for Piraeus Linstor

# https://github.com/piraeusdatastore/piraeus-operator/blob/v2/docs/how-to/talos.md
machine:
  kernel:
    modules:
      - name: drbd
        parameters:
          - usermode_helper=disabled
      - name: drbd_transport_tcp

Metrics patch

# https://www.talos.dev/v1.7/kubernetes-guides/configuration/deploy-metrics-server/
# Needed on all nodes
# also,
# kubectl apply -f https://raw.githubusercontent.com/alex1989hu/kubelet-serving-cert-approver/main/deploy/standalone-install.yaml
# kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
machine:
  kubelet:
    extraArgs:
      rotate-server-certificates: true

No CNI patch for installing your own CNI

cluster:
  network:
    cni:
      name: none
  # Disables kube-proxy
  proxy:
    disabled: true

Mayastor patch

- op: add
  path: /machine/sysctls
  value:
    vm.nr_hugepages: "1024"
- op: add
  path: /machine/nodeLabels
  value:
    openebs.io/engine: mayastor
- op: add
  path: /machine/kubelet/extraMounts
  value:
    # for etcd
    - destination: /var/local/mayastor/localpv-hostpath/etcd # Destination is the absolute path where the mount will be placed in the container.
      type: bind # Type specifies the mount kind.
      source: /var/local/mayastor/localpv-hostpath/etcd # Source specifies the source path of the mount.
      # Options are fstab style mount options.
      options:
        - bind
        - rshared
        - rw

    # for loki
    - destination: /var/local/mayastor/localpv-hostpath/loki # Destination is the absolute path where the mount will be placed in the container.
      type: bind # Type specifies the mount kind.
      source: /var/local/mayastor/localpv-hostpath/loki # Source specifies the source path of the mount.
      # Options are fstab style mount options.
      options:
        - bind
        - rshared
        - rw

OpenEBS patch

# talosctl patch --mode=no-reboot machineconfig -n IP --patch @openebs_new_patch.yaml
machine:
  kubelet:
    # The `extraMounts` field is used to add additional mounts to the kubelet container.
    extraMounts:
      # for etcd
      - destination: /var/local/openebs/localpv-hostpath/etcd # absolute path where the mount will be placed in the container
        type: bind # mount kind
        source: /var/local/openebs/localpv-hostpath/etcd # source path of the mount
        # fstab style mount options
        options:
          - bind
          - rshared
          - rw
      # for loki
      - destination: /var/local/openebs/localpv-hostpath/loki
        type: bind
        source: /var/local/openebs/localpv-hostpath/loki
        options:
          - bind
          - rshared
          - rw

Random notes on various Talos things #

Talos mayastor:

OpenEBS new instructions https://openebs.io/docs/quickstart-guide/installation

Talos Longhorn

Reprovision Talos node with more storage