Running K8SGPT with Ollama Inside Your Kubernetes Cluster: A Complete Guide

Introduction
Kubernetes (K8s) has become the backbone of modern cloud-native applications, empowering developers and operators to deploy, manage, and scale workloads efficiently. However, as clusters grow in complexity, troubleshooting issues and optimizing performance become increasingly challenging. This is where K8sGPT steps in: a powerful AI-driven tool that scans your Kubernetes clusters and diagnoses and triages issues in plain English.
By default, K8sGPT relies on an external LLM (Large Language Model) provider, typically OpenAI, for its analysis. While effective, this can raise concerns around data privacy (although K8sGPT does offer an anonymization mode), latency, and cost. Ollama, a self-hosted LLM solution, lets you keep your data within your cluster and process it locally. By integrating Ollama with K8sGPT, you get a robust and secure solution for Kubernetes troubleshooting while reducing reliance on external services.
This tutorial will guide you through deploying K8sGPT in your Kubernetes cluster and configuring it to use an Ollama pod as its LLM backend. Additionally, we’ll touch on how you can leverage Prometheus metrics from K8sGPT for consolidated insights across multi-cluster environments.
Prerequisites
Before you start, ensure you have the following:
- A running Kubernetes cluster
Ensure your cluster is running Kubernetes version 1.23 or higher (recommended) and is properly configured.
- Kubernetes CLI (kubectl)
The kubectl command-line tool must be installed and configured to interact with your cluster.
- Helm installed
Helm is required to install and manage dependencies such as K8sGPT. For production setups, you can switch to any tooling you prefer.
- GPU-enabled nodes (recommended for large models)
If you are running large language models in Ollama, ensure your cluster has at least one GPU-enabled node; this provides significant performance benefits by accelerating inference. Nodes with GPUs should be labeled appropriately (e.g., nvidia.com/gpu) to allow workload scheduling. If GPU nodes are unavailable, you can still run Ollama on CPU-only nodes, but be aware of the latency and performance trade-offs, especially for larger models.
Step 1: Deploying Ollama in Your Kubernetes Cluster
The first step is to deploy an Ollama pod in your cluster, which will serve as the local LLM backend for K8sGPT.
1.1 Deploy Ollama with Helm
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update
helm upgrade ollama ollama-helm/ollama --namespace ollama --install --create-namespace --values ollama-values.yaml
The values file is shown below (add a nodeSelector and tolerations that match your GPU node configuration, if applicable); in this case, the llama3 model is pulled at startup.
tolerations:
  - effect: NoSchedule
    key: gpu # use what's applicable in your configuration
    operator: Exists
nodeSelector:
  gpu: "true" # use what's applicable in your configuration
ollama:
  gpu:
    # -- Enable GPU integration
    enabled: true
    # -- GPU type: 'nvidia' or 'amd'
    type: 'nvidia'
    # -- Number of GPUs
    number: 1
  # -- List of models to pull at container startup
  models:
    pull:
      - llama3
Once the Ollama pod is running, you should see logs similar to the following:
time=2024-12-05T15:14:04.112Z level=INFO source=images.go:753 msg="total blobs: 0"
time=2024-12-05T15:14:04.112Z level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-12-05T15:14:04.112Z level=INFO source=routes.go:1248 msg="Listening on [::]:11434 (version 0.4.7)"
time=2024-12-05T15:14:04.113Z level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu_avx cpu_avx2 cuda_v11 cuda_v12 cpu]"
time=2024-12-05T15:14:04.113Z level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
time=2024-12-05T15:14:04.229Z level=INFO source=types.go:123 msg="inference compute" id=GPU-0b7e0191-2ebb-64e6-8597-3d118579171a library=cuda variant=v12 compute=7.5 driver=12.4 name="Tesla T4" total="14.7 GiB" available="14.6 GiB"
[GIN] 2024/12/05 - 15:14:04 | 200 | 30.819µs | 127.0.0.1 | HEAD "/"
[GIN] 2024/12/05 - 15:14:04 | 200 | 115.91µs | 127.0.0.1 | GET "/api/ps"
[GIN] 2024/12/05 - 15:14:04 | 200 | 19.893µs | 127.0.0.1 | HEAD "/"
time=2024-12-05T15:14:05.979Z level=INFO source=download.go:175 msg="downloading 6a0746a1ec1a in 16 291 MB part(s)"
time=2024-12-05T15:14:26.063Z level=INFO source=download.go:175 msg="downloading 4fa551d4f938 in 1 12 KB part(s)"
time=2024-12-05T15:14:28.228Z level=INFO source=download.go:175 msg="downloading 8ab4849b038c in 1 254 B part(s)"
time=2024-12-05T15:14:30.437Z level=INFO source=download.go:175 msg="downloading 577073ffcc6c in 1 110 B part(s)"
time=2024-12-05T15:14:32.590Z level=INFO source=download.go:175 msg="downloading 3f8eb4da87fa in 1 485 B part(s)"
[GIN] 2024/12/05 - 15:14:47 | 200 | 43.451446104s | 127.0.0.1 | POST "/api/pull"
[GIN] 2024/12/05 - 15:14:48 | 200 | 21.451µs | 10.0.180.172 | GET "/"
[GIN] 2024/12/05 - 15:14:51 | 200 | 20.455µs | 10.0.180.172 | GET "/"
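Before wiring up K8sGPT, it is worth confirming that Ollama answers requests. A quick check, assuming the service name and namespace from the Helm install above (service ollama in namespace ollama) and Ollama's default port 11434:

```shell
# Forward the in-cluster Ollama service to localhost
kubectl port-forward -n ollama svc/ollama 11434:11434 &

# List the models Ollama has pulled; llama3 should appear
curl -s http://localhost:11434/api/tags

# Ollama also exposes an OpenAI-compatible API under /v1, which K8sGPT will use
curl -s http://localhost:11434/v1/models
```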
Step 2: Deploying K8sGPT and Configuring it to Use Ollama
K8sGPT itself can be run as a command-line tool to analyze a cluster, but to use it from within the cluster, the project provides an operator that lets you create a custom resource defining the behaviour and scope of a managed K8sGPT workload. Analysis and outputs are also configurable, enabling integration into existing workflows.
K8sGPT requires only minor configuration to route its requests to Ollama instead of an external LLM.
2.1 Installing K8sGPT
You can install K8sGPT via Helm for easier management:
helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update
helm install release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace
Verify that the operator is running by checking its logs:
2024-12-05T06:05:25Z INFO controller-runtime.metrics Metrics server is starting to listen {"addr": "127.0.0.1:8080"}
2024-12-05T06:05:25Z INFO setup starting manager
2024-12-05T06:05:25Z INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
2024-12-05T06:05:25Z INFO starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
I1205 06:05:25.920254 1 leaderelection.go:250] attempting to acquire leader lease k8sgpt-operator-system/ea9c19f7.k8sgpt.ai...
I1205 06:05:25.924953 1 leaderelection.go:260] successfully acquired lease k8sgpt-operator-system/ea9c19f7.k8sgpt.ai
2024-12-05T06:05:25Z DEBUG events release-k8sgpt-operator-controller-manager-6588fdcdc9-znj7z_40ebe326-cd97-4cc0-8cba-91e5651c4fb4 became leader {"type": "Normal", "object": {"kind":"Lease","namespace":"k8sgpt-operator-system","name":"ea9c19f7.k8sgpt.ai","uid":"5750a95c-5d60-444c-b314-0aa7236629bf","apiVersion":"coordination.k8s.io/v1","resourceVersion":"2998"}, "reason": "LeaderElection"}
2024-12-05T06:05:25Z INFO Starting EventSource {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "source": "kind source: *v1alpha1.K8sGPT"}
2024-12-05T06:05:25Z INFO Starting Controller {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT"}
2024-12-05T06:05:26Z INFO Starting workers {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "worker count": 1}
2.2 Configuring K8sGPT to Use Ollama
To integrate Ollama with K8sGPT, you'll need to create a K8sGPT resource configured with llama3 as the model and localai as the backend (the localai backend speaks the OpenAI-compatible API, which Ollama exposes under /v1).
Here’s an example of the configuration:
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-ollama
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    model: llama3
    backend: localai
    baseUrl: http://ollama.ollama.svc.cluster.local.:11434/v1 # internal service
  noCache: false
  # filters: ["Pod"]
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.41
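Save the manifest (for example as k8sgpt-ollama.yaml, a filename chosen for this guide) and apply it:

```shell
kubectl apply -f k8sgpt-ollama.yaml
kubectl get k8sgpt -n k8sgpt-operator-system
```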
Describe the resource to confirm the configuration was picked up:
kubectl describe k8sgpt k8sgpt-ollama -n k8sgpt-operator-system
Name:         k8sgpt-ollama
Namespace:    k8sgpt-operator-system
API Version:  core.k8sgpt.ai/v1alpha1
Kind:         K8sGPT
Spec:
  Ai:
    Anonymized:  true
    Back Off:
      Enabled:      false
      Max Retries:  5
    Backend:     localai
    Base URL:    http://ollama.ollama.svc.cluster.local.:11434/v1
    Enabled:     true
    Language:    english
    Max Tokens:  2048
    Model:       llama3
    Topk:        50
  Repository:  ghcr.io/k8sgpt-ai/k8sgpt
  Version:     v0.3.41
Events:  <none>
You should also see a new K8sGPT pod running:
NAME READY STATUS RESTARTS AGE
k8sgpt-ollama-66d48ccb5d-vdtwk 1/1 Running 0 3m14s
Once k8sgpt-ollama is running, it produces an "analysis result" in the form of a Result object for each finding (Pod, Service, StatefulSet, etc.), along with the solution provided by the localai backend.
kubectl get result -A
NAMESPACE NAME KIND BACKEND AGE
k8sgpt-operator-system argocdargocdapplicationcontroller StatefulSet localai 22s
kubectl get results argocdargocdapplicationcontroller -n k8sgpt-operator-system -o yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: Result
metadata:
  labels:
    k8sgpts.k8sgpt.ai/backend: localai
    k8sgpts.k8sgpt.ai/name: k8sgpt-ollama
    k8sgpts.k8sgpt.ai/namespace: k8sgpt-operator-system
  name: argocdargocdapplicationcontroller
  namespace: k8sgpt-operator-system
spec:
  backend: localai
  details: |-
    Error: StatefulSet uses a non-existent service.
    Solution:
    1. Check the name of the service used in the StatefulSet YAML file.
    2. Verify that the service exists in your Kubernetes cluster using `kubectl get svc <service_name>`.
    3. If the service does not exist, create it using `kubectl create -f service.yaml` (replace with your service YAML file).
    4. Update the StatefulSet YAML file to use the correct service name.
    5. Apply the updated YAML file using `kubectl apply -f statefulset.yaml`.
  error:
    - text: StatefulSet uses the service argo-cd-argocd-application-controller
        which does not exist.
  kind: StatefulSet
  parentObject: ""
status:
  lifecycle: historical
Now you have a set of results waiting for you, each with a recommended solution to apply!
Step 3: Leveraging Prometheus Metrics for Multi-Cluster Analysis
K8sGPT exposes Prometheus-compatible metrics, enabling you to centralize insights from both the operator and its results. Interesting metrics include:
- Reconciliation metrics: controller_runtime_reconcile_time_seconds_sum and controller_runtime_reconcile_time_seconds_count indicate whether reconciliations are taking significant time, while controller_runtime_reconcile_total{result="requeue_after"} might suggest that reconciliations are not immediately successful and are being retried.
- Workqueue behavior: workqueue_longest_running_processor_seconds might show a long-running work item; coupled with workqueue_unfinished_work_seconds, this indicates potential bottlenecks or stuck threads. workqueue_retries_total might signify transient issues with processing.
- AI backend issues: k8sgpt_number_of_failed_backend_ai_calls can reveal intermittent issues with the localai AI backend. These could be network-related or due to backend performance; it is worth checking the logs of the operator and the Ollama pod to find the cause.
- Custom metrics: k8sgpt_number_of_results and k8sgpt_number_of_results_by_type are very useful when you want to monitor the results of different clusters and the trends of issues and triages.
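As a sketch of how these metrics can be queried across a fleet, assuming your Prometheus setup attaches a cluster external label when scraping each cluster (a common convention in multi-cluster monitoring, not something K8sGPT provides by itself):

```promql
# Total number of K8sGPT results per cluster
sum by (cluster) (k8sgpt_number_of_results)

# Failed AI backend calls per cluster, to spot clusters with a flaky backend
sum by (cluster) (k8sgpt_number_of_failed_backend_ai_calls)
```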
4.1 Configuring Prometheus
To scrape K8sGPT metrics, enable the ServiceMonitor via the operator's Helm values (additionalLabels should match the labels your Prometheus instance uses to select ServiceMonitors):
helm upgrade release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --reuse-values \
  --set serviceMonitor.enabled=true \
  --set serviceMonitor.additionalLabels.<key>=<value> # match your Prometheus scrape configuration
Multi-Cluster Monitoring
Expanding from a single-cluster setup, K8sGPT introduces the capability to monitor multiple Kubernetes clusters, providing powerful tools for managing fleets of clusters across various environments. This feature lets you efficiently monitor the results of every cluster in the fleet.
How Multi-Cluster Monitoring Works
The operator allows you to monitor multiple clusters by specifying a kubeconfig for each target cluster. Here’s how this works:
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-ollama
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    model: llama3
    backend: localai
    baseUrl: http://ollama.ollama.svc.cluster.local.:11434/v1 # internal service
  noCache: false
  # filters: ["Pod"]
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.41
  kubeconfig:
    key: value
    name: cluster-1-kubeconfig
- The kubeconfig section points to the Secret containing the kubeconfig for the target cluster.
Once applied, the K8SGPT Operator uses the provided kubeconfig to interact with the remote cluster, processing results in the management cluster.
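The resource above expects the kubeconfig in a Secret named cluster-1-kubeconfig, stored under the key value, in the operator's namespace. Creating that Secret could look like this (the local file name cluster-1.kubeconfig is illustrative):

```shell
# Store the target cluster's kubeconfig under the key "value",
# matching the kubeconfig.key field in the K8sGPT resource
kubectl create secret generic cluster-1-kubeconfig \
  -n k8sgpt-operator-system \
  --from-file=value=./cluster-1.kubeconfig
```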
Benefits of Multi-Cluster Monitoring
1. Centralized Management
- Monitor all clusters from a single management plane, reducing operational complexity and improving visibility.
2. Scalability
- Support large-scale environments with a fleet of clusters managed through Cluster API.
- Dynamic provisioning and integration of new clusters streamline growth.
3. Enhanced Security
- Use least-privilege kubeconfigs to restrict permissions for monitoring operations, enhancing compliance with organizational policies.
- Avoid storing AI backend credentials in the target clusters, keeping sensitive information centralized and secure.
4. Resource Efficiency
- Offload monitoring and analysis to the management cluster, preserving compute resources in the seed clusters for application workloads.
5. Ease of Filtering and Querying Results
- Use labels like k8sgpts.k8sgpt.ai/name and k8sgpts.k8sgpt.ai/backend to filter results by specific clusters or AI backends.
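For example, using the labels shown on the Result object earlier, filtering results could look like this:

```shell
# Results produced by the k8sgpt-ollama instance
kubectl get results -n k8sgpt-operator-system -l k8sgpts.k8sgpt.ai/name=k8sgpt-ollama

# Results produced via the localai backend
kubectl get results -n k8sgpt-operator-system -l k8sgpts.k8sgpt.ai/backend=localai
```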
Summary
By running K8sGPT inside your Kubernetes cluster with Ollama as a local LLM backend, you achieve:
- Enhanced Privacy: Data remains within your cluster, eliminating concerns about sensitive information leaving your environment.
- Reduced Latency: Local processing eliminates the delay associated with external API calls.
- Cost Savings: Avoid paying for external LLM API usage.
Additionally, leveraging Prometheus metrics from K8sGPT lets you consolidate insights from both the analysis results and the operator's operational metrics.
Ready to take your Kubernetes troubleshooting to the next level?