Running K8SGPT with Ollama Inside Your Kubernetes Cluster: A Complete Guide

Introduction
Kubernetes (K8s) has become the backbone of modern cloud-native applications, empowering developers and operators to deploy, manage, and scale workloads efficiently. However, as clusters grow in complexity, troubleshooting issues and optimizing performance become increasingly challenging. This is where K8sGPT steps in: a powerful AI-driven tool that scans your Kubernetes clusters and diagnoses and triages issues in plain English.
By default, K8sGPT relies on an external LLM (Large Language Model) provider, typically OpenAI, for its analysis. While effective, this can raise concerns around data privacy (although K8sGPT does offer an anonymization mode), latency, and cost. Ollama, a self-hosted LLM solution, lets you keep your data within your cluster and process it locally. By integrating Ollama with K8sGPT, you get a robust and secure solution for Kubernetes troubleshooting while reducing reliance on external services.
This tutorial will guide you through deploying K8sGPT in your Kubernetes cluster and configuring it to use an Ollama pod as its LLM backend. Additionally, we’ll touch on how you can leverage Prometheus metrics from K8sGPT for consolidated insights across multi-cluster environments.
Prerequisites
Before you start, ensure you have the following:
- A running Kubernetes cluster
Ensure your cluster is running Kubernetes version 1.23 or higher (recommended) and is properly configured.
- Kubernetes CLI (kubectl)
The kubectl command-line tool must be installed and configured to interact with your cluster.
- Helm installed
Helm is required to install and manage dependencies such as K8sGPT. For production setups, you can switch to any tooling you prefer.
- GPU-enabled nodes (recommended for large models)
If you are running large language models in Ollama, ensure your cluster has at least one GPU-enabled node; this provides significant performance benefits by accelerating inference. Nodes with GPUs should be labeled appropriately (e.g., nvidia.com/gpu) to allow workload scheduling. If GPU nodes are unavailable, you can still run Ollama on CPU-only nodes, but be aware of the latency and performance trade-offs, especially for larger models.
Step 1: Deploying Ollama in Your Kubernetes Cluster
The first step is to deploy an Ollama pod in your cluster, which will serve as the local LLM backend for K8sGPT.
1.1 Deploy Ollama with Helm
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update
helm upgrade ollama ollama-helm/ollama --namespace ollama --install --create-namespace --values ollama-values.yaml
The values file is shown below (add a nodeSelector and tolerations that match your GPU node configuration, if applicable); in this case, the llama3 model is pulled at startup.
tolerations:
  - effect: NoSchedule
    key: gpu # use what's applicable in your configuration
    operator: Exists
nodeSelector:
  gpu: "true" # use what's applicable in your configuration
ollama:
  gpu:
    # -- Enable GPU integration
    enabled: true
    # -- GPU type: 'nvidia' or 'amd'
    type: 'nvidia'
    # -- Number of GPUs
    number: 1
  # -- List of models to pull at container startup
  models:
    pull:
      - llama3
Once the Ollama pod is running, you should see logs similar to the following:
time=2024-12-05T15:14:04.112Z level=INFO source=images.go:753 msg="total blobs: 0"
time=2024-12-05T15:14:04.112Z level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-12-05T15:14:04.112Z level=INFO source=routes.go:1248 msg="Listening on [::]:11434 (version 0.4.7)"
time=2024-12-05T15:14:04.113Z level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu_avx cpu_avx2 cuda_v11 cuda_v12 cpu]"
time=2024-12-05T15:14:04.113Z level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
time=2024-12-05T15:14:04.229Z level=INFO source=types.go:123 msg="inference compute" id=GPU-0b7e0191-2ebb-64e6-8597-3d118579171a library=cuda variant=v12 compute=7.5 driver=12.4 name="Tesla T4" total="14.7 GiB" available="14.6 GiB"
[GIN] 2024/12/05 - 15:14:04 | 200 | 30.819µs | 127.0.0.1 | HEAD "/"
[GIN] 2024/12/05 - 15:14:04 | 200 | 115.91µs | 127.0.0.1 | GET "/api/ps"
[GIN] 2024/12/05 - 15:14:04 | 200 | 19.893µs | 127.0.0.1 | HEAD "/"
time=2024-12-05T15:14:05.979Z level=INFO source=download.go:175 msg="downloading 6a0746a1ec1a in 16 291 MB part(s)"
time=2024-12-05T15:14:26.063Z level=INFO source=download.go:175 msg="downloading 4fa551d4f938 in 1 12 KB part(s)"
time=2024-12-05T15:14:28.228Z level=INFO source=download.go:175 msg="downloading 8ab4849b038c in 1 254 B part(s)"
time=2024-12-05T15:14:30.437Z level=INFO source=download.go:175 msg="downloading 577073ffcc6c in 1 110 B part(s)"
time=2024-12-05T15:14:32.590Z level=INFO source=download.go:175 msg="downloading 3f8eb4da87fa in 1 485 B part(s)"
[GIN] 2024/12/05 - 15:14:47 | 200 | 43.451446104s | 127.0.0.1 | POST "/api/pull"
[GIN] 2024/12/05 - 15:14:48 | 200 | 21.451µs | 10.0.180.172 | GET "/"
[GIN] 2024/12/05 - 15:14:51 | 200 | 20.455µs | 10.0.180.172 | GET "/"
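Before wiring up K8sGPT, it is worth confirming that Ollama answers requests. A quick check, assuming the service name and namespace from the Helm install above (service ollama in namespace ollama) and Ollama's default port 11434:

```shell
# Forward the in-cluster Ollama service to localhost
kubectl port-forward -n ollama svc/ollama 11434:11434 &

# List the models Ollama has pulled; llama3 should appear
curl -s http://localhost:11434/api/tags

# Ollama also exposes an OpenAI-compatible API under /v1, which K8sGPT will use
curl -s http://localhost:11434/v1/models
```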
Step 2: Deploying K8sGPT and Configuring it to Use Ollama
K8sGPT itself can be run as a command-line tool to analyze a cluster, but to use it from within the cluster, the project provides an operator that lets you create a custom resource defining the behaviour and scope of a managed K8sGPT workload. Analysis and outputs are also configurable, enabling integration into existing workflows.
K8sGPT requires only minor configuration to route its requests to Ollama instead of an external LLM.
2.1 Installing K8sGPT
You can install K8sGPT via Helm for easier management:
helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update
helm install release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace
Verify that the operator is running by checking its logs:
2024-12-05T06:05:25Z INFO controller-runtime.metrics Metrics server is starting to listen {"addr": "127.0.0.1:8080"}
2024-12-05T06:05:25Z INFO setup starting manager
2024-12-05T06:05:25Z INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
2024-12-05T06:05:25Z INFO starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
I1205 06:05:25.920254 1 leaderelection.go:250] attempting to acquire leader lease k8sgpt-operator-system/ea9c19f7.k8sgpt.ai...
I1205 06:05:25.924953 1 leaderelection.go:260] successfully acquired lease k8sgpt-operator-system/ea9c19f7.k8sgpt.ai
2024-12-05T06:05:25Z DEBUG events release-k8sgpt-operator-controller-manager-6588fdcdc9-znj7z_40ebe326-cd97-4cc0-8cba-91e5651c4fb4 became leader {"type": "Normal", "object": {"kind":"Lease","namespace":"k8sgpt-operator-system","name":"ea9c19f7.k8sgpt.ai","uid":"5750a95c-5d60-444c-b314-0aa7236629bf","apiVersion":"coordination.k8s.io/v1","resourceVersion":"2998"}, "reason": "LeaderElection"}
2024-12-05T06:05:25Z INFO Starting EventSource {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "source": "kind source: *v1alpha1.K8sGPT"}
2024-12-05T06:05:25Z INFO Starting Controller {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT"}
2024-12-05T06:05:26Z INFO Starting workers {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "worker count": 1}
2.2 Configuring K8sGPT to Use Ollama
To integrate Ollama with K8sGPT, you'll need to create a K8sGPT resource configured with llama3 as the model and localai as the backend (the localai backend speaks the OpenAI-compatible API, which Ollama exposes under /v1).
Here’s an example of the configuration:
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-ollama
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    model: llama3
    backend: localai
    baseUrl: http://ollama.ollama.svc.cluster.local.:11434/v1 # internal service
  noCache: false
  # filters: ["Pod"]
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.41
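Save the manifest (for example as k8sgpt-ollama.yaml, a filename chosen for this guide) and apply it:

```shell
kubectl apply -f k8sgpt-ollama.yaml
kubectl get k8sgpt -n k8sgpt-operator-system
```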
Describe the resource to confirm the configuration was picked up:
kubectl describe k8sgpt k8sgpt-ollama -n k8sgpt-operator-system
Name:         k8sgpt-ollama
Namespace:    k8sgpt-operator-system
API Version:  core.k8sgpt.ai/v1alpha1
Kind:         K8sGPT
Spec:
  Ai:
    Anonymized:  true
    Back Off:
      Enabled:      false
      Max Retries:  5
    Backend:     localai
    Base URL:    http://ollama.ollama.svc.cluster.local.:11434/v1
    Enabled:     true
    Language:    english
    Max Tokens:  2048
    Model:       llama3
    Topk:        50
  Repository:  ghcr.io/k8sgpt-ai/k8sgpt
  Version:     v0.3.41
Events:  <none>
You should also see a new K8sGPT pod running:
NAME READY STATUS RESTARTS AGE
k8sgpt-ollama-66d48ccb5d-vdtwk 1/1 Running 0 3m14s
Once k8sgpt-ollama is running, it produces an "analysis result" in the form of a Result object for each finding (Pod, Service, StatefulSet, etc.), along with the solution provided by the localai backend.
kubectl get result -A
NAMESPACE NAME KIND BACKEND AGE
k8sgpt-operator-system argocdargocdapplicationcontroller StatefulSet localai 22s
kubectl get results argocdargocdapplicationcontroller -n k8sgpt-operator-system -o yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: Result
metadata:
  labels:
    k8sgpts.k8sgpt.ai/backend: localai
    k8sgpts.k8sgpt.ai/name: k8sgpt-ollama
    k8sgpts.k8sgpt.ai/namespace: k8sgpt-operator-system
  name: argocdargocdapplicationcontroller
  namespace: k8sgpt-operator-system
spec:
  backend: localai
  details: |-
    Error: StatefulSet uses a non-existent service.
    Solution:
    1. Check the name of the service used in the StatefulSet YAML file.
    2. Verify that the service exists in your Kubernetes cluster using `kubectl get svc <service_name>`.
    3. If the service does not exist, create it using `kubectl create -f service.yaml` (replace with your service YAML file).
    4. Update the StatefulSet YAML file to use the correct service name.
    5. Apply the updated YAML file using `kubectl apply -f statefulset.yaml`.
  error:
    - text: StatefulSet uses the service argo-cd-argocd-application-controller
        which does not exist.
  kind: StatefulSet
  parentObject: ""
status:
  lifecycle: historical
Now you have a set of results waiting for you, each with a recommended solution to apply!
Step 3: Leveraging Prometheus Metrics for Multi-Cluster Analysis
K8sGPT exposes Prometheus-compatible metrics, enabling you to centralize insights from both the operator and its results. Interesting metrics include:
- Reconciliation metrics: controller_runtime_reconcile_time_seconds_sum and controller_runtime_reconcile_time_seconds_count indicate whether reconciliations are taking significant time, while controller_runtime_reconcile_total{result="requeue_after"} might suggest that reconciliations are not immediately successful and are being retried.
- Workqueue behavior: workqueue_longest_running_processor_seconds might show a long-running work item; coupled with workqueue_unfinished_work_seconds, this indicates potential bottlenecks or stuck threads. workqueue_retries_total might signify transient issues with processing.
- AI backend issues: k8sgpt_number_of_failed_backend_ai_calls can reveal intermittent issues with the localai AI backend. These could be network-related or due to backend performance; it is worth checking the logs of the operator and the Ollama pod to find the cause.
- Custom metrics: k8sgpt_number_of_results and k8sgpt_number_of_results_by_type are very useful when you want to monitor the results of different clusters and the trends of issues and triages.
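As a sketch of how these metrics can be queried across a fleet, assuming your Prometheus setup attaches a cluster external label when scraping each cluster (a common convention in multi-cluster monitoring, not something K8sGPT provides by itself):

```promql
# Total number of K8sGPT results per cluster
sum by (cluster) (k8sgpt_number_of_results)

# Failed AI backend calls per cluster, to spot clusters with a flaky backend
sum by (cluster) (k8sgpt_number_of_failed_backend_ai_calls)
```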
4.1 Configuring Prometheus
To scrape K8sGPT metrics, enable the ServiceMonitor via the operator's Helm values (additionalLabels should match the labels your Prometheus instance uses to select ServiceMonitors):
helm upgrade release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --reuse-values \
  --set serviceMonitor.enabled=true \
  --set serviceMonitor.additionalLabels.<key>=<value> # match your Prometheus scrape configuration
Multi-Cluster Monitoring
Expanding from a single-cluster setup, K8sGPT introduces the capability to monitor multiple Kubernetes clusters, providing powerful tools for managing fleets of clusters across various environments. This feature lets you efficiently monitor the results of every cluster in the fleet.
How Multi-Cluster Monitoring Works
The operator allows you to monitor multiple clusters by specifying a kubeconfig for each target cluster. Here’s how this works:
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-ollama
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    model: llama3
    backend: localai
    baseUrl: http://ollama.ollama.svc.cluster.local.:11434/v1 # internal service
  noCache: false
  # filters: ["Pod"]
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.41
  kubeconfig:
    key: value
    name: cluster-1-kubeconfig
- The kubeconfig section points to the Secret containing the kubeconfig for the target cluster.
Once applied, the K8SGPT Operator uses the provided kubeconfig to interact with the remote cluster, processing results in the management cluster.
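The resource above expects the kubeconfig in a Secret named cluster-1-kubeconfig, stored under the key value, in the operator's namespace. Creating that Secret could look like this (the local file name cluster-1.kubeconfig is illustrative):

```shell
# Store the target cluster's kubeconfig under the key "value",
# matching the kubeconfig.key field in the K8sGPT resource
kubectl create secret generic cluster-1-kubeconfig \
  -n k8sgpt-operator-system \
  --from-file=value=./cluster-1.kubeconfig
```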
Benefits of Multi-Cluster Monitoring
1. Centralized Management
- Monitor all clusters from a single management plane, reducing operational complexity and improving visibility.
2. Scalability
- Support large-scale environments with a fleet of clusters managed through Cluster API.
- Dynamic provisioning and integration of new clusters streamline growth.
3. Enhanced Security
- Use least-privilege kubeconfigs to restrict permissions for monitoring operations, enhancing compliance with organizational policies.
- Avoid storing AI backend credentials in the target clusters, keeping sensitive information centralized and secure.
4. Resource Efficiency
- Offload monitoring and analysis to the management cluster, preserving compute resources in the seed clusters for application workloads.
5. Ease of Filtering and Querying Results
- Use labels like k8sgpts.k8sgpt.ai/name and k8sgpts.k8sgpt.ai/backend to filter results by specific clusters or AI backends.
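For example, using the labels shown on the Result object earlier, filtering results could look like this:

```shell
# Results produced by the k8sgpt-ollama instance
kubectl get results -n k8sgpt-operator-system -l k8sgpts.k8sgpt.ai/name=k8sgpt-ollama

# Results produced via the localai backend
kubectl get results -n k8sgpt-operator-system -l k8sgpts.k8sgpt.ai/backend=localai
```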
Summary
By running K8sGPT inside your Kubernetes cluster with Ollama as a local LLM backend, you achieve:
- Enhanced Privacy: Data remains within your cluster, eliminating concerns about sensitive information leaving your environment.
- Reduced Latency: Local processing eliminates the delay associated with external API calls.
- Cost Savings: Avoid paying for external LLM API usage.
Additionally, leveraging Prometheus metrics from K8sGPT lets you consolidate insights from both the analysis results and the operator's operational metrics.
Ready to take your Kubernetes troubleshooting to the next level?