Kubernetes Deployment Guide

This guide covers deploying RAG Modulo to Kubernetes clusters, including deployment manifests, Helm charts, scaling strategies, and production best practices.

Table of Contents

  • Overview
  • Prerequisites
  • Kubernetes Manifests
  • Helm Chart Deployment
  • Scaling & High Availability
  • Storage & Persistence
  • Networking & Ingress
  • Resource Management
  • Troubleshooting

Overview

RAG Modulo on Kubernetes consists of:

  • Backend Deployment: FastAPI application (stateless Deployment)
  • Frontend Deployment: React/Nginx static server
  • PostgreSQL StatefulSet: Primary database
  • Milvus StatefulSet: Vector database with etcd dependency
  • MinIO StatefulSet: Object storage
  • MLFlow Deployment: Model tracking server

Architecture Pattern: Microservices with shared infrastructure services

Prerequisites

Required Tools

# Kubernetes cluster (v1.24+); reports both client and server versions
kubectl version

# Helm (v3.0+)
helm version

# Optional: k9s for cluster management
k9s version

Cluster Requirements

  • Kubernetes: 1.24+ (tested on 1.28)
  • Nodes: 3+ nodes recommended for HA
  • CPU: 8+ cores total
  • Memory: 16GB+ total
  • Storage: Dynamic provisioning with StorageClass
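
A quick sanity check of these requirements before deploying (kubectl top assumes metrics-server is installed):

# Node count and Kubernetes version
kubectl get nodes -o wide

# Current CPU/memory usage per node (requires metrics-server)
kubectl top nodes

# Dynamic provisioning needs at least one StorageClass
kubectl get storageclass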

Docker Images

# Backend
ghcr.io/manavgup/rag_modulo/backend:latest

# Frontend
ghcr.io/manavgup/rag_modulo/frontend:latest
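
To verify registry access before the cluster pulls these images, you can pull them locally:

# Pull images locally to confirm access to the registry
docker pull ghcr.io/manavgup/rag_modulo/backend:latest
docker pull ghcr.io/manavgup/rag_modulo/frontend:latest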

Kubernetes Manifests

Namespace

# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: rag-modulo
  labels:
    name: rag-modulo
    environment: production

Secrets

# secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: rag-modulo-secrets
  namespace: rag-modulo
type: Opaque
stringData:
  # Database
  postgres-user: postgres
  postgres-password: CHANGE_ME_SECURE_PASSWORD
  postgres-db: rag_modulo_db

  # JWT
  jwt-secret-key: CHANGE_ME_MIN_32_CHARS_SECURE_JWT_SECRET

  # MinIO
  minio-root-user: minioadmin
  minio-root-password: CHANGE_ME_SECURE_PASSWORD

  # MLFlow
  mlflow-username: mlflow
  mlflow-password: CHANGE_ME_SECURE_PASSWORD

  # LLM Providers
  watsonx-api-key: YOUR_WATSONX_API_KEY
  watsonx-url: https://us-south.ml.cloud.ibm.com
  watsonx-instance-id: YOUR_INSTANCE_ID

  # Optional: OpenAI, Anthropic
  openai-api-key: YOUR_OPENAI_KEY
  anthropic-api-key: YOUR_ANTHROPIC_KEY

Create from file:

# DO NOT commit secrets to git!
kubectl apply -f secrets.yaml

# Or create from .env file
kubectl create secret generic rag-modulo-secrets \
  --from-env-file=.env \
  --namespace=rag-modulo
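
After creating the secret, it is worth spot-checking that the keys landed correctly (values are base64-encoded at rest):

# List the keys stored in the secret
kubectl describe secret rag-modulo-secrets -n rag-modulo

# Decode a single value to spot-check it (avoid doing this in shared terminals)
kubectl get secret rag-modulo-secrets -n rag-modulo \
  -o jsonpath='{.data.postgres-user}' | base64 -d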

ConfigMap

# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rag-modulo-config
  namespace: rag-modulo
data:
  # Application
  ENVIRONMENT: "production"
  SKIP_AUTH: "false"
  DEVELOPMENT_MODE: "false"
  TESTING: "false"

  # Database
  COLLECTIONDB_HOST: "postgres-service"
  COLLECTIONDB_PORT: "5432"

  # Vector Database
  VECTOR_DB: "milvus"
  MILVUS_HOST: "milvus-service"
  MILVUS_PORT: "19530"

  # Object Storage
  MINIO_HOST: "minio-service"
  MINIO_PORT: "9000"

  # MLFlow
  MLFLOW_TRACKING_URI: "http://mlflow-service:5000"

  # Frontend
  FRONTEND_URL: "https://rag-modulo.example.com"
  BACKEND_URL: "https://api.rag-modulo.example.com"
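
Apply and confirm the ConfigMap:

kubectl apply -f configmap.yaml

# Inspect the rendered keys
kubectl get configmap rag-modulo-config -n rag-modulo -o yaml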

PostgreSQL StatefulSet

# postgres-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
  namespace: rag-modulo
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432
  clusterIP: None  # Headless service
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: rag-modulo
spec:
  serviceName: postgres-service
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:13
        ports:
        - containerPort: 5432
          name: postgres
        env:
        - name: POSTGRES_DB
          valueFrom:
            secretKeyRef:
              name: rag-modulo-secrets
              key: postgres-db
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: rag-modulo-secrets
              key: postgres-user
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: rag-modulo-secrets
              key: postgres-password
        volumeMounts:
        - name: postgres-data
          mountPath: /var/lib/postgresql/data
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        livenessProbe:
          exec:
            command:
            - pg_isready
            - -U
            - postgres
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - pg_isready
            - -U
            - postgres
          initialDelaySeconds: 5
          periodSeconds: 5
  volumeClaimTemplates:
  - metadata:
      name: postgres-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: standard  # Change to your StorageClass
      resources:
        requests:
          storage: 50Gi
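
Once applied, wait for the StatefulSet to become ready, then spot-check the database from inside the pod (the user and database names below match the secret defined earlier):

kubectl apply -f postgres-statefulset.yaml

# Wait for the single replica to come up
kubectl rollout status statefulset/postgres -n rag-modulo

# List databases to confirm the server is accepting connections
kubectl exec -it postgres-0 -n rag-modulo -- psql -U postgres -d rag_modulo_db -c '\l'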

Milvus StatefulSet

# milvus-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
  name: milvus-etcd
  namespace: rag-modulo
spec:
  selector:
    app: milvus-etcd
  ports:
    - port: 2379
      targetPort: 2379
  clusterIP: None
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: milvus-etcd
  namespace: rag-modulo
spec:
  serviceName: milvus-etcd
  replicas: 1
  selector:
    matchLabels:
      app: milvus-etcd
  template:
    metadata:
      labels:
        app: milvus-etcd
    spec:
      containers:
      - name: etcd
        image: quay.io/coreos/etcd:v3.5.9
        env:
        - name: ETCD_NAME
          value: "etcd"
        - name: ETCD_DATA_DIR
          value: "/etcd-data"
        - name: ETCD_LISTEN_CLIENT_URLS
          value: "http://0.0.0.0:2379"
        - name: ETCD_ADVERTISE_CLIENT_URLS
          value: "http://milvus-etcd:2379"
        volumeMounts:
        - name: etcd-data
          mountPath: /etcd-data
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
  volumeClaimTemplates:
  - metadata:
      name: etcd-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: milvus-service
  namespace: rag-modulo
spec:
  selector:
    app: milvus
  ports:
    - port: 19530
      targetPort: 19530
      name: grpc
    - port: 9091
      targetPort: 9091
      name: metrics
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: milvus
  namespace: rag-modulo
spec:
  serviceName: milvus-service
  replicas: 1
  selector:
    matchLabels:
      app: milvus
  template:
    metadata:
      labels:
        app: milvus
    spec:
      containers:
      - name: milvus
        image: milvusdb/milvus:v2.4.4
        command: ["milvus", "run", "standalone"]
        ports:
        - containerPort: 19530
          name: grpc
        - containerPort: 9091
          name: metrics
        env:
        - name: ETCD_ENDPOINTS
          value: "milvus-etcd:2379"
        - name: MINIO_ADDRESS
          value: "minio-service:9000"
        - name: MINIO_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: rag-modulo-secrets
              key: minio-root-user
        - name: MINIO_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: rag-modulo-secrets
              key: minio-root-password
        volumeMounts:
        - name: milvus-data
          mountPath: /var/lib/milvus
        resources:
          requests:
            memory: "4Gi"
            cpu: "1000m"
          limits:
            memory: "8Gi"
            cpu: "2000m"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 9091
          initialDelaySeconds: 60
          periodSeconds: 30
  volumeClaimTemplates:
  - metadata:
      name: milvus-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi
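
Milvus can take a minute or more to become healthy. Its metrics port exposes the same health endpoint the liveness probe uses, which you can reach through a temporary port-forward:

# Forward the Milvus metrics port locally
kubectl port-forward statefulset/milvus 9091:9091 -n rag-modulo &

# Same endpoint as the liveness probe
curl http://localhost:9091/healthz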

Backend Deployment

# backend-deployment.yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-service
  namespace: rag-modulo
spec:
  selector:
    app: backend
  ports:
    - port: 8000
      targetPort: 8000
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: rag-modulo
spec:
  replicas: 3  # Scale based on load
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
        version: v1
    spec:
      containers:
      - name: backend
        image: ghcr.io/manavgup/rag_modulo/backend:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 8000
          name: http
        env:
        # ConfigMap values
        - name: ENVIRONMENT
          valueFrom:
            configMapKeyRef:
              name: rag-modulo-config
              key: ENVIRONMENT
        - name: COLLECTIONDB_HOST
          valueFrom:
            configMapKeyRef:
              name: rag-modulo-config
              key: COLLECTIONDB_HOST
        - name: MILVUS_HOST
          valueFrom:
            configMapKeyRef:
              name: rag-modulo-config
              key: MILVUS_HOST
        # Secret values
        - name: COLLECTIONDB_USER
          valueFrom:
            secretKeyRef:
              name: rag-modulo-secrets
              key: postgres-user
        - name: COLLECTIONDB_PASS
          valueFrom:
            secretKeyRef:
              name: rag-modulo-secrets
              key: postgres-password
        - name: JWT_SECRET_KEY
          valueFrom:
            secretKeyRef:
              name: rag-modulo-secrets
              key: jwt-secret-key
        - name: WATSONX_APIKEY
          valueFrom:
            secretKeyRef:
              name: rag-modulo-secrets
              key: watsonx-api-key
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
        livenessProbe:
          httpGet:
            path: /api/health
            port: 8000
          initialDelaySeconds: 60
          periodSeconds: 30
          timeoutSeconds: 10
        readinessProbe:
          httpGet:
            path: /api/health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5

Frontend Deployment

# frontend-deployment.yaml
apiVersion: v1
kind: Service
metadata:
  name: frontend-service
  namespace: rag-modulo
spec:
  selector:
    app: frontend
  ports:
    - port: 8080
      targetPort: 8080
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: rag-modulo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: ghcr.io/manavgup/rag_modulo/frontend:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: http
        env:
        - name: REACT_APP_BACKEND_URL
          value: "https://api.rag-modulo.example.com"
        - name: REACT_APP_WS_URL
          value: "wss://api.rag-modulo.example.com/ws"
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 30
        readinessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5

Helm Chart Deployment

Directory Structure

rag-modulo-helm/
├── Chart.yaml
├── values.yaml
├── values-production.yaml
├── templates/
│   ├── namespace.yaml
│   ├── secrets.yaml
│   ├── configmap.yaml
│   ├── postgres-statefulset.yaml
│   ├── milvus-statefulset.yaml
│   ├── minio-statefulset.yaml
│   ├── backend-deployment.yaml
│   ├── frontend-deployment.yaml
│   ├── ingress.yaml
│   └── NOTES.txt
└── README.md

Chart.yaml

apiVersion: v2
name: rag-modulo
description: RAG Modulo - Production-ready RAG platform
type: application
version: 1.0.0
appVersion: "1.0.0"
keywords:
  - rag
  - ai
  - llm
  - vector-database
maintainers:
  - name: RAG Modulo Team

values.yaml (excerpt)

# Global settings
global:
  namespace: rag-modulo
  environment: production

# Backend
backend:
  replicaCount: 3
  image:
    repository: ghcr.io/manavgup/rag_modulo/backend
    tag: latest
    pullPolicy: Always
  resources:
    requests:
      memory: "2Gi"
      cpu: "1000m"
    limits:
      memory: "4Gi"
      cpu: "2000m"
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70

# PostgreSQL
postgres:
  enabled: true
  image: postgres:13
  storage: 50Gi
  storageClass: standard

# Milvus
milvus:
  enabled: true
  image: milvusdb/milvus:v2.4.4
  storage: 100Gi
  resources:
    requests:
      memory: "4Gi"
      cpu: "1000m"

# Ingress
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: api.rag-modulo.example.com
      paths:
        - path: /
          pathType: Prefix
          service: backend-service
  tls:
    - secretName: rag-modulo-tls
      hosts:
        - api.rag-modulo.example.com
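
Before installing, lint the chart and render the templates locally to catch schema and templating errors without touching the cluster:

# Validate chart structure and values
helm lint ./rag-modulo-helm --values values-production.yaml

# Render manifests locally for inspection
helm template rag-modulo ./rag-modulo-helm --values values-production.yaml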

Deploy with Helm

# Add repository (if published to Helm repo)
helm repo add rag-modulo https://charts.rag-modulo.com
helm repo update

# Install from local chart
helm install rag-modulo ./rag-modulo-helm \
  --namespace rag-modulo \
  --create-namespace \
  --values values-production.yaml

# Upgrade deployment
helm upgrade rag-modulo ./rag-modulo-helm \
  --namespace rag-modulo \
  --values values-production.yaml

# Rollback to previous version
helm rollback rag-modulo 1 --namespace rag-modulo

# Uninstall
helm uninstall rag-modulo --namespace rag-modulo
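
To inspect what is currently deployed:

# Release status and revision history
helm status rag-modulo --namespace rag-modulo
helm history rag-modulo --namespace rag-modulo

# Effective values for the current release
helm get values rag-modulo --namespace rag-modulo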

Scaling & High Availability

Horizontal Pod Autoscaling (HPA)

# backend-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa
  namespace: rag-modulo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60

Pod Disruption Budget

# backend-pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: backend-pdb
  namespace: rag-modulo
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: backend

Multi-Region Deployment

For global availability, deploy to multiple regions:

# Region 1: us-east-1
kubectl --context=us-east-1 apply -f manifests/

# Region 2: eu-west-1
kubectl --context=eu-west-1 apply -f manifests/

# Use global load balancer (AWS Route53, Cloudflare, etc.)

Storage & Persistence

StorageClass for Different Providers

# AWS EBS (gp3)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
volumeBindingMode: WaitForFirstConsumer

# GCP Persistent Disk
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer

# Azure Disk
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
volumeBindingMode: WaitForFirstConsumer
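
Apply the class for your provider and, if desired, mark it as the cluster default so the volumeClaimTemplates above bind to it without naming it explicitly:

kubectl apply -f storageclass.yaml

# Optionally make it the default StorageClass
kubectl patch storageclass fast-ssd -p \
  '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'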

Backup Strategy

# Use Velero for cluster backups (requires the AWS plugin and an IAM credentials file)
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket rag-modulo-backups \
  --secret-file ./credentials-velero \
  --backup-location-config region=us-east-1

# Schedule daily backups
velero schedule create rag-modulo-daily \
  --schedule="0 2 * * *" \
  --include-namespaces=rag-modulo

# Restore from backup
velero restore create --from-backup rag-modulo-daily-20250109

Networking & Ingress

Ingress with NGINX

# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rag-modulo-ingress
  namespace: rag-modulo
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.rag-modulo.example.com
    - rag-modulo.example.com
    secretName: rag-modulo-tls
  rules:
  - host: api.rag-modulo.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: backend-service
            port:
              number: 8000
  - host: rag-modulo.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 8080
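
After applying, confirm the ingress has an address and that the TLS certificate was issued (this assumes cert-manager is installed with a letsencrypt-prod ClusterIssuer, as annotated above):

# Ingress should show an external address once the controller picks it up
kubectl get ingress rag-modulo-ingress -n rag-modulo

# cert-manager certificate status (READY should be True)
kubectl get certificate rag-modulo-tls -n rag-modulo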

Resource Management

Resource Quotas

# resource-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: rag-modulo-quota
  namespace: rag-modulo
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    persistentvolumeclaims: "10"

LimitRange

# limit-range.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: rag-modulo-limits
  namespace: rag-modulo
spec:
  limits:
  - max:
      cpu: "4"
      memory: "8Gi"
    min:
      cpu: "100m"
      memory: "128Mi"
    default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "250m"
      memory: "256Mi"
    type: Container
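
Verify both policies are in effect:

# Current usage vs. the quota hard limits
kubectl describe resourcequota rag-modulo-quota -n rag-modulo

# Defaults applied to containers that omit resources
kubectl describe limitrange rag-modulo-limits -n rag-modulo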

Troubleshooting

Check Pod Status

# List all pods
kubectl get pods -n rag-modulo

# Describe problematic pod
kubectl describe pod backend-xxx -n rag-modulo

# View logs
kubectl logs -f backend-xxx -n rag-modulo

# Previous container logs (after crash)
kubectl logs backend-xxx -n rag-modulo --previous

Debug Pod Issues

# Run interactive shell in pod
kubectl exec -it backend-xxx -n rag-modulo -- /bin/bash

# Test database connectivity
kubectl exec -it backend-xxx -n rag-modulo -- python -c "
from rag_solution.file_management.database import engine
print(engine.url)
"

# Check environment variables
kubectl exec backend-xxx -n rag-modulo -- env | grep COLLECTION

Network Troubleshooting

# Test service DNS resolution
kubectl run -it --rm debug --image=busybox --restart=Never -n rag-modulo -- nslookup postgres-service

# Test service connectivity
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -n rag-modulo -- curl http://backend-service:8000/api/health
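
If a service resolves but connections fail, check that it actually has ready endpoints behind it:

# A service with no endpoints means its selector matches no ready pods
kubectl get endpoints backend-service -n rag-modulo

# Compare against the pod labels the selector expects
kubectl get pods -n rag-modulo -l app=backend --show-labels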