
Deployment Guide

This guide covers deploying RAG Modulo in various environments, from local development to production.

Deployment Overview

RAG Modulo is designed for containerized deployment with support for:

  • Local Development: Docker Compose with hot reload
  • Production: Optimized containers with security hardening
  • Cloud Platforms: Kubernetes, Docker Swarm, cloud services
  • CI/CD Integration: Automated deployment pipelines

Prerequisites

System Requirements

  • CPU: 4+ cores recommended
  • RAM: 8GB minimum, 16GB recommended
  • Storage: 50GB+ available space
  • Network: Stable internet connection for AI services
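
On Linux, the available resources can be checked quickly before installing anything:

# Check CPU cores, memory, and free disk space
nproc
free -h
df -h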

Software Requirements

  • Docker: 20.10+ with Docker Compose 2.0+
  • Make: For deployment automation
  • Git: For code deployment
  • curl: For health checks
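
A quick way to confirm the required tooling is present and recent enough:

# Verify tool versions
docker --version            # expect 20.10+
docker-compose --version    # expect 2.0+
make --version
git --version
curl --version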

External Services

  • IBM WatsonX: For AI/ML capabilities
  • PostgreSQL: Database (included in deployment)
  • Milvus: Vector database (included in deployment)
  • MinIO: Object storage (included in deployment)

Local Deployment

Quick Start

# Clone repository
git clone https://github.com/manavgup/rag_modulo.git
cd rag_modulo

# Deploy locally
make run-services

# Verify deployment
curl http://localhost:8000/health

Development Deployment

# Development environment with hot reload
make dev-up

# Services available at:
# - Backend: http://localhost:8000
# - Frontend: http://localhost:3000
# - MLflow: http://localhost:5001

Production-like Local Deployment

# Build production images
make build-all

# Deploy with production settings
make run-ghcr

# Check status
make status

Production Deployment

Environment Setup

1. Server Preparation

# Update system
sudo apt update && sudo apt upgrade -y

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Install Docker Compose
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

# Add user to docker group
sudo usermod -aG docker $USER
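# Log out and back in (or run 'newgrp docker') for the group change to take effect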

2. Application Deployment

# Clone repository
git clone https://github.com/manavgup/rag_modulo.git
cd rag_modulo

# Configure environment
cp env.example .env
# Edit .env with production values

# Deploy
make run-services

# Verify deployment
make health-check

Production Configuration

Environment Variables

# Production settings
PRODUCTION_MODE=true
DEBUG=false
LOG_LEVEL=INFO

# Security
JWT_SECRET_KEY=your-secure-secret-key
SKIP_AUTH=false

# Database
COLLECTIONDB_HOST=postgres
COLLECTIONDB_NAME=rag_modulo_prod
COLLECTIONDB_USER=rag_user
COLLECTIONDB_PASS=secure-password

# AI Services
WATSONX_APIKEY=your-watsonx-api-key
WATSONX_URL=https://us-south.ml.cloud.ibm.com
WATSONX_INSTANCE_ID=your-instance-id
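
A strong value for JWT_SECRET_KEY can be generated with openssl, for example:

# Generate a random 256-bit secret for JWT signing
openssl rand -hex 32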

Security Hardening

# Use production images
make build-all

# Enable security features
export SECURITY_SCAN=true
export VULNERABILITY_CHECK=true

# Run security checks
make security-check

SSL/TLS Configuration

Using Let's Encrypt

# Install Certbot
sudo apt install certbot

# Generate certificates
sudo certbot certonly --standalone -d your-domain.com

# Configure nginx with SSL
# (see the nginx reverse proxy sketch below)

Using Reverse Proxy

# docker-compose.prod.yml
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/ssl
    depends_on:
      - backend
      - frontend
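
A minimal nginx.conf to pair with the compose file above might look like the sketch below. The /api/ path split between backend and frontend is an assumption, so adjust the routing, domain, and certificate paths to match your setup:

# Write a minimal reverse-proxy configuration (sketch)
cat > nginx.conf <<'EOF'
events {}

http {
  # Redirect plain HTTP to HTTPS
  server {
    listen 80;
    server_name your-domain.com;
    return 301 https://$host$request_uri;
  }

  server {
    listen 443 ssl;
    server_name your-domain.com;

    ssl_certificate     /etc/ssl/fullchain.pem;
    ssl_certificate_key /etc/ssl/privkey.pem;

    # API traffic to the backend container
    location /api/ {
      proxy_pass http://backend:8000/;
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto https;
    }

    # Everything else to the frontend container
    location / {
      proxy_pass http://frontend:3000/;
      proxy_set_header Host $host;
    }
  }
}
EOF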

Cloud Deployment

IBM Cloud Code Engine

IBM Cloud Code Engine provides a fully managed, serverless platform for running containerized applications. This is the recommended deployment method for production workloads.

Complete Application Deployment

The RAG Modulo application consists of multiple components deployed to IBM Cloud Code Engine:

Application Components:

  • Backend: FastAPI Python application (1 CPU, 4GB RAM)
  • Frontend: React application (0.5 CPU, 1GB RAM)

Infrastructure Components:

  • PostgreSQL: Database for application data (0.5 CPU, 2GB RAM)
  • MinIO: S3-compatible object storage (0.25 CPU, 1GB RAM)
  • etcd: Key-value store for Milvus (0.25 CPU, 1GB RAM)
  • Milvus: Vector database for embeddings (0.5 CPU, 2GB RAM)

Quick Start

  1. Prerequisites: Set up an IBM Cloud account and configure GitHub secrets
  2. Deploy Complete App: Use the GitHub Actions workflow for full-stack deployment
  3. Monitor: Built-in health checks and monitoring for all components

# Deploy complete application via GitHub Actions
# 1. Go to Actions tab in GitHub repository
# 2. Select "Deploy Complete RAG Modulo Application"
# 3. Choose environment (staging/production)
# 4. Click "Run workflow"
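
After the workflow completes, the deployment can also be checked from the IBM Cloud CLI with the Code Engine plugin. A minimal sketch; the project and application names are assumptions, so use the names configured in the workflow:

# Select the Code Engine project and list deployed applications
ibmcloud ce project select --name rag-modulo
ibmcloud ce application list

# Inspect a single application, e.g. the backend
ibmcloud ce application get --name rag-modulo-backend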

Key Features

  • Full-Stack Deployment: Backend + Frontend + Infrastructure in single workflow
  • Complete Infrastructure: PostgreSQL, MinIO, etcd, and Milvus deployed automatically
  • Security-First: Automated vulnerability scanning with Trivy for all images
  • Auto-Scaling: Backend scales 1-5 instances, Frontend scales 1-3 instances
  • Health Monitoring: Built-in health checks and smoke tests for all components
  • Cost Optimization: Pay only for what you use
  • Zero Downtime: Rolling updates keep the application available during releases
  • Daily Builds: Automated daily builds with optional deployment

Resources

Application Components:

  • Backend: 1 CPU, 4GB RAM, scales 1-5 instances, port 8000
  • Frontend: 0.5 CPU, 1GB RAM, scales 1-3 instances, port 3000

Infrastructure Components:

  • PostgreSQL: 0.5 CPU, 2GB RAM, 1 instance, port 5432
  • MinIO: 0.25 CPU, 1GB RAM, 1 instance, port 9000
  • etcd: 0.25 CPU, 1GB RAM, 1 instance, port 2379
  • Milvus: 0.5 CPU, 2GB RAM, 1 instance, port 19530

For detailed instructions, see IBM Cloud Code Engine Deployment Guide.

AWS Deployment

Using ECS

# Build and push to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin your-account.dkr.ecr.us-east-1.amazonaws.com

# Tag and push images
docker tag rag-modulo/backend:latest your-account.dkr.ecr.us-east-1.amazonaws.com/rag-modulo-backend:latest
docker push your-account.dkr.ecr.us-east-1.amazonaws.com/rag-modulo-backend:latest

# Deploy to ECS
aws ecs create-service --cluster rag-modulo-cluster --service-name rag-modulo-service --task-definition rag-modulo-task

Using EKS

# Create EKS cluster
eksctl create cluster --name rag-modulo-cluster --region us-east-1

# Deploy with kubectl
kubectl apply -f k8s/

Google Cloud Deployment

Using GKE

# Create GKE cluster
gcloud container clusters create rag-modulo-cluster --zone us-central1-a

# Deploy application
kubectl apply -f k8s/

Azure Deployment

Using AKS

# Create AKS cluster
az aks create --resource-group rag-modulo-rg --name rag-modulo-cluster --node-count 3

# Deploy application
kubectl apply -f k8s/
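
For any of the managed Kubernetes options above, a quick post-deployment check might look like the sketch below; the service name backend is an assumption, so use whatever the manifests in k8s/ actually define:

# Verify pods and services are running
kubectl get pods
kubectl get svc

# Port-forward the backend service and hit the health endpoint
kubectl port-forward svc/backend 8000:8000 &
curl http://localhost:8000/health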

Configuration

Application Configuration

Backend Configuration

# backend/core/config.py
class Settings(BaseSettings):
    # Production settings
    production_mode: bool = Field(default=False, env="PRODUCTION_MODE")
    debug: bool = Field(default=False, env="DEBUG")
    log_level: str = Field(default="INFO", env="LOG_LEVEL")

    # Security
    jwt_secret_key: str = Field(env="JWT_SECRET_KEY")
    skip_auth: bool = Field(default=False, env="SKIP_AUTH")

    # Database
    collectiondb_host: str = Field(default="postgres", env="COLLECTIONDB_HOST")
    collectiondb_name: str = Field(default="rag_modulo", env="COLLECTIONDB_NAME")

    # AI Services
    watsonx_apikey: str = Field(env="WATSONX_APIKEY")
    watsonx_url: str = Field(env="WATSONX_URL")

Frontend Configuration

// webui/src/config.js
const config = {
  apiUrl: process.env.REACT_APP_API_URL || 'http://localhost:8000',
  environment: process.env.NODE_ENV || 'development',
  features: {
    analytics: process.env.REACT_APP_ANALYTICS_ENABLED === 'true',
    debug: process.env.REACT_APP_DEBUG === 'true'
  }
};
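
These REACT_APP_* values are read at build time rather than at runtime, so they must be set when the production bundle is built. A sketch, assuming an npm build script and a placeholder API URL:

# Bake the production API URL into the frontend bundle
REACT_APP_API_URL=https://your-domain.com/api npm run build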

Database Configuration

PostgreSQL Setup

-- Create production database
CREATE DATABASE rag_modulo_prod;
CREATE USER rag_user WITH PASSWORD 'secure-password';
GRANT ALL PRIVILEGES ON DATABASE rag_modulo_prod TO rag_user;
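
One way to run these statements is an interactive psql session inside the bundled Postgres container (the container name follows the pattern used elsewhere in this guide; the postgres superuser is an assumption):

# Open a psql session in the Postgres container and paste the statements above
docker exec -it rag_modulo-postgres-1 psql -U postgres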

Milvus Configuration

# milvus-config.yaml
etcd:
  endpoints:
    - milvus-etcd:2379
  rootPath: by-dev
  metaPath: meta

common:
  security:
    authorizationEnabled: false

Monitoring Configuration

Prometheus Setup

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'rag-modulo-backend'
    static_configs:
      - targets: ['backend:8000']
    metrics_path: '/metrics'

Grafana Dashboard

{
  "dashboard": {
    "title": "RAG Modulo Monitoring",
    "panels": [
      {
        "title": "API Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))"
          }
        ]
      }
    ]
  }
}
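
The dashboard JSON above can be imported through the Grafana HTTP API. A minimal sketch, assuming it is saved as dashboard.json and that GRAFANA_URL and GRAFANA_TOKEN are placeholders for your Grafana endpoint and an API token:

# Import the dashboard definition via the Grafana API
curl -X POST "$GRAFANA_URL/api/dashboards/db" \
     -H "Authorization: Bearer $GRAFANA_TOKEN" \
     -H "Content-Type: application/json" \
     -d @dashboard.json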

Monitoring

Health Checks

# Application health
curl http://localhost:8000/health

# Database health
curl http://localhost:8000/health/database

# Vector database health
curl http://localhost:8000/health/vector-db

# AI service health
curl http://localhost:8000/health/ai-services
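
A quick aggregate check over the same endpoints (a minimal sketch using the paths listed above):

# Report any endpoint that is not healthy
for ep in health health/database health/vector-db health/ai-services; do
  curl -fsS "http://localhost:8000/$ep" > /dev/null && echo "OK: $ep" || echo "UNHEALTHY: $ep"
done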

Logging

Log Configuration

# backend/core/logging.py
import logging
import os
import sys

def setup_logging(level: str = "INFO"):
    """Configure root logging to stdout and to a file under /app/logs."""
    # Ensure the log directory exists before attaching the file handler
    os.makedirs("/app/logs", exist_ok=True)
    logging.basicConfig(
        level=getattr(logging, level.upper()),
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        handlers=[
            logging.StreamHandler(sys.stdout),
            logging.FileHandler('/app/logs/rag_modulo.log')
        ]
    )

Log Monitoring

# View application logs
docker logs rag_modulo-backend-1

# Follow logs in real-time
docker logs -f rag_modulo-backend-1

# View all service logs
make logs

Metrics

Application Metrics

  • Response Time: API endpoint performance
  • Throughput: Requests per second
  • Error Rate: Failed request percentage
  • Resource Usage: CPU, memory, disk usage
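
These can be queried ad hoc from the Prometheus HTTP API. A minimal sketch, assuming Prometheus is reachable at localhost:9090 and using the metric names that also appear in the alert rules below:

# 95th-percentile API response time over the last 5 minutes
curl -G "http://localhost:9090/api/v1/query" \
     --data-urlencode 'query=histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))'

# Throughput (requests per second) and error rate
curl -G "http://localhost:9090/api/v1/query" \
     --data-urlencode 'query=rate(http_requests_total[5m])'
curl -G "http://localhost:9090/api/v1/query" \
     --data-urlencode 'query=rate(http_requests_total{status=~"5.."}[5m])'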

Business Metrics

  • Document Processing: Documents processed per hour
  • Search Performance: Search query response time
  • User Activity: Active users, session duration
  • AI Service Usage: WatsonX API calls, costs

Alerting

Alert Rules

# alerts.yml
groups:
  - name: rag-modulo
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"

      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"

Troubleshooting

Common Deployment Issues

Container Won't Start

# Check container logs
docker logs rag_modulo-backend-1

# Check container status
docker ps -a

# Restart container
docker restart rag_modulo-backend-1

Database Connection Issues

# Test database connectivity
docker exec rag_modulo-backend-1 python -c "
from core.config import Settings
from sqlalchemy import create_engine
settings = Settings()
engine = create_engine(settings.database_url)
print('Database connection successful')
"

# Check database logs
docker logs rag_modulo-postgres-1

AI Service Issues

# Test WatsonX connectivity
# watsonx expects an IAM bearer token rather than the raw API key; exchange it first (requires jq)
IAM_TOKEN=$(curl -s -X POST "https://iam.cloud.ibm.com/identity/token" \
     -d "grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=$WATSONX_APIKEY" | jq -r .access_token)

curl -H "Authorization: Bearer $IAM_TOKEN" \
     -H "Content-Type: application/json" \
     "$WATSONX_URL/v1/embeddings" \
     -d '{"input": "test", "model": "sentence-transformers/all-MiniLM-L6-v2"}'

# Check AI service logs
docker logs rag_modulo-backend-1 | grep -i watson

Performance Issues

# Check resource usage
docker stats

# Profile application
make dev-profile

# Check slow queries
docker exec rag_modulo-postgres-1 psql -U rag_user -d rag_modulo -c "
SELECT query, mean_time, calls
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;
"

Recovery Procedures

Database Recovery

# Backup database
docker exec rag_modulo-postgres-1 pg_dump -U rag_user rag_modulo > backup.sql

# Restore database
docker exec -i rag_modulo-postgres-1 psql -U rag_user rag_modulo < backup.sql

Application Recovery

# Rollback to previous version
git checkout previous-stable-tag
make build-all
make run-services

# Restart all services
make restart-app

Support

Getting Help

  • Documentation: Check this guide and inline docs
  • Issues: Create a GitHub issue
  • Discussions: Use GitHub Discussions
  • Logs: Always include relevant logs when reporting issues

Emergency Contacts

  • Critical Issues: Create urgent GitHub issue
  • Security Issues: Use private security reporting
  • Performance Issues: Include metrics and logs

Next Steps