
Deployment Guide

This guide covers deploying RAG Modulo in various environments, from local development to production.

Deployment Overview

RAG Modulo is designed for containerized deployment with support for:

  • Local Development: Docker Compose with hot reload
  • Production: Optimized containers with security hardening
  • Cloud Platforms: Kubernetes, Docker Swarm, cloud services
  • CI/CD Integration: Automated deployment pipelines

Prerequisites

System Requirements

  • CPU: 4+ cores recommended
  • RAM: 8GB minimum, 16GB recommended
  • Storage: 50GB+ available space
  • Network: Stable internet connection for AI services
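
On Linux, the available resources can be checked quickly before installing anything:

# Check CPU cores, memory, and free disk space
nproc
free -h
df -h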

Software Requirements

  • Docker: 20.10+ with Docker Compose 2.0+
  • Make: For deployment automation
  • Git: For code deployment
  • curl: For health checks
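
A quick way to confirm the required tooling is present and recent enough:

# Verify tool versions
docker --version            # expect 20.10+
docker-compose --version    # expect 2.0+
make --version
git --version
curl --version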

External Services

  • IBM WatsonX: For AI/ML capabilities
  • PostgreSQL: Database (included in deployment)
  • Milvus: Vector database (included in deployment)
  • MinIO: Object storage (included in deployment)

Local Deployment

Quick Start

# Clone repository
git clone https://github.com/manavgup/rag_modulo.git
cd rag_modulo

# Deploy locally
make run-services

# Verify deployment
curl http://localhost:8000/health

Development Deployment

# Development environment with hot reload
make dev-up

# Services available at:
# - Backend: http://localhost:8000
# - Frontend: http://localhost:3000
# - MLflow: http://localhost:5001

Production-like Local Deployment

# Build production images
make build-all

# Deploy with production settings
make run-ghcr

# Check status
make status

Production Deployment

Environment Setup

1. Server Preparation

# Update system
sudo apt update && sudo apt upgrade -y

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Install Docker Compose
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

# Add user to docker group
sudo usermod -aG docker $USER
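# Log out and back in (or run 'newgrp docker') for the group change to take effect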

2. Application Deployment

# Clone repository
git clone https://github.com/manavgup/rag_modulo.git
cd rag_modulo

# Configure environment
cp env.example .env
# Edit .env with production values

# Deploy
make run-services

# Verify deployment
make health-check

Production Configuration

Environment Variables

# Production settings
PRODUCTION_MODE=true
DEBUG=false
LOG_LEVEL=INFO

# Security
JWT_SECRET_KEY=your-secure-secret-key
SKIP_AUTH=false

# Database
COLLECTIONDB_HOST=postgres
COLLECTIONDB_NAME=rag_modulo_prod
COLLECTIONDB_USER=rag_user
COLLECTIONDB_PASS=secure-password

# AI Services
WATSONX_APIKEY=your-watsonx-api-key
WATSONX_URL=https://us-south.ml.cloud.ibm.com
WATSONX_INSTANCE_ID=your-instance-id
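
A strong value for JWT_SECRET_KEY can be generated with openssl, for example:

# Generate a random 256-bit secret for JWT signing
openssl rand -hex 32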

Security Hardening

# Use production images
make build-all

# Enable security features
export SECURITY_SCAN=true
export VULNERABILITY_CHECK=true

# Run security checks
make security-check

SSL/TLS Configuration

Using Let's Encrypt

# Install Certbot
sudo apt install certbot

# Generate certificates
sudo certbot certonly --standalone -d your-domain.com

# Configure nginx with SSL
# (see the nginx reverse proxy sketch below)

Using Reverse Proxy

# docker-compose.prod.yml
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/ssl
    depends_on:
      - backend
      - frontend
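
A minimal nginx.conf to pair with the compose file above might look like the sketch below. The /api/ path split between backend and frontend is an assumption, so adjust the routing, domain, and certificate paths to match your setup:

# Write a minimal reverse-proxy configuration (sketch)
cat > nginx.conf <<'EOF'
events {}

http {
  # Redirect plain HTTP to HTTPS
  server {
    listen 80;
    server_name your-domain.com;
    return 301 https://$host$request_uri;
  }

  server {
    listen 443 ssl;
    server_name your-domain.com;

    ssl_certificate     /etc/ssl/fullchain.pem;
    ssl_certificate_key /etc/ssl/privkey.pem;

    # API traffic to the backend container
    location /api/ {
      proxy_pass http://backend:8000/;
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto https;
    }

    # Everything else to the frontend container
    location / {
      proxy_pass http://frontend:3000/;
      proxy_set_header Host $host;
    }
  }
}
EOF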

Cloud Deployment

IBM Cloud Code Engine

IBM Cloud Code Engine provides a fully managed, serverless platform for running containerized applications. This is the recommended deployment method for production workloads.

Complete Application Deployment

The RAG Modulo application consists of multiple components deployed to IBM Cloud Code Engine:

Application Components:

  • Backend: FastAPI Python application (1 CPU, 4GB RAM)
  • Frontend: React application (0.5 CPU, 1GB RAM)

Infrastructure Components:

  • PostgreSQL: Database for application data (0.5 CPU, 2GB RAM)
  • MinIO: S3-compatible object storage (0.25 CPU, 1GB RAM)
  • etcd: Key-value store for Milvus (0.25 CPU, 1GB RAM)
  • Milvus: Vector database for embeddings (0.5 CPU, 2GB RAM)

Quick Start

  1. Prerequisites: Set up an IBM Cloud account and configure GitHub secrets
  2. Deploy Complete App: Use the GitHub Actions workflow for full-stack deployment
  3. Monitor: Built-in health checks and monitoring for all components

# Deploy complete application via GitHub Actions
# 1. Go to Actions tab in GitHub repository
# 2. Select "Deploy Complete RAG Modulo Application"
# 3. Choose environment (staging/production)
# 4. Click "Run workflow"
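
After the workflow completes, the deployment can also be checked from the IBM Cloud CLI with the Code Engine plugin. A minimal sketch; the project and application names are assumptions, so use the names configured in the workflow:

# Select the Code Engine project and list deployed applications
ibmcloud ce project select --name rag-modulo
ibmcloud ce application list

# Inspect a single application, e.g. the backend
ibmcloud ce application get --name rag-modulo-backend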

Key Features

  • Full-Stack Deployment: Backend + Frontend + Infrastructure in single workflow
  • Complete Infrastructure: PostgreSQL, MinIO, etcd, and Milvus deployed automatically
  • Security-First: Automated vulnerability scanning with Trivy for all images
  • Auto-Scaling: Backend scales 1-5 instances, Frontend scales 1-3 instances
  • Health Monitoring: Built-in health checks and smoke tests for all components
  • Cost Optimization: Pay only for what you use
  • Zero Downtime: Rolling updates keep the application available during releases
  • Daily Builds: Automated daily builds with optional deployment

Resources

Application Components:

  • Backend: 1 CPU, 4GB RAM, scales 1-5 instances, port 8000
  • Frontend: 0.5 CPU, 1GB RAM, scales 1-3 instances, port 3000

Infrastructure Components:

  • PostgreSQL: 0.5 CPU, 2GB RAM, 1 instance, port 5432
  • MinIO: 0.25 CPU, 1GB RAM, 1 instance, port 9000
  • etcd: 0.25 CPU, 1GB RAM, 1 instance, port 2379
  • Milvus: 0.5 CPU, 2GB RAM, 1 instance, port 19530

For detailed instructions, see IBM Cloud Code Engine Deployment Guide.

AWS Deployment

Using ECS

# Build and push to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin your-account.dkr.ecr.us-east-1.amazonaws.com

# Tag and push images
docker tag rag-modulo/backend:latest your-account.dkr.ecr.us-east-1.amazonaws.com/rag-modulo-backend:latest
docker push your-account.dkr.ecr.us-east-1.amazonaws.com/rag-modulo-backend:latest

# Deploy to ECS
aws ecs create-service --cluster rag-modulo-cluster --service-name rag-modulo-service --task-definition rag-modulo-task

Using EKS

# Create EKS cluster
eksctl create cluster --name rag-modulo-cluster --region us-east-1

# Deploy with kubectl
kubectl apply -f k8s/

Google Cloud Deployment

Using GKE

# Create GKE cluster
gcloud container clusters create rag-modulo-cluster --zone us-central1-a

# Deploy application
kubectl apply -f k8s/

Azure Deployment

Using AKS

# Create AKS cluster
az aks create --resource-group rag-modulo-rg --name rag-modulo-cluster --node-count 3

# Deploy application
kubectl apply -f k8s/
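
For any of the managed Kubernetes options above, a quick post-deployment check might look like the sketch below; the service name backend is an assumption, so use whatever the manifests in k8s/ actually define:

# Verify pods and services are running
kubectl get pods
kubectl get svc

# Port-forward the backend service and hit the health endpoint
kubectl port-forward svc/backend 8000:8000 &
curl http://localhost:8000/health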

Configuration

Application Configuration

Backend Configuration

# backend/core/config.py
class Settings(BaseSettings):
    # Production settings
    production_mode: bool = Field(default=False, env="PRODUCTION_MODE")
    debug: bool = Field(default=False, env="DEBUG")
    log_level: str = Field(default="INFO", env="LOG_LEVEL")

    # Security
    jwt_secret_key: str = Field(env="JWT_SECRET_KEY")
    skip_auth: bool = Field(default=False, env="SKIP_AUTH")

    # Database
    collectiondb_host: str = Field(default="postgres", env="COLLECTIONDB_HOST")
    collectiondb_name: str = Field(default="rag_modulo", env="COLLECTIONDB_NAME")

    # AI Services
    watsonx_apikey: str = Field(env="WATSONX_APIKEY")
    watsonx_url: str = Field(env="WATSONX_URL")

Frontend Configuration

// webui/src/config.js
const config = {
  apiUrl: process.env.REACT_APP_API_URL || 'http://localhost:8000',
  environment: process.env.NODE_ENV || 'development',
  features: {
    analytics: process.env.REACT_APP_ANALYTICS_ENABLED === 'true',
    debug: process.env.REACT_APP_DEBUG === 'true'
  }
};
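
These REACT_APP_* values are read at build time rather than at runtime, so they must be set when the production bundle is built. A sketch, assuming an npm build script and a placeholder API URL:

# Bake the production API URL into the frontend bundle
REACT_APP_API_URL=https://your-domain.com/api npm run build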

Database Configuration

PostgreSQL Setup

-- Create production database
CREATE DATABASE rag_modulo_prod;
CREATE USER rag_user WITH PASSWORD 'secure-password';
GRANT ALL PRIVILEGES ON DATABASE rag_modulo_prod TO rag_user;
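
One way to run these statements is an interactive psql session inside the bundled Postgres container (the container name follows the pattern used elsewhere in this guide; the postgres superuser is an assumption):

# Open a psql session in the Postgres container and paste the statements above
docker exec -it rag_modulo-postgres-1 psql -U postgres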

Milvus Configuration

# milvus-config.yaml
etcd:
  endpoints:
    - milvus-etcd:2379
  rootPath: by-dev
  metaPath: meta

common:
  security:
    authorizationEnabled: false

Monitoring Configuration

Prometheus Setup

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'rag-modulo-backend'
    static_configs:
      - targets: ['backend:8000']
    metrics_path: '/metrics'

Grafana Dashboard

{
  "dashboard": {
    "title": "RAG Modulo Monitoring",
    "panels": [
      {
        "title": "API Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))"
          }
        ]
      }
    ]
  }
}
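
The dashboard JSON above can be imported through the Grafana HTTP API. A minimal sketch, assuming it is saved as dashboard.json and that GRAFANA_URL and GRAFANA_TOKEN are placeholders for your Grafana endpoint and an API token:

# Import the dashboard definition via the Grafana API
curl -X POST "$GRAFANA_URL/api/dashboards/db" \
     -H "Authorization: Bearer $GRAFANA_TOKEN" \
     -H "Content-Type: application/json" \
     -d @dashboard.json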

Monitoring

Health Checks

# Application health
curl http://localhost:8000/health

# Database health
curl http://localhost:8000/health/database

# Vector database health
curl http://localhost:8000/health/vector-db

# AI service health
curl http://localhost:8000/health/ai-services
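
A quick aggregate check over the same endpoints (a minimal sketch using the paths listed above):

# Report any endpoint that is not healthy
for ep in health health/database health/vector-db health/ai-services; do
  curl -fsS "http://localhost:8000/$ep" > /dev/null && echo "OK: $ep" || echo "UNHEALTHY: $ep"
done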

Logging

Log Configuration

# backend/core/logging.py
import logging
import os
import sys

def setup_logging(level: str = "INFO"):
    """Configure root logging to stdout and to a file under /app/logs."""
    # Ensure the log directory exists before attaching the file handler
    os.makedirs("/app/logs", exist_ok=True)
    logging.basicConfig(
        level=getattr(logging, level.upper()),
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        handlers=[
            logging.StreamHandler(sys.stdout),
            logging.FileHandler('/app/logs/rag_modulo.log')
        ]
    )

Log Monitoring

# View application logs
docker logs rag_modulo-backend-1

# Follow logs in real-time
docker logs -f rag_modulo-backend-1

# View all service logs
make logs

Metrics

Application Metrics

  • Response Time: API endpoint performance
  • Throughput: Requests per second
  • Error Rate: Failed request percentage
  • Resource Usage: CPU, memory, disk usage
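
These can be queried ad hoc from the Prometheus HTTP API. A minimal sketch, assuming Prometheus is reachable at localhost:9090 and using the metric names that also appear in the alert rules below:

# 95th-percentile API response time over the last 5 minutes
curl -G "http://localhost:9090/api/v1/query" \
     --data-urlencode 'query=histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))'

# Throughput (requests per second) and error rate
curl -G "http://localhost:9090/api/v1/query" \
     --data-urlencode 'query=rate(http_requests_total[5m])'
curl -G "http://localhost:9090/api/v1/query" \
     --data-urlencode 'query=rate(http_requests_total{status=~"5.."}[5m])'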

Business Metrics

  • Document Processing: Documents processed per hour
  • Search Performance: Search query response time
  • User Activity: Active users, session duration
  • AI Service Usage: WatsonX API calls, costs

Alerting

Alert Rules

# alerts.yml
groups:
  - name: rag-modulo
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"

      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"

Troubleshooting

Common Deployment Issues

Container Won't Start

# Check container logs
docker logs rag_modulo-backend-1

# Check container status
docker ps -a

# Restart container
docker restart rag_modulo-backend-1

Database Connection Issues

# Test database connectivity
docker exec rag_modulo-backend-1 python -c "
from core.config import Settings
from sqlalchemy import create_engine
settings = Settings()
engine = create_engine(settings.database_url)
print('Database connection successful')
"

# Check database logs
docker logs rag_modulo-postgres-1

AI Service Issues

# Test WatsonX connectivity
# watsonx expects an IAM bearer token rather than the raw API key; exchange it first (requires jq)
IAM_TOKEN=$(curl -s -X POST "https://iam.cloud.ibm.com/identity/token" \
     -d "grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=$WATSONX_APIKEY" | jq -r .access_token)

curl -H "Authorization: Bearer $IAM_TOKEN" \
     -H "Content-Type: application/json" \
     "$WATSONX_URL/v1/embeddings" \
     -d '{"input": "test", "model": "sentence-transformers/all-MiniLM-L6-v2"}'

# Check AI service logs
docker logs rag_modulo-backend-1 | grep -i watson

Performance Issues

# Check resource usage
docker stats

# Profile application
make dev-profile

# Check slow queries
docker exec rag_modulo-postgres-1 psql -U rag_user -d rag_modulo -c "
SELECT query, mean_time, calls
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;
"

Recovery Procedures

Database Recovery

# Backup database
docker exec rag_modulo-postgres-1 pg_dump -U rag_user rag_modulo > backup.sql

# Restore database
docker exec -i rag_modulo-postgres-1 psql -U rag_user rag_modulo < backup.sql

Application Recovery

# Rollback to previous version
git checkout previous-stable-tag
make build-all
make run-services

# Restart all services
make restart-app

Support

Getting Help

  • Documentation: Check this guide and inline docs
  • Issues: Create a GitHub issue
  • Discussions: Use GitHub Discussions
  • Logs: Always include relevant logs when reporting issues

Emergency Contacts

  • Critical Issues: Create urgent GitHub issue
  • Security Issues: Use private security reporting
  • Performance Issues: Include metrics and logs

Next Steps