Configuration Guide

This guide covers all configuration options for RAG Modulo, including environment variables, application settings, and service configurations.

Configuration Overview

RAG Modulo uses a hierarchical configuration system:

  1. Environment Variables: Primary configuration method
  2. Configuration Files: Application-specific settings
  3. Docker Compose: Service orchestration
  4. Makefile: Development workflow settings

Environment Variables

Core Application Settings

# Application Mode
PRODUCTION_MODE=false          # Enable production mode
DEBUG=false                    # Enable debug logging
LOG_LEVEL=INFO                # Logging level (DEBUG, INFO, WARNING, ERROR)
TESTING=false                 # Enable testing mode
DEVELOPMENT_MODE=false        # Enable development features

# Authentication Bypass (Development/Testing Only)
# See: docs/features/authentication-bypass.md for detailed documentation
SKIP_AUTH=false               # Set to true to bypass IBM OIDC authentication
                              # When true: Backend provides mock user + bypass token
                              # When false: Full IBM OIDC authentication required
                              # SECURITY: Never set to true in production!
                              # Application will refuse to start if SKIP_AUTH=true and ENVIRONMENT=production
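
The startup guard described in the comments above could look roughly like the following. This is a minimal sketch, not the project's actual implementation: `env_flag` and `check_auth_bypass` are hypothetical names, and the set of strings treated as "true" is an assumption.

```python
import os


def env_flag(name, default=False, environ=None):
    """Interpret an environment variable as a boolean flag (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    # Assumption: these spellings count as true; everything else is false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def check_auth_bypass(environ=None):
    """Return whether auth is bypassed; raise if that is requested in production."""
    environ = os.environ if environ is None else environ
    skip_auth = env_flag("SKIP_AUTH", environ=environ)
    environment = environ.get("ENVIRONMENT", "").strip().lower()
    if skip_auth and environment == "production":
        raise RuntimeError("SKIP_AUTH=true is not allowed when ENVIRONMENT=production")
    return skip_auth
```

Calling a check like this once during startup, before any routes are registered, keeps a misconfigured production deployment from ever serving unauthenticated requests.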

Security Configuration

# JWT Configuration
JWT_SECRET_KEY=your-secret-key-256-bits    # JWT signing secret
JWT_ALGORITHM=HS256                       # JWT algorithm
JWT_EXPIRATION_HOURS=24                   # Token expiration time

# CORS Configuration
CORS_ORIGINS=http://localhost:3000,https://yourdomain.com

# Security Features
SECURITY_SCAN=true             # Enable security scanning
VULNERABILITY_CHECK=true       # Enable vulnerability checks
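
As an illustration of the values above, the sketch below generates a 256-bit value suitable for JWT_SECRET_KEY and splits a CORS_ORIGINS string into a list. Both function names are hypothetical and are not part of RAG Modulo.

```python
import secrets
from typing import List


def generate_jwt_secret(num_bytes: int = 32) -> str:
    """Return a random hex string; 32 bytes corresponds to 256 bits."""
    return secrets.token_hex(num_bytes)


def parse_cors_origins(raw: str) -> List[str]:
    """Split a comma-separated CORS_ORIGINS value, dropping blanks and whitespace."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


if __name__ == "__main__":
    print("JWT_SECRET_KEY=" + generate_jwt_secret())
    print(parse_cors_origins("http://localhost:3000,https://yourdomain.com"))
```

A 32-byte secret renders as 64 hexadecimal characters, so it also satisfies a minimum-length check of 32 characters.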

Database Configuration

# PostgreSQL Settings
COLLECTIONDB_HOST=postgres     # Database host
COLLECTIONDB_PORT=5432         # Database port
COLLECTIONDB_NAME=rag_modulo   # Database name
COLLECTIONDB_USER=rag_user     # Database user
COLLECTIONDB_PASS=rag_password # Database password
COLLECTIONDB_SSL_MODE=disable  # SSL mode (disable, require, prefer)

# Connection Pooling
DB_POOL_SIZE=10               # Connection pool size
DB_MAX_OVERFLOW=20            # Maximum overflow connections
DB_POOL_TIMEOUT=30            # Connection timeout
DB_POOL_RECYCLE=3600          # Connection recycle time
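
One way these variables might be combined is sketched below, assuming a PostgreSQL connection URL and SQLAlchemy-style pool keyword arguments. The helper names are hypothetical, and whether the backend builds its connection this way is an assumption.

```python
import os
from urllib.parse import quote_plus


def build_database_url(environ=None) -> str:
    """Assemble a PostgreSQL URL from the COLLECTIONDB_* variables."""
    env = os.environ if environ is None else environ
    # Percent-encode credentials so characters such as "@" do not break the URL.
    user = quote_plus(env.get("COLLECTIONDB_USER", "rag_user"))
    password = quote_plus(env.get("COLLECTIONDB_PASS", ""))
    host = env.get("COLLECTIONDB_HOST", "postgres")
    port = env.get("COLLECTIONDB_PORT", "5432")
    name = env.get("COLLECTIONDB_NAME", "rag_modulo")
    ssl_mode = env.get("COLLECTIONDB_SSL_MODE", "disable")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}?sslmode={ssl_mode}"


def pool_options(environ=None) -> dict:
    """Map the DB_POOL_* variables to SQLAlchemy-style keyword arguments."""
    env = os.environ if environ is None else environ
    return {
        "pool_size": int(env.get("DB_POOL_SIZE", "10")),
        "max_overflow": int(env.get("DB_MAX_OVERFLOW", "20")),
        "pool_timeout": int(env.get("DB_POOL_TIMEOUT", "30")),    # seconds
        "pool_recycle": int(env.get("DB_POOL_RECYCLE", "3600")),  # seconds
    }
```

With SQLAlchemy installed, the two results could be passed as `create_engine(build_database_url(), **pool_options())`.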

Vector Database Configuration

# Milvus Settings
MILVUS_HOST=milvus-standalone  # Milvus host
MILVUS_PORT=19530             # Milvus port
MILVUS_USER=                  # Milvus username (if auth enabled)
MILVUS_PASSWORD=              # Milvus password (if auth enabled)
MILVUS_DB_NAME=default        # Milvus database name
MILVUS_COLLECTION_PREFIX=collection_  # Collection name prefix

AI Services Configuration

# IBM WatsonX Settings
WATSONX_INSTANCE_ID=your-instance-id     # WatsonX instance ID
WATSONX_APIKEY=your-api-key              # WatsonX API key
WATSONX_URL=https://us-south.ml.cloud.ibm.com  # WatsonX URL

# Embedding Configuration
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2  # Embedding model
EMBEDDING_DIM=384                        # Embedding dimensions
EMBEDDING_FIELD=embedding                # Embedding field name
EMBEDDING_BATCH_SIZE=32                  # Batch size for embeddings
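
To show what EMBEDDING_BATCH_SIZE controls, here is a small sketch that groups texts into batches before they would be sent to an embedding model. The `batched` helper is hypothetical, and the embedding call itself is left out.

```python
from typing import Iterator, List, Sequence


def batched(texts: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


if __name__ == "__main__":
    documents = [f"chunk {i}" for i in range(70)]
    for batch in batched(documents, batch_size=32):
        # An embedding client would be called here, once per batch.
        print(len(batch))
```

Larger batches reduce the number of requests but raise memory use per request, so the best value depends on the model and hardware.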

Document Processing & Chunking Configuration

# IBM Docling Document Processing
ENABLE_DOCLING=true                      # Enable IBM Docling for advanced document processing
DOCLING_FALLBACK_ENABLED=true            # Enable fallback to traditional processing if Docling fails

# HybridChunker Configuration
USE_DOCLING_CHUNKER=true                 # Use Docling's HybridChunker for token-aware chunking
CHUNKING_TOKENIZER_MODEL=ibm-granite/granite-embedding-english-r2  # Tokenizer model for token counting

# Chunking Strategy (used when USE_DOCLING_CHUNKER=false)
CHUNKING_STRATEGY=fixed                  # Chunking strategy (fixed, semantic, hierarchical)
MIN_CHUNK_SIZE=100                       # Minimum chunk size in tokens
MAX_CHUNK_SIZE=400                       # Maximum chunk size in tokens
CHUNK_OVERLAP=10                         # Overlap between chunks
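
The fixed strategy can be pictured as a sliding window over tokens. The sketch below is illustrative only: `fixed_chunks` is a hypothetical function, it assumes CHUNK_OVERLAP is measured in tokens, and it does not model MIN_CHUNK_SIZE.

```python
from typing import List


def fixed_chunks(tokens: List[str], max_size: int = 400, overlap: int = 10) -> List[List[str]]:
    """Split tokens into windows of max_size that share `overlap` tokens."""
    if max_size <= 0 or overlap < 0 or overlap >= max_size:
        raise ValueError("require max_size > overlap >= 0")
    step = max_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_size])
        # Stop once the current window reaches the end of the input.
        if start + max_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    words = [f"w{i}" for i in range(1000)]
    print([len(chunk) for chunk in fixed_chunks(words)])
```

Each window starts `max_size - overlap` tokens after the previous one, so the last `overlap` tokens of a chunk reappear at the start of the next.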

HybridChunker Details

When USE_DOCLING_CHUNKER=true:

  • Token-Aware Chunking: Uses HuggingFace tokenizers to count actual tokens, ensuring chunks stay within embedding model limits
  • Tokenizer Model: Should match your embedding model family for accurate token counts:
      • IBM Slate/Granite embeddings → ibm-granite/granite-embedding-english-r2
      • Sentence Transformers → sentence-transformers/all-MiniLM-L6-v2
  • Max Tokens: Defaults to 400 tokens (about 78% of IBM Slate's 512-token limit), leaving a safety margin for metadata
  • Semantic Merging: Automatically merges semantically similar chunks when merge_peers=True
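
The token budget in the list above can be checked with a few lines of arithmetic. In this sketch, `budget_ratio` and `oversized_chunks` are hypothetical helpers, and `count_tokens` stands in for the real tokenizer; a whitespace split is used here only so the example runs without downloads.

```python
from typing import Callable, List


def budget_ratio(max_tokens: int, model_limit: int) -> float:
    """Fraction of the embedding model's limit that a chunk may use."""
    return max_tokens / model_limit


def oversized_chunks(chunks: List[str], count_tokens: Callable[[str], int],
                     max_tokens: int = 400) -> List[int]:
    """Return the indexes of chunks whose token count exceeds max_tokens."""
    return [i for i, chunk in enumerate(chunks) if count_tokens(chunk) > max_tokens]


if __name__ == "__main__":
    print(f"{budget_ratio(400, 512):.0%} of the limit is used")

    def whitespace_count(text: str) -> int:
        # Stand-in tokenizer: real token counts are usually higher than word counts.
        return len(text.split())

    print(oversized_chunks(["short text", "word " * 450], whitespace_count))
```

With the configured tokenizer in place of the whitespace counter, the same check reports which chunks would still exceed the limit.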

Benefits:

  • ✅ Prevents "token count exceeds maximum" errors
  • ✅ Accurate token counting (no tokenizer mismatch)
  • ✅ Better chunk quality with semantic boundaries
  • ✅ Optimal for IBM Slate/Granite embeddings

When to Use:

  • ✅ Using IBM Slate/Granite embeddings (recommended)
  • ✅ Processing long documents (PDFs, reports)
  • ✅ Need precise token control for embedding models

When to Disable:

  • Traditional fixed-size chunking preferred
  • Custom chunking strategy needed
  • Docling not installed

Object Storage Configuration

# MinIO Settings
MINIO_ENDPOINT=minio:9000     # MinIO endpoint
MINIO_ACCESS_KEY=minioadmin   # MinIO access key
MINIO_SECRET_KEY=minioadmin   # MinIO secret key
MINIO_BUCKET_NAME=rag-modulo  # Default bucket name
MINIO_SECURE=false            # Use HTTPS
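
MINIO_ENDPOINT holds only a host and port, and MINIO_SECURE decides the scheme. A minimal sketch of how the two combine follows; `minio_base_url` is a hypothetical helper, not a MinIO client call.

```python
def minio_base_url(endpoint: str, secure: bool) -> str:
    """Build the base URL implied by MINIO_ENDPOINT and MINIO_SECURE."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{endpoint}"


if __name__ == "__main__":
    print(minio_base_url("minio:9000", secure=False))
```

Including a scheme in MINIO_ENDPOINT itself would produce a malformed URL under this convention, so the variable should stay in `host:port` form.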

MLflow Configuration

# MLflow Settings
MLFLOW_TRACKING_URI=http://mlflow-server:5000  # MLflow tracking URI
MLFLOW_TRACKING_USERNAME=mlflow                # MLflow username
MLFLOW_TRACKING_PASSWORD=mlflow123             # MLflow password
MLFLOW_EXPERIMENT_NAME=rag-modulo              # Default experiment name

OIDC Configuration

# IBM OIDC Settings
OIDC_DISCOVERY_ENDPOINT=https://your-oidc-provider/.well-known/openid-configuration
OIDC_AUTH_URL=https://your-oidc-provider/auth
OIDC_TOKEN_URL=https://your-oidc-provider/token
OIDC_CLIENT_ID=your-client-id
OIDC_CLIENT_SECRET=your-client-secret
FRONTEND_URL=http://localhost:3000
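
An OpenID Connect discovery document lists the provider's endpoints under the standard keys `authorization_endpoint` and `token_endpoint`, which correspond to OIDC_AUTH_URL and OIDC_TOKEN_URL. The sketch below maps an already-parsed document to those variables; `endpoints_from_discovery` is a hypothetical helper and the sample URLs are placeholders.

```python
from typing import Dict


def endpoints_from_discovery(document: Dict[str, str]) -> Dict[str, str]:
    """Extract the auth and token URLs from a parsed discovery document."""
    return {
        "OIDC_AUTH_URL": document["authorization_endpoint"],
        "OIDC_TOKEN_URL": document["token_endpoint"],
    }


if __name__ == "__main__":
    sample = {
        "issuer": "https://your-oidc-provider",
        "authorization_endpoint": "https://your-oidc-provider/auth",
        "token_endpoint": "https://your-oidc-provider/token",
    }
    for key, value in endpoints_from_discovery(sample).items():
        print(f"{key}={value}")
```

Comparing this output with the values in the environment file is a quick way to spot a mistyped endpoint.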

Configuration Files

Backend Configuration

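# backend/core/config.py
# Pydantic v1 style; in Pydantic v2, BaseSettings is provided by the
# separate pydantic-settings package instead.
from pydantic import BaseSettings, Field
from typing import Optional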

class Settings(BaseSettings):
    """Application settings with environment variable support."""

    # Application
    production_mode: bool = Field(default=False, env="PRODUCTION_MODE")
    debug: bool = Field(default=False, env="DEBUG")
    log_level: str = Field(default="INFO", env="LOG_LEVEL")
    testing: bool = Field(default=False, env="TESTING")
    skip_auth: bool = Field(default=False, env="SKIP_AUTH")
    development_mode: bool = Field(default=False, env="DEVELOPMENT_MODE")

    # Security
    jwt_secret_key: str = Field(env="JWT_SECRET_KEY")
    jwt_algorithm: str = Field(default="HS256", env="JWT_ALGORITHM")
    jwt_expiration_hours: int = Field(default=24, env="JWT_EXPIRATION_HOURS")

    # Database
    collectiondb_host: str = Field(default="postgres", env="COLLECTIONDB_HOST")
    collectiondb_port: int = Field(default=5432, env="COLLECTIONDB_PORT")
    collectiondb_name: str = Field(default="rag_modulo", env="COLLECTIONDB_NAME")
    collectiondb_user: str = Field(default="rag_user", env="COLLECTIONDB_USER")
    collectiondb_pass: str = Field(env="COLLECTIONDB_PASS")

    # AI Services
    watsonx_instance_id: str = Field(env="WATSONX_INSTANCE_ID")
    watsonx_apikey: str = Field(env="WATSONX_APIKEY")
    watsonx_url: str = Field(env="WATSONX_URL")

    class Config:
        env_file = ".env"
        case_sensitive = False

Frontend Configuration

// webui/src/config.js
const config = {
  // API Configuration
  apiUrl: process.env.REACT_APP_API_URL || 'http://localhost:8000',

  // Environment
  environment: process.env.NODE_ENV || 'development',

  // Features
  features: {
    analytics: process.env.REACT_APP_ANALYTICS_ENABLED === 'true',
    debug: process.env.REACT_APP_DEBUG === 'true',
    hotReload: process.env.NODE_ENV === 'development'
  },

  // Authentication
  auth: {
    provider: process.env.REACT_APP_AUTH_PROVIDER || 'jwt',
    tokenKey: 'rag_modulo_token'
  }
};

export default config;

Docker Configuration

Development Docker Compose

# docker-compose.dev.yml
version: '3.8'

services:
  backend:
    build:
      context: ./backend
      dockerfile: Dockerfile.backend
    environment:
      - DEVELOPMENT_MODE=true
      - DEBUG=true
      - LOG_LEVEL=DEBUG
      - TESTING=true
      - SKIP_AUTH=true
    env_file:
      - .env.dev
    volumes:
      - ./backend:/app:ro
      - ./logs:/app/logs
    ports:
      - "8000:8000"

  frontend:
    build:
      context: ./webui
      dockerfile: Dockerfile.frontend
    environment:
      - REACT_APP_API_URL=http://localhost:8000
      - REACT_APP_DEBUG=true
    ports:
      - "3000:8080"

Production Docker Compose

# docker-compose.prod.yml
version: '3.8'

services:
  backend:
    build:
      context: ./backend
      dockerfile: Dockerfile.backend
    environment:
      - PRODUCTION_MODE=true
      - DEBUG=false
      - LOG_LEVEL=INFO
      - SECURITY_SCAN=true
    env_file:
      - .env.prod
    volumes:
      - backend_data:/mnt/data
      - ./logs:/app/logs
    restart: unless-stopped

  frontend:
    build:
      context: ./webui
      dockerfile: Dockerfile.frontend
    environment:
      - REACT_APP_API_URL=https://api.yourdomain.com
      - REACT_APP_DEBUG=false
    restart: unless-stopped

CLI Configuration

CLI Settings

# backend/rag_solution/cli/config.py
from pydantic import BaseModel, Field
from typing import Optional, Dict, Any

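# Note: pydantic only reads the env= names below on BaseSettings subclasses;
# on a plain BaseModel they are descriptive and must be applied by the caller.
class RAGConfig(BaseModel):
    """CLI configuration model."""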

    # API Configuration
    api_url: str = Field(default="http://localhost:8000", env="RAG_API_URL")
    timeout: int = Field(default=30, env="RAG_TIMEOUT")

    # Authentication
    token: Optional[str] = Field(default=None, env="RAG_TOKEN")
    username: Optional[str] = Field(default=None, env="RAG_USERNAME")
    password: Optional[str] = Field(default=None, env="RAG_PASSWORD")

    # Output Configuration
    output_format: str = Field(default="table", env="RAG_OUTPUT_FORMAT")
    verbose: bool = Field(default=False, env="RAG_VERBOSE")
    dry_run: bool = Field(default=False, env="RAG_DRY_RUN")

    # Profile Management
    profile: str = Field(default="default", env="RAG_PROFILE")
    profiles: Dict[str, Dict[str, Any]] = Field(default_factory=dict)

CLI Profile Configuration

# ~/.rag_modulo/profiles.yaml
profiles:
  default:
    api_url: "http://localhost:8000"
    output_format: "table"
    verbose: false

  production:
    api_url: "https://api.yourdomain.com"
    output_format: "json"
    verbose: true

  development:
    api_url: "http://localhost:8000"
    output_format: "table"
    verbose: true
    dry_run: true
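
Profile resolution could work along the lines of the sketch below, which layers a named profile over built-in defaults and then applies explicit overrides such as command-line flags. The precedence order, the `DEFAULTS` dictionary, and `resolve_profile` are assumptions for illustration, not the CLI's documented behavior.

```python
from typing import Any, Dict, Optional

# Assumed defaults, mirroring the field defaults of the CLI configuration model.
DEFAULTS: Dict[str, Any] = {
    "api_url": "http://localhost:8000",
    "timeout": 30,
    "output_format": "table",
    "verbose": False,
    "dry_run": False,
}


def resolve_profile(profiles: Dict[str, Dict[str, Any]], name: str = "default",
                    overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge defaults, then the named profile, then any non-None overrides."""
    if name not in profiles:
        raise KeyError(f"unknown profile: {name}")
    resolved = dict(DEFAULTS)
    resolved.update(profiles[name])
    resolved.update({k: v for k, v in (overrides or {}).items() if v is not None})
    return resolved


if __name__ == "__main__":
    profiles = {
        "production": {
            "api_url": "https://api.yourdomain.com",
            "output_format": "json",
            "verbose": True,
        },
    }
    print(resolve_profile(profiles, "production", {"timeout": 60}))
```

Skipping overrides that are `None` lets unset command-line options fall through to the profile value.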

Environment-Specific Configuration

Development Environment

# .env.dev
TESTING=true
SKIP_AUTH=true
DEVELOPMENT_MODE=true
DEBUG=true
LOG_LEVEL=DEBUG
JWT_SECRET_KEY=dev-jwt-secret-key-for-local-development-only

Production Environment

# .env.prod
PRODUCTION_MODE=true
DEBUG=false
LOG_LEVEL=INFO
SECURITY_SCAN=true
VULNERABILITY_CHECK=true
JWT_SECRET_KEY=your-secure-production-secret-key-256-bits
CORS_ORIGINS=https://yourdomain.com,https://www.yourdomain.com

Testing Environment

# .env.test
TESTING=true
SKIP_AUTH=true
LOG_LEVEL=DEBUG
COLLECTIONDB_NAME=rag_modulo_test
JWT_SECRET_KEY=test-jwt-secret-key

Configuration Validation

Environment Validation

# Validate environment configuration
make validate-env

# Check specific configuration
docker compose -f docker-compose.dev.yml config

Application Validation

# backend/core/validation.py
from pydantic import BaseModel, validator
import os

class ConfigValidator(BaseModel):
    """Configuration validation."""

    @validator('jwt_secret_key')
    def validate_jwt_secret(cls, v):
        """Validate JWT secret key strength."""
        if len(v) < 32:
            raise ValueError('JWT secret key must be at least 32 characters')
        return v

    @validator('watsonx_apikey')
    def validate_watsonx_key(cls, v):
        """Validate WatsonX API key format."""
        if not v or len(v) < 20:
            raise ValueError('WatsonX API key is required and must be valid')
        return v

Configuration Management

Environment Switching

# Switch to development
make dev-up

# Switch to production
make run-services

# Switch to testing
make test-env

Configuration Backup

# Backup configuration
cp .env .env.backup
cp .env.dev .env.dev.backup
cp .env.prod .env.prod.backup

# Restore configuration
cp .env.backup .env

Troubleshooting Configuration

Common Issues

Environment Variables Not Loading

# Check environment file
cat .env.dev

# Verify Docker environment
docker compose -f docker-compose.dev.yml config

# Check application logs
make dev-logs
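
When a variable seems not to load, it can help to parse the environment file directly and list required keys that are missing or empty. The following is a simplified sketch: `parse_env_text` and `missing_keys` are hypothetical helpers, the parser ignores quoting and `export` prefixes, and the list of required keys is an example.

```python
from typing import Dict, Iterable, List


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop trailing inline comments of the form "value  # comment".
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def missing_keys(values: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return required keys that are absent or set to an empty string."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    sample = "DEBUG=true   # Enable debug logging\nJWT_SECRET_KEY=\n"
    print(missing_keys(parse_env_text(sample), ["JWT_SECRET_KEY", "WATSONX_APIKEY"]))
```

Reading the real file with `open(".env.dev").read()` in place of the sample string gives the same report for a development setup.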

Configuration Validation Errors

# Validate configuration
make validate-env

# Check specific settings
python -c "from backend.core.config import Settings; print(Settings())"

Service Configuration Issues

# Check service status
make dev-status

# Restart with new configuration
make dev-restart

# Validate all services
make dev-validate

HybridChunker and Tokenizer Issues

Problem: "Failed to load tokenizer" error during startup

Cause: Network connectivity issues, invalid tokenizer model name, or HuggingFace access problems

Solution:

# 1. Verify tokenizer model exists on HuggingFace
# Visit: https://huggingface.co/ibm-granite/granite-embedding-english-r2

# 2. Check network connectivity
curl -I https://huggingface.co

# 3. Test tokenizer download manually
python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('ibm-granite/granite-embedding-english-r2')"

# 4. If behind a proxy, set environment variables
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080

# 5. Check CHUNKING_TOKENIZER_MODEL setting in .env
grep CHUNKING_TOKENIZER_MODEL .env

Problem: "Token indices sequence length is longer than maximum" errors persist

Cause: Chunks exceed embedding model's token limit despite HybridChunker configuration

Solution:

# 1. Verify CHUNKING_MAX_TOKENS is set correctly (default: 400 for IBM Slate)
grep CHUNKING_MAX_TOKENS .env

# 2. Reduce max_tokens if needed (must be < 512 for IBM Slate)
# Edit .env:
CHUNKING_MAX_TOKENS=350  # More conservative limit

# 3. Ensure USE_DOCLING_CHUNKER=true
grep USE_DOCLING_CHUNKER .env

# 4. Check logs for token count statistics
grep "Chunking complete" logs/rag_modulo.log

# 5. Verify tokenizer matches embedding model family
# IBM Slate embeddings → ibm-granite tokenizer
# Sentence Transformers → sentence-transformers tokenizer

Problem: "HybridChunker not initialized" warning in logs

Cause: USE_DOCLING_CHUNKER=false or Docling not installed

Solution:

# 1. Enable HybridChunker in .env
USE_DOCLING_CHUNKER=true

# 2. Verify Docling is installed
poetry show | grep docling

# 3. If not installed, add Docling
poetry add docling

# 4. Restart application
make local-dev-restart

Best Practices

Security

  • Use strong, unique secrets for each environment
  • Never commit secrets to version control
  • Use environment-specific configuration files
  • Enable security scanning in production

Performance

  • Optimize database connection pooling
  • Configure appropriate log levels
  • Use production-optimized settings
  • Monitor resource usage

Maintainability

  • Use descriptive environment variable names
  • Document all configuration options
  • Validate configuration on startup
  • Use configuration templates

Configuration complete! Check out the Development Guide to start developing with your configured environment.