Architecture Overview

RAG Modulo is a production-ready, modular Retrieval-Augmented Generation platform built with clean architecture principles.

System Architecture

┌──────────────────────────────────────────────────────────────────┐
│                         Frontend (React)                         │
│                       Carbon Design System                       │
└───────────────────────────┬──────────────────────────────────────┘
                            │ REST API
┌───────────────────────────┴──────────────────────────────────────┐
│                        Backend (FastAPI)                         │
│                                                                  │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐                  │
│  │  Router    │  │  Service   │  │ Repository │                  │
│  │  Layer     │──│   Layer    │──│   Layer    │                  │
│  └────────────┘  └────────────┘  └────────────┘                  │
│                                                                  │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐                  │
│  │ Provider   │  │ Pipeline   │  │  Models    │                  │
│  │  System    │  │  Engine    │  │(SQLAlchemy)│                  │
│  └────────────┘  └────────────┘  └────────────┘                  │
└───────────────────────────┬──────────────────────────────────────┘
                            │
┌───────────────────────────┴──────────────────────────────────────┐
│                          Infrastructure                          │
│                                                                  │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐                  │
│  │ PostgreSQL │  │  Milvus    │  │   MinIO    │                  │
│  │ (Metadata) │  │  (Vectors) │  │  (Storage) │                  │
│  └────────────┘  └────────────┘  └────────────┘                  │
│                                                                  │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐                  │
│  │  WatsonX   │  │  OpenAI    │  │ Anthropic  │                  │
│  │    LLM     │  │    LLM     │  │    LLM     │                  │
│  └────────────┘  └────────────┘  └────────────┘                  │
└──────────────────────────────────────────────────────────────────┘

Core Layers

Router Layer

  • Location: backend/rag_solution/router/
  • Purpose: API endpoints and HTTP handling
  • Key Files:
      • search_router.py - Search endpoints
      • conversation_router.py - Conversation management
      • collection_router.py - Collection operations
      • document_router.py - Document upload/management

Service Layer

  • Location: backend/rag_solution/services/
  • Purpose: Business logic and orchestration
  • Key Services:
      • SearchService - RAG search orchestration
      • ChainOfThoughtService - CoT reasoning
      • ConversationService - Conversation management
      • DocumentService - Document processing
      • PipelineService - Pipeline resolution

Repository Layer

  • Location: backend/rag_solution/repository/
  • Purpose: Data access abstraction
  • Key Repositories:
      • ConversationRepository - Unified conversation data access
      • CollectionRepository - Collection operations
      • DocumentRepository - Document persistence

Provider System

  • Location: backend/rag_solution/generation/providers/
  • Purpose: LLM provider abstraction
  • Providers:
      • WatsonXProvider - IBM WatsonX
      • OpenAIProvider - OpenAI GPT models
      • AnthropicProvider - Anthropic Claude

Data Flow

Document Ingestion

User uploads document
    ↓
Router validates file
    ↓
DocumentService processes
    ↓
Docling extracts content
    ↓
Text chunking
    ↓
Embedding generation
    ↓
Milvus stores vectors
    ↓
PostgreSQL stores metadata
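
The text chunking step in the flow above can be illustrated with fixed-size character windows that overlap, so that a sentence cut at a boundary still appears whole in one chunk. This is a minimal sketch under stated assumptions: `chunk_text`, its parameters, and the default sizes are hypothetical, and the project's actual chunking strategy may differ.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; not RAG Modulo's actual chunker.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and written to the vector store, with its document reference kept in the metadata database.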

Search Query

User submits question
    ↓
Router receives request
    ↓
SearchService orchestrates
    ↓
Pipeline resolution
    ↓
Query rewriting (optional)
    ↓
Vector similarity search (Milvus)
    ↓
Reranking (optional)
    ↓
Context building
    ↓
LLM generation (provider)
    ↓
Source attribution
    ↓
Response to user
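
The search flow above can be sketched as a single function in which the optional steps are simply omitted when no callable is supplied. Everything here is an assumption made for illustration: `Chunk`, `SearchResult`, `run_search`, and the callable signatures are hypothetical and do not reflect the real `SearchService` interface.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Chunk:
    text: str
    source: str
    score: float


@dataclass
class SearchResult:
    answer: str
    sources: list[str]


def run_search(
    question: str,
    retrieve: Callable[[str], list[Chunk]],
    generate: Callable[[str, str], str],
    rewrite: Optional[Callable[[str], str]] = None,
    rerank: Optional[Callable[[str, list[Chunk]], list[Chunk]]] = None,
    top_k: int = 3,
) -> SearchResult:
    # Optional query rewriting: fall back to the raw question.
    query = rewrite(question) if rewrite else question
    # Vector similarity search (stands in for the Milvus lookup).
    chunks = retrieve(query)
    # Optional reranking of the retrieved chunks.
    if rerank:
        chunks = rerank(query, chunks)
    chunks = chunks[:top_k]
    # Context building: concatenate the selected chunk texts.
    context = "\n\n".join(chunk.text for chunk in chunks)
    answer = generate(question, context)
    # Source attribution: unique sources, in order of first appearance.
    sources = list(dict.fromkeys(chunk.source for chunk in chunks))
    return SearchResult(answer=answer, sources=sources)
```

Passing the stages as callables keeps the orchestration testable with plain stubs, which is one way to read the separation between the service layer and the provider system.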

Chain of Thought Flow

Complex question detected
    ↓
QuestionDecomposer breaks into sub-questions
    ↓
For each sub-question:
  ├─ Vector search
  ├─ Context retrieval
  └─ LLM answer generation
    ↓
AnswerSynthesizer combines steps
    ↓
Source attribution across all steps
    ↓
Quality scoring & validation
    ↓
Final comprehensive answer
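
The decompose, answer, and synthesize loop above can be sketched as follows. The callables stand in for the roles of QuestionDecomposer and AnswerSynthesizer; their signatures, `ReasoningStep`, and `chain_of_thought` are hypothetical, and quality scoring is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReasoningStep:
    sub_question: str
    answer: str
    sources: list[str]


def chain_of_thought(
    question: str,
    decompose: Callable[[str], list[str]],
    answer_step: Callable[[str], tuple[str, list[str]]],
    synthesize: Callable[[str, list[ReasoningStep]], str],
) -> tuple[str, list[str]]:
    steps: list[ReasoningStep] = []
    for sub_question in decompose(question):
        # answer_step covers search, context retrieval, and generation
        # for one sub-question and returns (answer, sources).
        answer, sources = answer_step(sub_question)
        steps.append(ReasoningStep(sub_question, answer, sources))

    final_answer = synthesize(question, steps)
    # Attribute sources across all steps, de-duplicated in first-seen order.
    all_sources = list(
        dict.fromkeys(source for step in steps for source in step.sources)
    )
    return final_answer, all_sources
```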

Key Components

Pipeline Engine

Automatic pipeline resolution and execution:

  1. Resolution: Determines user's default pipeline
  2. Creation: Auto-creates pipelines for new users
  3. Execution: Runs pipeline stages sequentially
  4. Fallback: Graceful error handling

Stages:

  • Query rewriting
  • Retrieval
  • Reranking
  • Generation
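
Sequential execution with a fallback can be sketched as a loop over named stages that share a state dictionary. This is one possible reading of "graceful error handling", assumed for illustration: stages listed as optional are skipped when they fail, while a failure in any other stage is re-raised. `run_pipeline` and the state layout are hypothetical.

```python
from typing import Callable

# A stage takes the current state and returns the updated state.
Stage = Callable[[dict], dict]


def run_pipeline(
    stages: list[tuple[str, Stage]],
    state: dict,
    optional: frozenset = frozenset(),
) -> dict:
    skipped: list[str] = []
    for name, stage in stages:
        try:
            state = stage(state)
        except Exception:
            if name not in optional:
                raise  # required stages fail the whole pipeline
            skipped.append(name)  # optional stages are skipped on error
    # Record which stages were skipped so callers can log or report it.
    return {**state, "skipped": skipped}
```

With this shape, a failing reranker would degrade result ordering without failing the request, whereas a failing retrieval or generation stage would still surface as an error.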

Chain of Thought Service

Production-hardened reasoning system:

  • 5-layer parsing to keep intermediate reasoning from leaking into answers
  • Quality scoring with confidence thresholds
  • Retry logic with exponential backoff
  • Source attribution across reasoning steps
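
Retry logic with exponential backoff, as listed above, can be sketched in a few lines. The function name, default delays, and the injectable `sleep` parameter are assumptions made for this example; the service's actual retry policy is not shown in this document.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay grows as base_delay * 2**attempt, capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps tests fast, because a test can record the requested delays instead of actually waiting.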

See Chain of Thought for details.

Conversation System

Unified repository pattern for session management:

  • Session lifecycle management
  • Message persistence
  • Conversation history
  • Summarization support

See Conversation System Refactoring for migration details.

Technology Stack

Backend

  • Framework: FastAPI
  • ORM: SQLAlchemy
  • Validation: Pydantic
  • Testing: pytest (947+ tests)
  • Linting: Ruff, MyPy, Pylint

Frontend

  • Framework: React 18
  • UI Library: Carbon Design System
  • State Management: React hooks
  • HTTP Client: axios

Infrastructure

  • Database: PostgreSQL
  • Vector DB: Milvus (configurable)
  • Object Storage: MinIO
  • Model Tracking: MLflow
  • Containerization: Docker + Docker Compose

LLM Providers

  • Default: IBM WatsonX
  • Alternatives: OpenAI, Anthropic

Design Patterns

Repository Pattern

Abstracts data access with clean interfaces:

class ConversationRepository:
    # Interface outline only: method bodies are elided.
    def create_session(self, input: ConversationSessionInput) -> ConversationSession: ...
    def get_message_by_id(self, message_id: UUID4) -> ConversationMessage: ...
    def create_message(self, input: ConversationMessageInput) -> ConversationMessage: ...
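
To show why this abstraction helps, here is a minimal sketch of a repository interface with an in-memory implementation that could stand in for the database in tests. `Message`, `MessageRepository`, and `InMemoryMessageRepository` are hypothetical names for this example and are simpler than the project's real schemas.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol
from uuid import UUID, uuid4


@dataclass
class Message:
    session_id: UUID
    content: str
    id: UUID = field(default_factory=uuid4)


class MessageRepository(Protocol):
    """What services depend on: behaviour, not a storage engine."""

    def create_message(self, session_id: UUID, content: str) -> Message: ...
    def get_message_by_id(self, message_id: UUID) -> Optional[Message]: ...


class InMemoryMessageRepository:
    """Dictionary-backed implementation, useful as a test double."""

    def __init__(self) -> None:
        self._messages: dict[UUID, Message] = {}

    def create_message(self, session_id: UUID, content: str) -> Message:
        message = Message(session_id=session_id, content=content)
        self._messages[message.id] = message
        return message

    def get_message_by_id(self, message_id: UUID) -> Optional[Message]:
        return self._messages.get(message_id)
```

A service written against `MessageRepository` would work unchanged with a SQLAlchemy-backed implementation, since both satisfy the same method signatures.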

Factory Pattern

Provider instantiation via factory:

provider_factory = ProviderFactory(settings)
provider = provider_factory.get_provider(provider_type)
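
A factory of this kind is often built around a registry that maps a provider name to a builder. The sketch below illustrates that idea only; `SimpleProviderFactory`, `LLMProvider`, `EchoProvider`, and the `register` method are hypothetical and are not the project's `ProviderFactory` implementation.

```python
from typing import Callable, Protocol


class LLMProvider(Protocol):
    def generate(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in provider that returns the prompt with a prefix."""

    def __init__(self, settings: dict) -> None:
        self._prefix = settings.get("prefix", "")

    def generate(self, prompt: str) -> str:
        return f"{self._prefix}{prompt}"


class SimpleProviderFactory:
    def __init__(self, settings: dict) -> None:
        self._settings = settings
        self._builders: dict[str, Callable[[dict], LLMProvider]] = {}
        self._instances: dict[str, LLMProvider] = {}

    def register(self, name: str, builder: Callable[[dict], LLMProvider]) -> None:
        self._builders[name.lower()] = builder

    def get_provider(self, provider_type: str) -> LLMProvider:
        key = provider_type.lower()
        if key not in self._builders:
            raise ValueError(f"Unknown provider: {provider_type}")
        # Build lazily and reuse the instance on later calls.
        if key not in self._instances:
            self._instances[key] = self._builders[key](self._settings)
        return self._instances[key]
```

Callers ask for a provider by name and never import a concrete provider class, so adding a provider means registering one more builder.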

Service Layer Pattern

Business logic separated from HTTP handling:

@router.post("/search")
async def search(input: SearchInput, service: SearchService = Depends()):
    return await service.search(input)

Dependency Injection

FastAPI dependency injection for testability:

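from collections.abc import Generator

from sqlalchemy.orm import Session

# SessionLocal is assumed to be the project's configured sessionmaker.
def get_db() -> Generator[Session, None, None]:
    db = SessionLocal()
    try:
        yield db  # the request handler runs while the generator is suspended here
    finally:
        db.close()  # always release the session, even if the handler raised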
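
The `try`/`finally` around `yield` is what guarantees cleanup. The sketch below reproduces that behaviour with the standard library only, driving a generator dependency through `contextlib.contextmanager`, which is roughly how a framework handles a yield-based dependency. `FakeSession` and `get_fake_db` are hypothetical stand-ins so that no database is needed.

```python
from contextlib import contextmanager


class FakeSession:
    """Stand-in for a database session; records whether it was closed."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def get_fake_db():
    db = FakeSession()
    try:
        yield db
    finally:
        db.close()


def use_dependency(fail: bool) -> FakeSession:
    """Run a 'handler' against the dependency and return the session."""
    session = None
    try:
        with contextmanager(get_fake_db)() as db:
            session = db
            if fail:
                raise RuntimeError("handler failed")
    except RuntimeError:
        pass  # the error is swallowed here only to inspect the session
    return session
```

In both the success and the failure path the session ends up closed, which is the property that makes this pattern safe to use for per-request resources.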

Configuration

Environment-based configuration via .env:

  • Database: COLLECTIONDB_* variables
  • Vector DB: VECTOR_DB, MILVUS_* variables
  • LLM: WATSONX_*, OPENAI_API_KEY, ANTHROPIC_API_KEY
  • Security: JWT_SECRET_KEY
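
Loading these variables into a typed settings object might look like the sketch below. It uses only the variable names and prefixes listed above; the `Settings` class, `load_settings`, the `"milvus"` default for `VECTOR_DB`, and the choice to fail when `JWT_SECRET_KEY` is missing are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    vector_db: str
    jwt_secret_key: str
    collectiondb: dict[str, str]  # all COLLECTIONDB_* values, prefix stripped
    milvus: dict[str, str]        # all MILVUS_* values, prefix stripped
    openai_api_key: Optional[str] = None
    anthropic_api_key: Optional[str] = None


def _with_prefix(env: Mapping[str, str], prefix: str) -> dict[str, str]:
    """Collect variables sharing a prefix, keyed by the lowercased remainder."""
    return {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    env = os.environ if env is None else env
    secret = env.get("JWT_SECRET_KEY")
    if not secret:
        # Assumed policy: refuse to start without a signing secret.
        raise RuntimeError("JWT_SECRET_KEY must be set")
    return Settings(
        vector_db=env.get("VECTOR_DB", "milvus"),  # default is an assumption
        jwt_secret_key=secret,
        collectiondb=_with_prefix(env, "COLLECTIONDB_"),
        milvus=_with_prefix(env, "MILVUS_"),
        openai_api_key=env.get("OPENAI_API_KEY"),
        anthropic_api_key=env.get("ANTHROPIC_API_KEY"),
    )
```

Accepting the environment as a mapping lets tests pass a plain dictionary instead of mutating `os.environ`.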

See Configuration Guide for details.

Scalability Considerations

Horizontal Scaling

  • Stateless API services
  • Database connection pooling
  • Distributed vector database (Milvus)

Performance Optimization

  • Async I/O with FastAPI
  • Database query optimization
  • Vector search indexing (HNSW)
  • Response caching
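
Response caching can be as simple as a time-to-live cache keyed by the request. The following is a minimal in-process sketch; `TTLCache` and `get_or_compute` are hypothetical names, and a multi-instance deployment would more likely need a shared cache, which this example does not cover.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Cache values for a fixed number of seconds."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry time, cached value)
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive call
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value
```

A natural key for search responses would combine the collection identifier and the normalized question, so that identical queries within the time window reuse the earlier answer.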

See Performance and Scalability for details.

Security Architecture

Authentication & Authorization

  • JWT-based authentication
  • Role-based access control (planned)
  • API key management
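
To make the JWT mechanism concrete, the sketch below signs and verifies an HS256 token using only the standard library. It is an illustration of the token format, assuming HS256 and an `exp` claim; `encode_jwt` and `decode_jwt` are hypothetical helpers, and a real service would normally rely on a maintained JWT library rather than hand-written code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that the JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def encode_jwt(payload: dict, secret: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def decode_jwt(token: str, secret: str, now: Optional[float] = None) -> dict:
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

The secret passed to both functions would come from `JWT_SECRET_KEY`, which is why that variable must never be committed to the repository.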

Secret Management

  • Environment variables for secrets
  • Multi-layer secret scanning (Gitleaks, TruffleHog)
  • Pre-commit hooks for prevention

Data Protection

  • HTTPS in production
  • Database encryption at rest
  • Secure file uploads

See Security for comprehensive security documentation.

Testing Strategy

947+ automated tests across categories:

  • Atomic: Schema validation (~5s)
  • Unit: Component isolation (~30s)
  • Integration: Service interaction (~2 min)
  • E2E: Full workflows (~5 min)

See Testing Guide for details.

Deployment

Local Development

make local-dev-setup        # One-time setup
make local-dev-infra        # Start infrastructure
make local-dev-backend      # Start backend
make local-dev-frontend     # Start frontend

Production

make build-all              # Build Docker images
make prod-start             # Start production stack

Images published to GitHub Container Registry (GHCR).

See Deployment Guide for details.

See Also