Architecture Overview¶
RAG Modulo is a production-ready, modular Retrieval-Augmented Generation platform built with clean architecture principles.
System Architecture¶
┌────────────────────────────────────────────────────┐
│                  Frontend (React)                  │
│                Carbon Design System                │
└──────────────────────────┬─────────────────────────┘
                           │ REST API
┌──────────────────────────┴─────────────────────────┐
│                 Backend (FastAPI)                  │
│                                                    │
│   ┌────────────┐  ┌────────────┐  ┌────────────┐   │
│   │   Router   │  │  Service   │  │ Repository │   │
│   │   Layer    │──│   Layer    │──│   Layer    │   │
│   └────────────┘  └────────────┘  └────────────┘   │
│                                                    │
│   ┌────────────┐  ┌────────────┐  ┌────────────┐   │
│   │  Provider  │  │  Pipeline  │  │   Models   │   │
│   │   System   │  │   Engine   │  │ (SQLAlch)  │   │
│   └────────────┘  └────────────┘  └────────────┘   │
└──────────────────────────┬─────────────────────────┘
                           │
┌──────────────────────────┴─────────────────────────┐
│                   Infrastructure                   │
│                                                    │
│   ┌────────────┐  ┌────────────┐  ┌────────────┐   │
│   │ PostgreSQL │  │   Milvus   │  │   MinIO    │   │
│   │ (Metadata) │  │ (Vectors)  │  │ (Storage)  │   │
│   └────────────┘  └────────────┘  └────────────┘   │
│                                                    │
│   ┌────────────┐  ┌────────────┐  ┌────────────┐   │
│   │  WatsonX   │  │   OpenAI   │  │ Anthropic  │   │
│   │    LLM     │  │    LLM     │  │    LLM     │   │
│   └────────────┘  └────────────┘  └────────────┘   │
└────────────────────────────────────────────────────┘
Core Layers¶
Router Layer¶
- Location: backend/rag_solution/router/
- Purpose: API endpoints and HTTP handling
- Key Files:
  - search_router.py - Search endpoints
  - conversation_router.py - Conversation management
  - collection_router.py - Collection operations
  - document_router.py - Document upload/management
Service Layer¶
- Location: backend/rag_solution/services/
- Purpose: Business logic and orchestration
- Key Services:
  - SearchService - RAG search orchestration
  - ChainOfThoughtService - CoT reasoning
  - ConversationService - Conversation management
  - DocumentService - Document processing
  - PipelineService - Pipeline resolution
Repository Layer¶
- Location: backend/rag_solution/repository/
- Purpose: Data access abstraction
- Key Repositories:
  - ConversationRepository - Unified conversation data access
  - CollectionRepository - Collection operations
  - DocumentRepository - Document persistence
Provider System¶
- Location: backend/rag_solution/generation/providers/
- Purpose: LLM provider abstraction
- Providers:
  - WatsonXProvider - IBM WatsonX
  - OpenAIProvider - OpenAI GPT models
  - AnthropicProvider - Anthropic Claude
Data Flow¶
Document Ingestion¶
User uploads document
↓
Router validates file
↓
DocumentService processes
↓
Docling extracts content
↓
Text chunking
↓
Embedding generation
↓
Milvus stores vectors
↓
PostgreSQL stores metadata
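The chunking, embedding, and storage steps above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it assumes the text has already been extracted, and `InMemoryStores`, `chunk_text`, `toy_embed`, and `ingest` are hypothetical names. The dictionaries stand in for Milvus and PostgreSQL, and the hash-based "embedding" is only a deterministic placeholder for a real embedding model.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class InMemoryStores:
    """Hypothetical stand-ins for Milvus (vectors) and PostgreSQL (metadata)."""
    vectors: dict[str, list[float]] = field(default_factory=dict)
    metadata: dict[str, dict] = field(default_factory=dict)


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character windows; requires overlap < size."""
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(chunk: str, dim: int = 8) -> list[float]:
    """Placeholder embedding: the first `dim` bytes of a SHA-256 digest, scaled to [0, 1]."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def ingest(doc_id: str, text: str, stores: InMemoryStores) -> int:
    """Chunk, embed, and store one document; returns the number of chunks written."""
    chunks = chunk_text(text)
    for index, chunk in enumerate(chunks):
        chunk_id = f"{doc_id}:{index}"
        stores.vectors[chunk_id] = toy_embed(chunk)        # vector store write
        stores.metadata[chunk_id] = {                      # metadata store write
            "doc_id": doc_id,
            "index": index,
            "text": chunk,
        }
    return len(chunks)
```

Keeping the vector and its metadata under the same chunk ID is what lets a later search map a vector hit back to its source document.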
Search Query¶
User submits question
↓
Router receives request
↓
SearchService orchestrates
↓
Pipeline resolution
↓
Query rewriting (optional)
↓
Vector similarity search (Milvus)
↓
Reranking (optional)
↓
Context building
↓
LLM generation (provider)
↓
Source attribution
↓
Response to user
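As a rough sketch of how the optional steps in this flow fit together, the function below accepts the retrieval and generation steps as callables and treats rewriting and reranking as optional. The function name, its parameters, and the keyword-matching retriever are assumptions made for illustration; the real SearchService and Milvus-backed retrieval are not shown here.

```python
from typing import Callable, Optional


def search(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    rewrite: Optional[Callable[[str], str]] = None,
    rerank: Optional[Callable[[str, list[str]], list[str]]] = None,
    top_k: int = 3,
) -> dict:
    """Run the search flow; the rewrite and rerank steps are skipped when not supplied."""
    query = rewrite(question) if rewrite else question
    candidates = retrieve(query)
    if rerank:
        candidates = rerank(query, candidates)
    context = candidates[:top_k]              # context building
    answer = generate(question, context)      # LLM generation
    return {"answer": answer, "sources": context}  # source attribution


# Toy stand-ins so the sketch runs without a vector database or an LLM.
CORPUS = ["Milvus stores vectors", "PostgreSQL stores metadata", "MinIO stores files"]


def keyword_retrieve(query: str) -> list[str]:
    """Return corpus entries that share at least one word with the query."""
    words = set(query.lower().split())
    return [doc for doc in CORPUS if words & set(doc.lower().split())]


result = search(
    "where are vectors stored",
    retrieve=keyword_retrieve,
    generate=lambda question, context: f"Based on {len(context)} source(s): {context[0]}",
)
```

Returning the context alongside the answer keeps source attribution a by-product of the flow rather than a separate lookup.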
Chain of Thought Flow¶
Complex question detected
↓
QuestionDecomposer breaks into sub-questions
↓
For each sub-question:
  ├─ Vector search
  ├─ Context retrieval
  └─ LLM answer generation
↓
AnswerSynthesizer combines steps
↓
Source attribution across all steps
↓
Quality scoring & validation
↓
Final comprehensive answer
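The decompose, answer, and synthesize loop can be sketched as follows. This is an illustrative outline only: `ReasoningStep` and `chain_of_thought` are hypothetical names, the four callables stand in for QuestionDecomposer, the vector search, the LLM, and AnswerSynthesizer, and the quality scoring step is omitted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReasoningStep:
    """One sub-question with its answer and the sources used to produce it."""
    question: str
    answer: str
    sources: list[str]


def chain_of_thought(
    question: str,
    decompose: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
    answer: Callable[[str, list[str]], str],
    synthesize: Callable[[str, list[ReasoningStep]], str],
) -> dict:
    """Answer each sub-question in turn, then combine the steps into one response."""
    steps: list[ReasoningStep] = []
    for sub_question in decompose(question):
        context = retrieve(sub_question)
        steps.append(ReasoningStep(sub_question, answer(sub_question, context), context))

    # Attribute sources across all steps, dropping duplicates but keeping first-seen order.
    sources = list(dict.fromkeys(src for step in steps for src in step.sources))
    return {"answer": synthesize(question, steps), "steps": steps, "sources": sources}
```

Because every step keeps its own source list, the final answer can cite sources per reasoning step as well as in aggregate.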
Key Components¶
Pipeline Engine¶
Automatic pipeline resolution and execution:
- Resolution: Determines user's default pipeline
- Creation: Auto-creates pipelines for new users
- Execution: Runs pipeline stages sequentially
- Fallback: Graceful error handling
Stages:
- Query rewriting
- Retrieval
- Reranking
- Generation
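Sequential execution with a fallback can be sketched as a loop over named stages. The `Stage` and `run_pipeline` names are hypothetical, and the fallback policy shown here (skip an optional stage that fails, abort on a required one) is an assumption about what "graceful error handling" means, not a description of the actual engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    """A named pipeline step that takes the current state and returns the updated state."""
    name: str
    run: Callable[[dict], dict]
    required: bool = True


def run_pipeline(stages: list[Stage], state: dict) -> dict:
    """Run stages in order; optional stages that raise are recorded and skipped."""
    state = dict(state)                # avoid mutating the caller's dict
    state.setdefault("skipped", [])
    for stage in stages:
        try:
            state = stage.run(state)
        except Exception as exc:
            if stage.required:
                raise RuntimeError(f"stage {stage.name!r} failed") from exc
            state["skipped"].append(stage.name)  # fall back to the previous state
    return state
```

Under this policy, a failure in an optional stage such as query rewriting leaves the original query in place, while a failure in retrieval or generation still surfaces as an error.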
Chain of Thought Service¶
Production-hardened reasoning system:
- 5-layer parsing to keep intermediate reasoning from leaking into the final answer
- Quality scoring with confidence thresholds
- Retry logic with exponential backoff
- Source attribution across reasoning steps
See Chain of Thought for details.
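Retry with exponential backoff, one of the hardening measures listed above, generally looks like the sketch below. The helper name, its defaults, and the injectable `sleep` parameter are assumptions chosen for illustration; the service's actual retry settings are not stated here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise                           # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)    # 0.5s, 1s, 2s, ...
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter lets tests replace it with a recorder, so the backoff schedule can be checked without waiting.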
Conversation System¶
Unified repository pattern for session management:
- Session lifecycle management
- Message persistence
- Conversation history
- Summarization support
See Conversation System Refactoring for migration details.
Technology Stack¶
Backend¶
- Framework: FastAPI
- ORM: SQLAlchemy
- Validation: Pydantic
- Testing: pytest (947+ tests)
- Linting: Ruff, MyPy, Pylint
Frontend¶
- Framework: React 18
- UI Library: Carbon Design System
- State Management: React hooks
- HTTP Client: axios
Infrastructure¶
- Database: PostgreSQL
- Vector DB: Milvus (configurable)
- Object Storage: MinIO
- Model Tracking: MLFlow
- Containerization: Docker + Docker Compose
LLM Providers¶
- Default: IBM WatsonX
- Alternatives: OpenAI, Anthropic
Design Patterns¶
Repository Pattern¶
Abstracts data access with clean interfaces:
from pydantic import UUID4

class ConversationRepository:
    # Signatures only; the input and return types are the project's conversation schemas.
    def create_session(self, input: ConversationSessionInput) -> ConversationSession: ...
    def get_message_by_id(self, message_id: UUID4) -> ConversationMessage: ...
    def create_message(self, input: ConversationMessageInput) -> ConversationMessage: ...
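One benefit of this pattern is that callers depend on the interface rather than on the database. The sketch below shows the idea with simplified, hypothetical types (`Message`, `MessageRepository`, `InMemoryMessageRepository`) rather than the project's real schemas: an in-memory implementation satisfies the same interface and can replace the SQLAlchemy-backed one in tests.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol
from uuid import UUID, uuid4


@dataclass
class Message:
    """Simplified stand-in for a conversation message."""
    session_id: UUID
    content: str
    id: UUID = field(default_factory=uuid4)


class MessageRepository(Protocol):
    """The interface callers depend on; storage details stay hidden behind it."""
    def create_message(self, session_id: UUID, content: str) -> Message: ...
    def get_message_by_id(self, message_id: UUID) -> Optional[Message]: ...


class InMemoryMessageRepository:
    """Dictionary-backed implementation, useful as a test double."""
    def __init__(self) -> None:
        self._messages: dict[UUID, Message] = {}

    def create_message(self, session_id: UUID, content: str) -> Message:
        message = Message(session_id=session_id, content=content)
        self._messages[message.id] = message
        return message

    def get_message_by_id(self, message_id: UUID) -> Optional[Message]:
        return self._messages.get(message_id)


def count_words(repo: MessageRepository, message_id: UUID) -> int:
    """Example caller: works with any repository that satisfies the interface."""
    message = repo.get_message_by_id(message_id)
    return len(message.content.split()) if message else 0
```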
Factory Pattern¶
Provider instantiation via factory:
provider_factory = ProviderFactory(settings)
provider = provider_factory.get_provider(provider_type)
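A factory of this shape can be sketched with a registry that maps provider names to classes. Everything below other than the `ProviderFactory(settings)` and `get_provider(provider_type)` call shapes is assumed for illustration: the `LLMProvider` base class, the `EchoProvider` stand-in, the dictionary settings, and the per-type instance caching are not taken from the project.

```python
from abc import ABC, abstractmethod
from typing import Optional


class LLMProvider(ABC):
    """Hypothetical common interface shared by all providers."""
    @abstractmethod
    def generate(self, prompt: str) -> str: ...


class EchoProvider(LLMProvider):
    """Stand-in provider that needs no network access."""
    def __init__(self, settings: dict) -> None:
        self.prefix = settings.get("prefix", "echo")

    def generate(self, prompt: str) -> str:
        return f"{self.prefix}: {prompt}"


class ProviderFactory:
    """Creates providers by name and reuses one instance per provider type."""
    def __init__(self, settings: dict, registry: Optional[dict] = None) -> None:
        self._settings = settings
        self._registry = registry or {"echo": EchoProvider}
        self._instances: dict[str, LLMProvider] = {}

    def get_provider(self, provider_type: str) -> LLMProvider:
        key = provider_type.lower()
        if key not in self._registry:
            raise ValueError(f"unknown provider type: {provider_type!r}")
        if key not in self._instances:
            self._instances[key] = self._registry[key](self._settings)
        return self._instances[key]
```

With a registry like this, adding a provider means registering one more class; code that calls `get_provider` does not change.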
Service Layer Pattern¶
Business logic separated from HTTP handling:
@router.post("/search")
async def search(input: SearchInput, service: SearchService = Depends()):
    # The route only delegates; the business logic lives in SearchService.
    return await service.search(input)
Dependency Injection¶
FastAPI dependency injection for testability:
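from collections.abc import Generator

from sqlalchemy.orm import Session

def get_db() -> Generator[Session, None, None]:
    db = SessionLocal()  # session factory defined elsewhere in the project
    try:
        yield db
    finally:
        db.close()  # always release the session, even if the request fails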
Configuration¶
Environment-based configuration via .env:
- Database: COLLECTIONDB_* variables
- Vector DB: VECTOR_DB, MILVUS_* variables
- LLM: WATSONX_*, OPENAI_API_KEY, ANTHROPIC_API_KEY
- Security: JWT_SECRET_KEY
See Configuration Guide for details.
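As a minimal sketch of environment-based configuration, the loader below reads a few of the variables named above into an immutable settings object. The `Settings` dataclass, the `load_settings` helper, the `"milvus"` default, and treating JWT_SECRET_KEY as mandatory are assumptions made for illustration. The project uses Pydantic for validation, so its real settings class likely differs.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical subset of the application's configuration."""
    vector_db: str
    jwt_secret_key: str
    openai_api_key: Optional[str]
    anthropic_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on a missing secret."""
    secret = env.get("JWT_SECRET_KEY")
    if not secret:
        raise RuntimeError("JWT_SECRET_KEY must be set")
    return Settings(
        vector_db=env.get("VECTOR_DB", "milvus"),        # assumed default
        jwt_secret_key=secret,
        openai_api_key=env.get("OPENAI_API_KEY"),        # optional provider key
        anthropic_api_key=env.get("ANTHROPIC_API_KEY"),  # optional provider key
    )
```

Accepting the environment as a parameter keeps the loader testable: a test can pass a plain dictionary instead of modifying the process environment.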
Scalability Considerations¶
Horizontal Scaling¶
- Stateless API services
- Database connection pooling
- Distributed vector database (Milvus)
Performance Optimization¶
- Async I/O with FastAPI
- Database query optimization
- Vector search indexing (HNSW)
- Response caching
See Performance and Scalability for details.
Security Architecture¶
Authentication & Authorization¶
- JWT-based authentication
- Role-based access control (planned)
- API key management
Secret Management¶
- Environment variables for secrets
- Multi-layer secret scanning (Gitleaks, TruffleHog)
- Pre-commit hooks for prevention
Data Protection¶
- HTTPS in production
- Database encryption at rest
- Secure file uploads
See Security for comprehensive security documentation.
Testing Strategy¶
947+ automated tests across categories:
- Atomic: Schema validation (~5s)
- Unit: Component isolation (~30s)
- Integration: Service interaction (~2 min)
- E2E: Full workflows (~5 min)
See Testing Guide for details.
Deployment¶
Local Development¶
make local-dev-setup # One-time setup
make local-dev-infra # Start infrastructure
make local-dev-backend # Start backend
make local-dev-frontend # Start frontend
Production¶
Images published to GitHub Container Registry (GHCR).
See Deployment Guide for details.
See Also¶
- System Design - Detailed system design
- Components - Component documentation
- Data Flow - Data flow diagrams
- Security - Security architecture
- Performance - Performance optimization