Glossary
A
- Anthropic: LLM provider offering Claude models. Supported as an alternative to WatsonX.
C
- Chain of Thought (CoT): Advanced reasoning technique that breaks a complex question into sub-questions, answers each step, and synthesizes the step answers into a comprehensive final answer with source attribution.
- ChromaDB: Supported vector database option for storing document embeddings.
- Collection: Logical grouping of documents for organization and querying.
- ConversationRepository: Unified repository (repository pattern) for managing conversation sessions, messages, and summaries.
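The Chain of Thought flow (decompose, answer each step, synthesize with sources) can be sketched as follows. This is a minimal illustration, not RAG Modulo's implementation: it assumes a generic `llm` callable that maps a prompt string to a completion and a `retrieve` callable returning `(doc_id, text)` pairs. The function names and prompt wording are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: prompt -> completion, and query -> [(doc_id, text)]
LLM = Callable[[str], str]
Retriever = Callable[[str], List[Tuple[str, str]]]


def decompose(llm: LLM, question: str) -> List[str]:
    """Ask the model to split a question into sub-questions, one per line."""
    raw = llm(f"Break this question into sub-questions, one per line:\n{question}")
    return [line.strip() for line in raw.splitlines() if line.strip()]


def chain_of_thought(llm: LLM, question: str, retrieve: Retriever) -> Tuple[str, List[str]]:
    """Answer each sub-question with its own retrieved context, then synthesize."""
    steps: List[str] = []
    sources: List[str] = []
    for sub in decompose(llm, question):
        docs = retrieve(sub)
        context = "\n".join(text for _, text in docs)
        answer = llm(f"Context:\n{context}\n\nAnswer: {sub}")
        steps.append(f"{sub} -> {answer}")
        sources.extend(doc_id for doc_id, _ in docs)
    final = llm(
        "Synthesize a final answer from these steps:\n"
        + "\n".join(steps)
        + f"\n\nQuestion: {question}"
    )
    # Deduplicate source ids while preserving first-seen order (source attribution)
    return final, list(dict.fromkeys(sources))
```

Retrieving per sub-question, rather than once for the whole question, lets each step draw on context specific to it, and the collected ids give the source attribution for the final answer.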
D
- Docling (IBM): Advanced document processing library for extracting content from PDFs, images, tables, and other complex formats.
E
- Elasticsearch: Supported vector database option with hybrid search capabilities.
- Embedding: Vector representation of text that captures its semantic meaning, used for similarity search.
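Similarity between embeddings is commonly measured with cosine similarity. The sketch below shows the arithmetic on toy three-dimensional vectors; real embedding models produce vectors with hundreds or thousands of dimensions, and the metric a given vector database uses is configurable, so cosine here is an assumption for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as having no similarity
    return dot / (norm_a * norm_b)


# Toy vectors standing in for a query and two document embeddings
query = [1.0, 0.0, 1.0]
doc_a = [0.9, 0.1, 0.8]   # points in nearly the same direction as the query
doc_b = [0.0, 1.0, 0.0]   # orthogonal to the query

print(cosine_similarity(query, doc_a))  # close to 1.0
print(cosine_similarity(query, doc_b))  # 0.0
```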
G
- GHCR (GitHub Container Registry): Container registry where RAG Modulo Docker images are published.
L
- LLM (Large Language Model): AI model that generates responses from the retrieved context.
M
- Material Theme: Modern MkDocs documentation theme used for the GitHub Pages site, with dark/light mode support.
- Milvus: Default vector database for RAG Modulo, optimized for similarity search at scale.
- MkDocs: Static site generator used for the project documentation.
O
- OpenAI: LLM provider offering GPT models. Supported as an alternative to WatsonX.
P
- Pinecone: Cloud-based vector database option for managed vector search.
- Pipeline: Automated workflow for document processing and query execution.
R
- RAG (Retrieval-Augmented Generation): AI technique that retrieves relevant documents and passes them to an LLM as context, so that generated answers are accurate and grounded in source material.
- Reranking: Post-processing step that reorders retrieved documents to improve search relevance.
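Retrieval, reranking, and prompt construction fit together roughly as below. This is a hedged sketch, not RAG Modulo's pipeline: `overlap_score` is a deliberately simple stand-in for a real reranker (such as a cross-encoder model), and all function names and the prompt template are hypothetical. Tagging each chunk with its id is one way to support source attribution.

```python
from typing import Callable, List, Tuple

Doc = Tuple[str, str]  # (doc_id, text)


def overlap_score(query: str, text: str) -> float:
    """Stand-in relevance scorer: fraction of query terms that appear in the text."""
    q_terms = set(query.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & set(text.lower().split())) / len(q_terms)


def rerank(query: str, docs: List[Doc], score: Callable[[str, str], float], top_n: int) -> List[Doc]:
    """Reorder retrieved docs by a second-stage relevance score and keep the top_n."""
    return sorted(docs, key=lambda d: score(query, d[1]), reverse=True)[:top_n]


def build_prompt(query: str, docs: List[Doc]) -> str:
    """Ground the LLM in retrieved context; tag each chunk with its id for attribution."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in docs)
    return (
        "Answer using only the context below and cite sources by id.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


# Candidates as they might come back from a first-stage vector search
retrieved = [
    ("d1", "milvus is a vector database"),
    ("d2", "the weather is sunny"),
    ("d3", "vector database search with milvus and embeddings"),
]
query = "milvus vector database"
top = rerank(query, retrieved, overlap_score, top_n=2)
print(build_prompt(query, top))
```

The first stage (vector search) is cheap and recall-oriented; reranking applies a more precise score to a small candidate set before only the best chunks are sent to the LLM.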
S
- Source Attribution: Feature that tracks and displays which documents or chunks were used to generate an answer.
T
- Token: Unit of text processed by LLMs. RAG Modulo tracks token usage for cost monitoring.
- TruffleHog: Secret scanning tool used in CI/CD to detect exposed API keys.
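Token-based cost monitoring reduces to simple arithmetic: providers typically price prompt (input) and completion (output) tokens separately. The sketch below accumulates usage across calls and estimates cost; the `TokenUsage` class and the per-1,000-token prices are hypothetical placeholders, not RAG Modulo's tracker or any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Hypothetical accumulator for token counts across LLM calls."""
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add(self, prompt: int, completion: int) -> None:
        self.prompt_tokens += prompt
        self.completion_tokens += completion

    @property
    def total(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    def cost(self, prompt_price_per_1k: float, completion_price_per_1k: float) -> float:
        """Estimated cost, with input and output tokens priced separately."""
        return (
            self.prompt_tokens / 1000 * prompt_price_per_1k
            + self.completion_tokens / 1000 * completion_price_per_1k
        )


usage = TokenUsage()
usage.add(prompt=1200, completion=300)  # first query
usage.add(prompt=800, completion=200)   # second query
print(usage.total)            # 2500
print(usage.cost(0.5, 1.5))   # 2.0 * 0.5 + 0.5 * 1.5 = 1.75 (placeholder prices)
```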
V
- Vector Database: Specialized database for storing and searching high-dimensional embeddings.
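At its core, a vector database stores embeddings and returns the stored vectors most similar to a query vector. The toy class below does an exact (brute-force) cosine search over every stored vector; it is a hypothetical illustration only. Production systems such as Milvus typically build approximate nearest-neighbor indexes (for example HNSW) so that search does not scan every vector.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


class InMemoryVectorStore:
    """Toy vector store: exact cosine top-k search over all stored vectors."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._vectors: Dict[str, List[float]] = {}

    def _normalize(self, vector: Sequence[float]) -> List[float]:
        if len(vector) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(vector)}")
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("zero vector cannot be normalized")
        return [x / norm for x in vector]

    def add(self, doc_id: str, vector: Sequence[float]) -> None:
        # Store unit-length vectors so a dot product equals cosine similarity
        self._vectors[doc_id] = self._normalize(vector)

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        q = self._normalize(query)
        scored = (
            (doc_id, sum(a * b for a, b in zip(q, v)))
            for doc_id, v in self._vectors.items()
        )
        # Keep the k highest-scoring (doc_id, score) pairs, best first
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])


store = InMemoryVectorStore(dim=2)
store.add("a", [1.0, 0.0])
store.add("b", [0.0, 1.0])
store.add("c", [1.0, 1.0])
print(store.search([1.0, 0.1], k=2))  # "a" first, then "c"
```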
W
- WatsonX: IBM's LLM platform, the default provider for RAG Modulo.
- Weaviate: Supported vector database option with semantic search capabilities.
For more detailed explanations, see the relevant sections in the documentation.