Secret Management Guide¶
Audience: Developers Purpose: How to safely handle secrets, API keys, and credentials in RAG Modulo
Quick Start (5 Minutes)¶
โ Golden Rules¶
- NEVER commit secrets to git
- ALWAYS use environment variables for secrets
- CHECK
.secrets.baselinebefore committing - RUN
make pre-commit-runbefore pushing
๐จ If You See a Secret Detection Error¶
Pre-commit blocked your commit?
# 1. Remove the secret from your code
# 2. Add to .env (never commit .env)
# 3. Reference in env.example with placeholder
# If false positive:
detect-secrets audit .secrets.baseline # Mark as false positive
git add .secrets.baseline
CI/CD failed with secret detection? 1. ROTATE the exposed secret immediately (top priority!) 2. Remove from git history: See Git History Cleanup 3. Fix the code to use environment variables 4. Push the fix
Secret Scanning System Architecture¶
RAG Modulo uses three layers of secret detection for defense-in-depth:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Layer 1: Pre-commit Hooks โ
โ Tool: detect-secrets (with .secrets.baseline) โ
โ Speed: < 1 second โ
โ Scope: Staged files only โ
โ Purpose: Fast local feedback before commit โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Layer 2: Local Testing โ
โ Tool: Gitleaks (via make pre-commit-run) โ
โ Speed: 1-2 seconds โ
โ Scope: Staged files only โ
โ Purpose: Pattern-based detection before push โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Layer 3: CI/CD Pipeline โ
โ Tools: Gitleaks + TruffleHog โ
โ Speed: 30-45 seconds โ
โ Scope: Full git history โ
โ Purpose: Comprehensive scan, BLOCKS merges โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Why Three Tools?¶
| Tool | Strength | Detection Method |
|---|---|---|
| detect-secrets | Low false positives, fast | Heuristics + baseline |
| Gitleaks | Custom patterns, configurable | Regex + keywords |
| TruffleHog | High accuracy | Entropy + verification |
Combined: Maximum coverage with minimal false positives
Supported Secret Types¶
RAG Modulo detects 20+ secret types:
Cloud Provider Secrets¶
- AWS: Access keys, secret keys (AKIA*, ASIA*)
- Azure: Storage keys, subscription keys, connection strings
- GCP: Service account keys (JSON), API keys
LLM Provider Keys¶
- OpenAI: API keys (sk-), Project keys (sk-proj-)
- Anthropic: API keys (sk-ant-*)
- WatsonX: API keys, instance IDs
- Google Gemini: API keys (AIza*)
Infrastructure Secrets¶
- PostgreSQL: Database passwords
- MinIO: Root username/password
- MLFlow: Tracking credentials
- JWT: Secret keys
Version Control¶
- GitHub: Personal access tokens, app tokens, fine-grained tokens
- GitLab: Access tokens
Generic Detection¶
- High-entropy strings: Base64-encoded secrets (4.5+ entropy)
- Private keys: SSH, PGP, RSA keys (-----BEGIN PRIVATE KEY-----)
Full configuration: .gitleaks.toml
Local Development Workflow¶
1. Environment Setup¶
.env structure:
# LLM Provider (choose one)
RAG_LLM=watsonx
WATSONX_APIKEY=your_actual_api_key_here
WATSONX_INSTANCE_ID=your_instance_id
# Or use OpenAI
RAG_LLM=openai
OPENAI_API_KEY=sk-your_actual_key_here
# Database
COLLECTIONDB_PASSWORD=strong_password_here
# Security
JWT_SECRET_KEY=generate_random_256_bit_key
Generate secure secrets:
2. Pre-commit Hook Activation¶
# Install pre-commit hooks (one-time setup)
pip install pre-commit
pre-commit install
# Manual run (optional)
pre-commit run --all-files
What runs on commit? - โ detect-secrets (with baseline) - โ Ruff formatting - โ Trailing whitespace check - โ YAML/JSON validation - โ Private key detection
What runs on push? - โ MyPy type checking - โ Pylint code quality - โ Unit tests (fast)
3. Local Testing Before Push¶
# Run all pre-commit checks manually
make pre-commit-run
# Check for secrets specifically (matches CI)
# Gitleaks scans staged files in Step 1/10 of pre-commit-run
Expected output:
Step 1/10: Security - Detecting secrets and sensitive data...
๐ Checking for hardcoded secrets with Gitleaks (staged files only - FAST)...
โน๏ธ Scanning staged files only (~1 second)...
โ
No secrets in staged files
CI/CD Integration¶
GitHub Actions Workflow¶
File: .github/workflows/02-security.yml
Triggers: - Every pull request to main - Every push to main - Manual workflow dispatch
What happens: 1. Gitleaks: Scans entire git history for secrets 2. TruffleHog: Scans for verified secrets (--only-verified) 3. Result: FAIL = PR blocked, PASS = merge allowed
Important: CI now fails on ANY secret detection (no continue-on-error)
Viewing CI Results¶
# Check PR status
gh pr checks <pr-number>
# View security scan logs
gh run view <run-id> --job "๐ Gitleaks Secret Scanning"
False Positive Handling¶
Legitimate Secrets in Test Fixtures¶
Problem: Test files use fake API keys that trigger detection
Solution: Update .secrets.baseline
# Generate updated baseline
detect-secrets scan --baseline .secrets.baseline
# Audit and mark false positives
detect-secrets audit .secrets.baseline
# Navigate with arrow keys, press:
# - 'y' = Real secret (will block commit)
# - 'n' = False positive (allow)
# - 's' = Skip for now
# Commit the updated baseline
git add .secrets.baseline
git commit -m "chore: update secrets baseline"
Allowlisting Paths¶
Problem: Documentation contains example secrets (README.md, env.example)
Solution: Add to .gitleaks.toml allowlist
[allowlist]
paths = [
'''env\.example''', # Example env files
'''docs/.*\.md''', # Documentation
'''tests/fixtures/.*''', # Test fixtures
'''deployment/k8s/.*/secrets/.*''', # K8s secret templates
]
Pattern syntax: Go regex
Adding New Secret Patterns¶
When to Add a Pattern¶
- New LLM provider integration (e.g., Cohere, HuggingFace)
- New cloud provider (e.g., DigitalOcean, Linode)
- New internal service with API keys
How to Add a Pattern¶
1. Define regex pattern in .gitleaks.toml:
[[rules]]
id = "cohere-api-key"
description = "Cohere API Key"
regex = '''[a-zA-Z0-9]{40}'''
keywords = ["cohere", "COHERE_API_KEY"]
tags = ["key", "Cohere"]
2. Test the pattern:
# Create test file with fake secret
echo "COHERE_API_KEY=abc123..." > test_secret.txt
# Run Gitleaks
gitleaks detect --source . --config .gitleaks.toml
# Expected: Should detect the pattern
3. Add to env.example:
4. Update this documentation (Supported Secret Types section)
Git History Cleanup¶
If a Secret Was Committed¶
โ ๏ธ WARNING: This rewrites git history. Coordinate with your team.
Method 1: BFG Repo-Cleaner (Recommended)¶
# Install BFG
brew install bfg
# Clone a fresh copy
git clone --mirror https://github.com/your-org/rag_modulo.git
cd rag_modulo.git
# Replace secret in all history
echo "sk-actual_secret_key_here" > ../secrets.txt
bfg --replace-text ../secrets.txt
# Verify and force push
git reflog expire --expire=now --all
git gc --prune=now --aggressive
git push --force
# Everyone must re-clone
Method 2: git-filter-repo (Fine-grained)¶
# Install git-filter-repo
pip install git-filter-repo
# Remove file from history
git filter-repo --path .env --invert-paths
# Force push
git push origin --force --all
Method 3: GitHub Secret Scanning Remediation¶
- GitHub automatically detects secrets in public repos
- Navigate to Settings โ Security โ Secret scanning
- Review alerts and rotate secrets
- Follow GitHub's guided remediation
Emergency Response Playbook¶
Secret Detected in CI¶
Time-sensitive! Follow this exact order:
1. Rotate the secret IMMEDIATELY (< 5 minutes)
# OpenAI
https://platform.openai.com/api-keys โ Revoke โ Create new
# WatsonX
IBM Cloud Console โ API Keys โ Delete โ Create new
# GitHub
Settings โ Developer settings โ Tokens โ Delete โ Generate new
2. Update local .env with new secret
3. Update CI/CD secrets (if using GitHub Secrets)
# Via GitHub UI
Settings โ Secrets and variables โ Actions โ Update secret
# Or via CLI
gh secret set OPENAI_API_KEY < secret.txt
4. Clean git history (see Git History Cleanup)
5. Verify rotation
# Test old secret doesn't work
curl https://api.openai.com/v1/models \
-H "Authorization: Bearer sk-old_key" \
# Expected: 401 Unauthorized
# Test new secret works
curl https://api.openai.com/v1/models \
-H "Authorization: Bearer sk-new_key" \
# Expected: 200 OK
6. Document incident (internal security log)
GitHub Secret Scanning (Optional)¶
Enabling GitHub Secret Scanning¶
Requirements: - GitHub Advanced Security (free for public repos, paid for private) - Repository admin access
Steps: 1. Navigate to repository Settings 2. Security โ Code security and analysis 3. Enable "Secret scanning" 4. Enable "Push protection" (recommended)
Custom patterns: 1. Settings โ Security โ Secret scanning 2. Custom patterns โ New pattern 3. Example:
Alerts: - Automatic detection of leaked secrets - Email notifications to admins - Integration with .gitleaks.toml patterns
Best Practices¶
โ Do's¶
- โ Use environment variables for all secrets
- โ
Keep
.envin.gitignore(already configured) - โ
Reference secrets in
env.examplewith placeholders - โ
Run
make pre-commit-runbefore pushing - โ Rotate secrets every 90 days
- โ Use different secrets for dev/staging/prod
- โ Document secret sources in team wiki
โ Don'ts¶
- โ Hardcode secrets in Python/JavaScript files
- โ Commit
.envfiles to git - โ Share secrets via Slack/email
- โ Use production secrets in development
- โ Bypass pre-commit hooks with
--no-verify(unless emergency) - โ Store secrets in comments or documentation
- โ Use weak secrets like "password123"
Troubleshooting¶
Pre-commit hook says "command not found: gitleaks"¶
Solution:
# macOS
brew install gitleaks
# Linux
wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.0/gitleaks_8.18.0_linux_x64.tar.gz
tar -xzf gitleaks_8.18.0_linux_x64.tar.gz
sudo mv gitleaks /usr/local/bin/
detect-secrets keeps flagging false positives¶
Solution: Use --baseline to track known false positives
# Update baseline
detect-secrets scan --baseline .secrets.baseline
# Audit (mark false positives)
detect-secrets audit .secrets.baseline
CI failed but I can't see the secret¶
Reason: Secrets are redacted in CI logs for security
Solution: Run locally with verbose output
Need to bypass pre-commit hook temporarily¶
Emergency only:
Related Documentation¶
- Production Secrets: Security Hardening Guide
- CI/CD Security: CI/CD Security Pipeline
- Environment Setup: Environment Configuration
Reference¶
Tools¶
- detect-secrets - Baseline-based detection
- Gitleaks - Pattern-based scanning
- TruffleHog - Entropy + verification
- BFG Repo-Cleaner - Git history cleanup
Configuration Files¶
.gitleaks.toml- Gitleaks patterns and allowlists.secrets.baseline- detect-secrets false positive tracking.pre-commit-config.yaml- Pre-commit hook configuration.github/workflows/02-security.yml- CI/CD secret scanning
Support¶
- Security issues: Report to maintainers via private channel
- False positives: Create PR updating
.secrets.baseline - New patterns: Create PR updating
.gitleaks.toml
Last updated: October 2025 Maintainer: Security Team