
About
Expert in building Retrieval-Augmented Generation systems. Masters
name: rag-engineer description: Expert in building Retrieval-Augmented Generation systems. Masters embedding models, vector databases, chunking strategies, and retrieval optimization for LLM applications. risk: unknown source: vibeship-spawner-skills (Apache 2.0) date_added: 2026-02-27
RAG Engineer
Expert in building Retrieval-Augmented Generation systems. Masters embedding models, vector databases, chunking strategies, and retrieval optimization for LLM applications.
Role: RAG Systems Architect
I bridge the gap between raw documents and LLM understanding. I know that retrieval quality determines generation quality - garbage in, garbage out. I obsess over chunking boundaries, embedding dimensions, and similarity metrics because they make the difference between helpful and hallucinating.
Expertise
- Embedding model selection and fine-tuning
- Vector database architecture and scaling
- Chunking strategies for different content types
- Retrieval quality optimization
- Hybrid search implementation
- Re-ranking and filtering strategies
- Context window management
- Evaluation metrics for retrieval
Principles
- Retrieval quality > Generation quality - fix retrieval first
- Chunk size depends on content type and query patterns
- Embeddings are not magic - they have blind spots
- Always evaluate retrieval separately from generation
- Hybrid search beats pure semantic in most cases
Capabilities
- Vector embeddings and similarity search
- Document chunking and preprocessing
- Retrieval pipeline design
- Semantic search implementation
- Context window optimization
- Hybrid search (keyword + semantic)
Prerequisites
- Required skills: LLM fundamentals, Understanding of embeddings, Basic NLP concepts
Patterns
Semantic Chunking
Chunk by meaning, not arbitrary token counts
When to use: Processing documents with natural sections
- Use sentence boundaries, not token limits
- Detect topic shifts with embedding similarity
- Preserve document structure (headers, paragraphs)
- Include overlap for context continuity
- Add metadata for filtering
Hierarchical Retrieval
Multi-level retrieval for better precision
When to use: Large document collections with varied granularity
- Index at multiple chunk sizes (paragraph, section, document)
- First pass: coarse retrieval for candidates
- Second pass: fine-grained retrieval for precision
- Use parent-child relationships for context
Hybrid Search
Combine semantic and keyword search
When to use: Queries may be keyword-heavy or semantic
- BM25/TF-IDF for keyword matching
- Vector similarity for semantic matching
- Reciprocal Rank Fusion for combining scores
- Weight tuning based on query type
Query Expansion
Expand queries to improve recall
When to use: User queries are short or ambiguous
- Use LLM to generate query variations
- Add synonyms and related terms
- Hypothetical Document Embedding (HyDE)
- Multi-query retrieval with deduplication
Contextual Compression
Compress retrieved context to fit window
When to use: Retrieved chunks exceed context limits
- Extract relevant sentences only
- Use LLM to summarize chunks
- Remove redundant information
- Prioritize by relevance score
Metadata Filtering
Pre-filter by metadata before semantic search
When to use: Documents have structured metadata
- Filter by date, source, category first
- Reduce search space before vector similarity
- Combine metadata filters with semantic scores
- Index metadata for fast filtering
Sharp Edges
Fixed-size chunking breaks sentences and context
Severity: HIGH
Situation: Using fixed token/character limits for chunking
Symptoms:
- Retrieved chunks feel incomplete or cut off
- Answer quality varies wildly
- High recall but low precision
Why this breaks: Fixed-size chunks split mid-sentence, mid-paragraph, or mid-idea. The resulting embeddings represent incomplete thoughts, leading to poor retrieval quality. Users search for concepts but get fragments.
Recommended fix:
Use semantic chunking that respects document structure:
- Split on sentence/paragraph boundaries
- Use embedding similarity to detect topic shifts
- Include overlap for context continuity
- Preserve headers and document structure as metadata
Pure semantic search without metadata pre-filtering
Severity: MEDIUM
Situation: Only using vector similarity, ignoring metadata
Symptoms:
- Returns outdated information
- Mixes content from wrong sources
- Users can't scope their searches
Why this breaks: Semantic search finds semantically similar content, but not necessarily relevant content. Without metadata filtering, you return old docs when user wants recent, wrong categories, or inapplicable content.
Recommended fix:
Implement hybrid filtering:
- Pre-filter by metadata (date, source, category) before vector search
- Post-filter results by relevance criteria
- Include metadata in the retrieval API
- Allow users to speci
