Demystify Retrieval-Augmented Generation (RAG) by building it yourself - step by step.
No black boxes. No cloud APIs. Just clear explanations, simple examples, and local code you fully understand.
This project follows the same philosophy as AI Agents from Scratch:
make advanced AI concepts approachable for developers through minimal, well-explained, real code.
- What RAG really is, and why it’s so powerful for knowledge retrieval.
- How embeddings work, turn text into numbers your model can understand.
- How to build a local vector database, store and query documents efficiently.
- How to connect everything, retrieve context and feed it into an LLM for grounded answers.
- How to re-rank and normalize, improving retrieval precision and reducing noise.
- Query rewriting, normalize and rewrite user queries (heuristic and LLM) before retrieval.
- Step-by-step code walkthroughs, every function explained, nothing hidden.
Retrieval-Augmented Generation (RAG) enhances language models by giving them access to external knowledge.
Instead of asking the model to “remember” everything, you let it retrieve relevant context before generating a response.
Pipeline:
- Knowledge Requirements, define questions and data needs.
- Data Loading, import and structure your documents.
- Text Splitting & Chunking, divide data into manageable pieces.
- Embedding, turn chunks into numerical vectors.
- Vector Store, save and index embeddings for fast retrieval.
- Retrieval, fetch the most relevant context for a given query.
- Post-Retrieval Re-Ranking, re-order results to prioritize the best context.
- Query Preprocessing & Embedding Normalization, clean and standardize input vectors for consistency.
- Augmentation, merge retrieved context into the model’s prompt.
- Generation, produce grounded answers using a local LLM.
Follow these examples in order to build understanding progressively:
examples/00_how_rag_works/
Code | Code Explanation | Concepts
What you'll learn:
- The core idea behind Retrieval-Augmented Generation
- How retrieval and generation work together
- A minimal, simplified end-to-end RAG flow in under 70 lines of code
Key concepts: retrieval, generation, context injection, similarity search
examples/02_data_loading/
Code | Code Explanation | Concepts
What you'll learn:
- Loading raw text data
- Normalizing and preparing documents
Key concepts: file I/O, preprocessing, document structure
examples/03_text_splitting_and_chunking/
Code | Code Explanation | Concepts
What you'll learn:
- How to split long text into manageable chunks
- Overlaps, boundaries, and chunk strategies
Key concepts: chunking logic, context windows, granularity trade-offs
examples/04_intro_to_embeddings/02_generate_embeddings/
Code | Code Explanation | Concepts
What you'll learn:
- How embeddings represent meaning as vectors
- How to generate embeddings locally
Key concepts: vector representation, similarity, embedding models
examples/05_building_vector_store/01_in_memory_store/
Code | Code Explanation | Concepts
What you'll learn:
- How to store embeddings
- How nearest-neighbor search works
Key concepts: indexing, vector search, metadata storage
examples/06_retrieval_strategies/01_basic_retrieval/
Code | Code Explanation | Concepts
What you'll learn:
- Retrieving relevant chunks from the vector store
- Understanding similarity scoring
Key concepts: augment, scoring, top-k retrieval
At the end you can look into Showcase to see everything you learned so far in action.
examples/06_retrieval_strategies/02_query_preprocessing/
Code | Code Explanation | Concepts
What you'll learn:
- Cleaning and normalizing user queries before embedding
- Reducing noise and improving embedding consistency
Key concepts: normalization, stopword removal, query cleaning, vector stability
examples/06_retrieval_strategies/03_hybrid_search/
Code | Code Explanation | Concepts
What you'll learn:
- Combining multiple retrieval strategies (e.g., vector + keyword)
- Balancing semantic similarity with traditional search signals
Key concepts: hybrid scoring, weighted search, BM25 + embeddings, multi-strategy retrieval
examples/06_retrieval_strategies/04_multi_query_retrieval/
Code | Code Explanation | Concepts
What you'll learn:
- Decomposing complex queries into sub-queries (e.g. with an LLM)
- Running multiple queries in parallel and fusing results (RRF, weighted)
- Query expansion, perspective-based retrieval, and adaptive strategy selection
- Deduplication and ranking when combining result lists
Key concepts: query decomposition, parallel retrieval, reciprocal rank fusion (RRF), weighted fusion, deduplication
examples/06_retrieval_strategies/05_query_rewriting/
Code | Code Explanation | Concepts
What you'll learn:
- Normalizing and cleaning user queries before retrieval
- Stripping filler, expanding acronyms, and stripping injection-like content
- Intent classification and optional LLM rewrite (Qwen 3-1 via node-llama-cpp) with heuristic fallback
Key concepts: query normalization, intent classification, heuristic rewrite, LLM rewrite, alternate queries
├── src/ # Reusable library code
│ ├── embeddings/
│ │ ├── index.js # Main exports
│ │ ├── EmbeddingModel.js # Model wrapper class
│ │ └── EmbeddingCache.js # Caching layer
│ │
│ ├── vector-stores/
│ │ ├── index.js # Main exports
│ │ ├── BaseVectorStore.js # Abstract base class
│ │ ├── InMemoryVectorStore.js # In-memory implementation
│ │ ├── LanceDBVectorStore.js # LanceDB implementation
│ │ └── QdrantVectorStore.js # Qdrant implementation
│ │
│ ├── loaders/
│ │ ├── index.js
│ │ ├── BaseLoader.js # Abstract loader
│ │ ├── PDFLoader.js # PDF loading
│ │ ├── TextLoader.js # Text file loading
│ │ └── DirectoryLoader.js # Batch loading
│ │
│ ├── text-splitters/
│ │ ├── index.js
│ │ ├── BaseTextSplitter.js # Base class
│ │ ├── CharacterTextSplitter.js
│ │ ├── RecursiveCharacterTextSplitter.js
│ │ └── TokenTextSplitter.js
│ │
│ ├── retrievers/
│ │ ├── index.js
│ │ ├── BaseRetriever.js # Base retriever
│ │ ├── VectorStoreRetriever.js # Vector search
│ │ ├── RerankerRetriever.js # With reranking
│ │ └── HybridRetriever.js # Multiple strategies
│ │
│ ├── chains/
│ │ ├── index.js
│ │ ├── RetrievalChain.js # Query → Retrieve → Format
│ │ ├── RAGChain.js # Full RAG pipeline
│ │ └── ConversationalChain.js # With memory
│ │
│ ├── prompts/
│ │ ├── index.js
│ │ ├── PromptTemplate.js # Template class
│ │ └── templates/
│ │ ├── qa.js # Q&A templates
│ │ ├── summarization.js
│ │ └── conversation.js
│ │
│ ├── utils/
│ │ ├── index.js
│ │ ├── Document.js # Document class
│ │ ├── similarity.js # Similarity functions
│ │ ├── tokenizer.js # Token counting
│ │ └── validators.js # Input validation
│ │
│ └── index.js # Main library export
│
├── examples/
│ ├── 00_how_rag_works/
│ │ └── example.js # Minimal RAG simulation with naive keyword search
│ │
│ ├── 01_intro_to_llms/
│ │ └── example.js # Introduction to LLMs, the brain of your RAG system
│ │
│ ├── 02_data_loading/
│ │ └── example.js # Load and preprocess raw text data
│ │
│ ├── 03_text_splitting_and_chunking/
│ │ └── example.js # Split long text into chunks for embedding
│ │
│ ├── 04_intro_to_embeddings/
│ │ ├── 01_text_similarity_basics/
│ │ └── 02_generate_embeddings/
│ │
│ ├── 05_building_vector_store/
│ │ ├── 01_in_memory_store/
│ │ ├── 02_nearest_neighbor_search/
│ │ └── 03_metadata_filtering/
│ │
│ ├── 06_retrieval_strategies/
│ │ ├── 01_basic_retrieval/ # Top-k retrieval, similarity scoring; includes showcase.js
│ │ ├── 02_query_preprocessing/ # Normalize and clean queries before embedding
│ │ ├── 03_hybrid_search/ # Vector + keyword (e.g. BM25) combined
│ │ ├── 04_multi_query_retrieval/ # Decomposition (LLM), parallel retrieval, RRF, dedup (config, helpers)
│ │ ├── 05_query_rewriting/ # Heuristic + LLM rewrite (Qwen), intent, alternates (config, query-rewriter)
│ │ ├── 06_rank_results/ # (planned) Score normalization, ranking methods
│ │ └── 07_post_retrieval_reranking/ # (planned) Rerank retrieved results for precision
│ │
│ ├── 07_prompt_engineering_for_rag/ # (planned) Context stuffing, citations, compression
│ │ ├── 01_context_stuffing/
│ │ ├── 02_citation_prompts/
│ │ └── 03_context_compression/
│ │
│ ├── 08_rag_in_action/ # (planned) Full pipeline, error handling, streaming
│ │ ├── 01_basic_rag/
│ │ ├── 02_error_handling/
│ │ └── 03_streaming_responses/
│ │
│ ├── 09_evaluating_rag_quality/ # (planned) Retrieval and generation metrics
│ │ ├── 01_retrieval_metrics/
│ │ ├── 02_generation_metrics/
│ │ └── 03_end_to_end_evaluation/
│ │
│ ├── 10_observability_and_caching/ # (planned) Cache repeated queries, log performance
│ │ └── example.js
│ │
│ ├── 11_metadata_and_structured_data/ # (planned) Metadata and structured data handling
│ │ └── example.js
│ │
│ ├── 12_graph_db_integration/ # (planned) Graph database (e.g. kuzu) for retrieval
│ │ └── example.js
│ │
│ ├── 13_knowledge_requiremens/ # (planned) Define knowledge needs and sources
│ │ └── example.js
│ │
│ ├── tutorials/ # (planned) Higher-level guides
│ │ ├── basic-rag-pipeline.js
│ │ ├── conversational-rag.js
│ │ ├── multi-modal-rag.js
│ │ └── advanced-retrieval.js
│ │
│ ├── templates/ # (planned) Starter templates
│ │ ├── simple-rag/
│ │ ├── api-server/
│ │ └── chatbot/
│ │
│ ├── tests/ # Unit tests
│ │ ├── embeddings/
│ │ ├── vector-stores/
│ │ └── ...
│ │
│ └── README.md
Retrieval strategies 01–05 are implemented; 06_rank_results and 07_post_retrieval_reranking are planned (see What's Coming Next). Implemented examples often include config.js, CODE.md, and CONCEPT.md alongside example.js.
| Goal | What You Add | Why It Helps |
|---|---|---|
| Concept clarity | 00_how_rag_works |
See retrieval + generation in <70 lines before touching vectors. |
| Mathematical intuition | 04_intro_to_embeddings/01_text_similarity_basics.js |
Learn cosine similarity without black-box APIs. |
| Hands-on understanding | 05_building_vector_store/01_in_memory_store.js |
Understand how embeddings are stored and compared. |
| Better results | 06_retrieval_strategies/07_post_retrieval_reranking.js |
Reduce noise and redundancy in retrieved context. |
| Query quality | 06_retrieval_strategies/02_query_preprocessing.js |
Ensure embeddings represent consistent meaning. |
| Knowledge connectivity | 12_graph_db_integration/example.js |
Explore how a graph database can improve retrieval and reasoning. |
Each folder contains:
- A minimal example (
example.js) - A detailed explanation of every step
- Comments in the code to teach the concept clearly
This project is being built step by step, following an educational approach where each concept is introduced incrementally.
The following core components and examples are currently available:
Examples and tutorials:
00_how_rag_works- Minimal RAG simulation to understand the concept01_intro_to_llms- Getting started with local LLMs (node-llama-cpp basics, building LLM wrapper)02_data_loading- Loading and preprocessing raw text data03_text_splitting_and_chunking- Splitting long text into manageable chunks04_intro_to_embeddings- Text similarity basics and generating embeddings05_building_vector_store- In-memory store, nearest neighbor search, metadata filtering06_retrieval_strategies/01_basic_retrieval- Basic retrieval and similarity scoring06_retrieval_strategies/02_query_preprocessing- Query normalization and cleaning before retrieval06_retrieval_strategies/03_hybrid_search- Combining vector and keyword (e.g. BM25) search06_retrieval_strategies/04_multi_query_retrieval- Query decomposition (LLM), parallel retrieval, RRF and weighted fusion, deduplication06_retrieval_strategies/05_query_rewriting- Normalization, heuristic and LLM rewrite (Qwen 3-1 via node-llama-cpp), intent classification
Library: Loaders, text splitters, embeddings, vector stores, retrievers, chains, prompts (see Project Structure).
The following topics will be added step by step in the coming weeks and months:
Retrieval strategies:
- Result ranking and scoring
- Post-retrieval reranking
Prompt engineering for RAG:
- Context stuffing techniques
- Citation and source attribution prompts
- Context compression
RAG in production:
- Error handling and fallbacks
- Streaming responses
- End-to-end RAG pipeline examples
Evaluation and optimization:
- Retrieval metrics (precision, recall, MRR)
- Generation quality metrics
- End-to-end evaluation frameworks
Advanced features:
- Observability and performance monitoring
- Caching strategies for repeated queries
- Metadata and structured data handling
- Graph database integration (e.g. kuzu)
- Multi-modal RAG
Templates and guides:
- Complete starter templates (simple RAG, API server, chatbot)
- Higher-level tutorials and best practices
Note: This is an educational project focused on building understanding from the ground up. Each new topic will be introduced with clear explanations, minimal examples, and thoroughly commented code. The goal is not to rush through features, but to ensure every concept is deeply understood before moving to the next.
- Node.js 18+
- Local LLM (e.g.,
node-llama-cpp) - npm packages for embeddings, vector math, and optional
kuzu
Install dependencies:
npm install
node 00_how_rag_works/example.jsThis repository is not about fancy frameworks or huge models.
It’s about understanding, line by line, how RAG works under the hood.
If you can explain it, you can build it.
If you can build it, you can improve it.
Contributions are welcome!
If you have a clear, educational RAG example, open a PR.
