RAG from Scratch

Demystify Retrieval-Augmented Generation (RAG) by building it yourself - step by step.
No black boxes. No cloud APIs. Just clear explanations, simple examples, and local code you fully understand.

This project follows the same philosophy as AI Agents from Scratch:
make advanced AI concepts approachable for developers through minimal, well-explained, real code.

What You'll Learn

What RAG really is, and why it’s so powerful for knowledge retrieval.
How embeddings work, turn text into numbers your model can understand.
How to build a local vector database, store and query documents efficiently.
How to connect everything, retrieve context and feed it into an LLM for grounded answers.
How to re-rank and normalize, improving retrieval precision and reducing noise.
Query rewriting, normalize and rewrite user queries (heuristic and LLM) before retrieval.
Step-by-step code walkthroughs, every function explained, nothing hidden.

Concept Overview

Retrieval-Augmented Generation (RAG) enhances language models by giving them access to external knowledge.
Instead of asking the model to “remember” everything, you let it retrieve relevant context before generating a response.

Pipeline:

Knowledge Requirements, define questions and data needs.
Data Loading, import and structure your documents.
Text Splitting & Chunking, divide data into manageable pieces.
Embedding, turn chunks into numerical vectors.
Vector Store, save and index embeddings for fast retrieval.
Retrieval, fetch the most relevant context for a given query.
Post-Retrieval Re-Ranking, re-order results to prioritize the best context.
Query Preprocessing & Embedding Normalization, clean and standardize input vectors for consistency.
Augmentation, merge retrieved context into the model’s prompt.
Generation, produce grounded answers using a local LLM.

Learning Path

Follow these examples in order to build understanding progressively:

0. How RAG Works

examples/00_how_rag_works/
Code | Code Explanation | Concepts

What you'll learn:

The core idea behind Retrieval-Augmented Generation
How retrieval and generation work together
A minimal, simplified end-to-end RAG flow in under 70 lines of code

Key concepts: retrieval, generation, context injection, similarity search

1. Data Loading

examples/02_data_loading/
Code | Code Explanation | Concepts

What you'll learn:

Loading raw text data
Normalizing and preparing documents

Key concepts: file I/O, preprocessing, document structure

2. Text Splitting & Chunking

examples/03_text_splitting_and_chunking/
Code | Code Explanation | Concepts

What you'll learn:

How to split long text into manageable chunks
Overlaps, boundaries, and chunk strategies

Key concepts: chunking logic, context windows, granularity trade-offs

3. Embedding

examples/04_intro_to_embeddings/02_generate_embeddings/
Code | Code Explanation | Concepts

What you'll learn:

How embeddings represent meaning as vectors
How to generate embeddings locally

Key concepts: vector representation, similarity, embedding models

4. Vector Store

examples/05_building_vector_store/01_in_memory_store/
Code | Code Explanation | Concepts

What you'll learn:

How to store embeddings
How nearest-neighbor search works

Key concepts: indexing, vector search, metadata storage

5. Basic Retrieval

examples/06_retrieval_strategies/01_basic_retrieval/
Code | Code Explanation | Concepts

What you'll learn:

Retrieving relevant chunks from the vector store
Understanding similarity scoring

Key concepts: augment, scoring, top-k retrieval

At the end you can look into Showcase to see everything you learned so far in action.

7. Query Preprocessing

examples/06_retrieval_strategies/02_query_preprocessing/
Code | Code Explanation | Concepts

What you'll learn:

Cleaning and normalizing user queries before embedding
Reducing noise and improving embedding consistency

Key concepts: normalization, stopword removal, query cleaning, vector stability

8. Hybrid Search

examples/06_retrieval_strategies/03_hybrid_search/
Code | Code Explanation | Concepts

What you'll learn:

Combining multiple retrieval strategies (e.g., vector + keyword)
Balancing semantic similarity with traditional search signals

Key concepts: hybrid scoring, weighted search, BM25 + embeddings, multi-strategy retrieval

9. Multi-Query Retrieval

examples/06_retrieval_strategies/04_multi_query_retrieval/
Code | Code Explanation | Concepts

What you'll learn:

Decomposing complex queries into sub-queries (e.g. with an LLM)
Running multiple queries in parallel and fusing results (RRF, weighted)
Query expansion, perspective-based retrieval, and adaptive strategy selection
Deduplication and ranking when combining result lists

Key concepts: query decomposition, parallel retrieval, reciprocal rank fusion (RRF), weighted fusion, deduplication

10. Query Rewriting

examples/06_retrieval_strategies/05_query_rewriting/
Code | Code Explanation | Concepts

What you'll learn:

Normalizing and cleaning user queries before retrieval
Stripping filler, expanding acronyms, and stripping injection-like content
Intent classification and optional LLM rewrite (Qwen 3-1 via node-llama-cpp) with heuristic fallback

Key concepts: query normalization, intent classification, heuristic rewrite, LLM rewrite, alternate queries

Project Structure

├── src/                                    # Reusable library code
│   ├── embeddings/
│   │   ├── index.js                        # Main exports
│   │   ├── EmbeddingModel.js               # Model wrapper class
│   │   └── EmbeddingCache.js               # Caching layer
│   │
│   ├── vector-stores/
│   │   ├── index.js                        # Main exports
│   │   ├── BaseVectorStore.js              # Abstract base class
│   │   ├── InMemoryVectorStore.js          # In-memory implementation
│   │   ├── LanceDBVectorStore.js           # LanceDB implementation
│   │   └── QdrantVectorStore.js            # Qdrant implementation
│   │
│   ├── loaders/
│   │   ├── index.js
│   │   ├── BaseLoader.js                   # Abstract loader
│   │   ├── PDFLoader.js                    # PDF loading
│   │   ├── TextLoader.js                   # Text file loading
│   │   └── DirectoryLoader.js              # Batch loading
│   │
│   ├── text-splitters/
│   │   ├── index.js
│   │   ├── BaseTextSplitter.js             # Base class
│   │   ├── CharacterTextSplitter.js        
│   │   ├── RecursiveCharacterTextSplitter.js
│   │   └── TokenTextSplitter.js
│   │
│   ├── retrievers/
│   │   ├── index.js
│   │   ├── BaseRetriever.js                # Base retriever
│   │   ├── VectorStoreRetriever.js         # Vector search
│   │   ├── RerankerRetriever.js            # With reranking
│   │   └── HybridRetriever.js              # Multiple strategies
│   │
│   ├── chains/
│   │   ├── index.js
│   │   ├── RetrievalChain.js               # Query → Retrieve → Format
│   │   ├── RAGChain.js                     # Full RAG pipeline
│   │   └── ConversationalChain.js          # With memory
│   │
│   ├── prompts/
│   │   ├── index.js
│   │   ├── PromptTemplate.js               # Template class
│   │   └── templates/
│   │       ├── qa.js                       # Q&A templates
│   │       ├── summarization.js
│   │       └── conversation.js
│   │
│   ├── utils/
│   │   ├── index.js
│   │   ├── Document.js                     # Document class
│   │   ├── similarity.js                   # Similarity functions
│   │   ├── tokenizer.js                    # Token counting
│   │   └── validators.js                   # Input validation
│   │
│   └── index.js                            # Main library export
│
├── examples/
│   ├── 00_how_rag_works/
│   │   └── example.js                      # Minimal RAG simulation with naive keyword search
│   │
│   ├── 01_intro_to_llms/
│   │   └── example.js                      # Introduction to LLMs, the brain of your RAG system
│   │
│   ├── 02_data_loading/
│   │   └── example.js                      # Load and preprocess raw text data
│   │
│   ├── 03_text_splitting_and_chunking/
│   │   └── example.js                      # Split long text into chunks for embedding
│   │
│   ├── 04_intro_to_embeddings/
│   │   ├── 01_text_similarity_basics/
│   │   └── 02_generate_embeddings/
│   │
│   ├── 05_building_vector_store/
│   │   ├── 01_in_memory_store/
│   │   ├── 02_nearest_neighbor_search/
│   │   └── 03_metadata_filtering/
│   │
│   ├── 06_retrieval_strategies/
│   │   ├── 01_basic_retrieval/             # Top-k retrieval, similarity scoring; includes showcase.js
│   │   ├── 02_query_preprocessing/         # Normalize and clean queries before embedding
│   │   ├── 03_hybrid_search/               # Vector + keyword (e.g. BM25) combined
│   │   ├── 04_multi_query_retrieval/       # Decomposition (LLM), parallel retrieval, RRF, dedup (config, helpers)
│   │   ├── 05_query_rewriting/             # Heuristic + LLM rewrite (Qwen), intent, alternates (config, query-rewriter)
│   │   ├── 06_rank_results/                # (planned) Score normalization, ranking methods
│   │   └── 07_post_retrieval_reranking/    # (planned) Rerank retrieved results for precision
│   │
│   ├── 07_prompt_engineering_for_rag/     # (planned) Context stuffing, citations, compression
│   │   ├── 01_context_stuffing/
│   │   ├── 02_citation_prompts/
│   │   └── 03_context_compression/
│   │
│   ├── 08_rag_in_action/                   # (planned) Full pipeline, error handling, streaming
│   │   ├── 01_basic_rag/
│   │   ├── 02_error_handling/
│   │   └── 03_streaming_responses/
│   │
│   ├── 09_evaluating_rag_quality/          # (planned) Retrieval and generation metrics
│   │   ├── 01_retrieval_metrics/
│   │   ├── 02_generation_metrics/
│   │   └── 03_end_to_end_evaluation/
│   │
│   ├── 10_observability_and_caching/       # (planned) Cache repeated queries, log performance
│   │   └── example.js
│   │
│   ├── 11_metadata_and_structured_data/    # (planned) Metadata and structured data handling
│   │   └── example.js
│   │
│   ├── 12_graph_db_integration/           # (planned) Graph database (e.g. kuzu) for retrieval
│   │   └── example.js
│   │
│   ├── 13_knowledge_requiremens/          # (planned) Define knowledge needs and sources
│   │   └── example.js
│   │
│   ├── tutorials/                          # (planned) Higher-level guides
│   │   ├── basic-rag-pipeline.js
│   │   ├── conversational-rag.js
│   │   ├── multi-modal-rag.js
│   │   └── advanced-retrieval.js
│   │
│   ├── templates/                          # (planned) Starter templates
│   │   ├── simple-rag/
│   │   ├── api-server/
│   │   └── chatbot/
│   │
│   ├── tests/                              # Unit tests
│   │   ├── embeddings/
│   │   ├── vector-stores/
│   │   └── ...
│   │
│   └── README.md

Retrieval strategies 01–05 are implemented; 06_rank_results and 07_post_retrieval_reranking are planned (see What's Coming Next). Implemented examples often include config.js, CODE.md, and CONCEPT.md alongside example.js.

How it works

Goal	What You Add	Why It Helps
Concept clarity	`00_how_rag_works`	See retrieval + generation in <70 lines before touching vectors.
Mathematical intuition	`04_intro_to_embeddings/01_text_similarity_basics.js`	Learn cosine similarity without black-box APIs.
Hands-on understanding	`05_building_vector_store/01_in_memory_store.js`	Understand how embeddings are stored and compared.
Better results	`06_retrieval_strategies/07_post_retrieval_reranking.js`	Reduce noise and redundancy in retrieved context.
Query quality	`06_retrieval_strategies/02_query_preprocessing.js`	Ensure embeddings represent consistent meaning.
Knowledge connectivity	`12_graph_db_integration/example.js`	Explore how a graph database can improve retrieval and reasoning.

Each folder contains:

A minimal example (example.js)
A detailed explanation of every step
Comments in the code to teach the concept clearly

Current Implementation Status

This project is being built step by step, following an educational approach where each concept is introduced incrementally.

What's Implemented

The following core components and examples are currently available:

Examples and tutorials:

00_how_rag_works - Minimal RAG simulation to understand the concept
01_intro_to_llms - Getting started with local LLMs (node-llama-cpp basics, building LLM wrapper)
02_data_loading - Loading and preprocessing raw text data
03_text_splitting_and_chunking - Splitting long text into manageable chunks
04_intro_to_embeddings - Text similarity basics and generating embeddings
05_building_vector_store - In-memory store, nearest neighbor search, metadata filtering
06_retrieval_strategies/01_basic_retrieval - Basic retrieval and similarity scoring
06_retrieval_strategies/02_query_preprocessing - Query normalization and cleaning before retrieval
06_retrieval_strategies/03_hybrid_search - Combining vector and keyword (e.g. BM25) search
06_retrieval_strategies/04_multi_query_retrieval - Query decomposition (LLM), parallel retrieval, RRF and weighted fusion, deduplication
06_retrieval_strategies/05_query_rewriting - Normalization, heuristic and LLM rewrite (Qwen 3-1 via node-llama-cpp), intent classification

Library: Loaders, text splitters, embeddings, vector stores, retrievers, chains, prompts (see Project Structure).

What's Coming Next

The following topics will be added step by step in the coming weeks and months:

Retrieval strategies:

Result ranking and scoring
Post-retrieval reranking

Prompt engineering for RAG:

Context stuffing techniques
Citation and source attribution prompts
Context compression

RAG in production:

Error handling and fallbacks
Streaming responses
End-to-end RAG pipeline examples

Evaluation and optimization:

Retrieval metrics (precision, recall, MRR)
Generation quality metrics
End-to-end evaluation frameworks

Advanced features:

Observability and performance monitoring
Caching strategies for repeated queries
Metadata and structured data handling
Graph database integration (e.g. kuzu)
Multi-modal RAG

Templates and guides:

Complete starter templates (simple RAG, API server, chatbot)
Higher-level tutorials and best practices

Note: This is an educational project focused on building understanding from the ground up. Each new topic will be introduced with clear explanations, minimal examples, and thoroughly commented code. The goal is not to rush through features, but to ensure every concept is deeply understood before moving to the next.

Requirements

Node.js 18+
Local LLM (e.g., node-llama-cpp)
npm packages for embeddings, vector math, and optional kuzu

Install dependencies:

npm install
node 00_how_rag_works/example.js

Philosophy

This repository is not about fancy frameworks or huge models.
It’s about understanding, line by line, how RAG works under the hood.

If you can explain it, you can build it.
If you can build it, you can improve it.

Contribute

Contributions are welcome!
If you have a clear, educational RAG example, open a PR.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.husky		.husky
examples		examples
helpers		helpers
images		images
models		models
src		src
.env_example		.env_example
.gitignore		.gitignore
DOWNLOAD.md		DOWNLOAD.md
LICENSE		LICENSE
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RAG from Scratch

What You'll Learn

Concept Overview

Learning Path

0. How RAG Works

1. Data Loading

2. Text Splitting & Chunking

3. Embedding

4. Vector Store

5. Basic Retrieval

7. Query Preprocessing

8. Hybrid Search

9. Multi-Query Retrieval

10. Query Rewriting

Project Structure

How it works

Current Implementation Status

What's Implemented

What's Coming Next

Requirements

Philosophy

Contribute

See Also

About

Uh oh!

Contributors 1

Languages

Folders and files

Latest commit

History

Repository files navigation

RAG from Scratch

What You'll Learn

Concept Overview

Learning Path

0. How RAG Works

1. Data Loading

2. Text Splitting & Chunking

3. Embedding

4. Vector Store

5. Basic Retrieval

7. Query Preprocessing

8. Hybrid Search

9. Multi-Query Retrieval

10. Query Rewriting

Project Structure

How it works

Current Implementation Status

What's Implemented

What's Coming Next

Requirements

Philosophy

Contribute

See Also

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors 1

Languages