Intermediate n8n Tutorial · 5 min read

Build a RAG Pipeline in n8n with a Vector Database

#n8n #rag #vector-database #pinecone #embeddings #workflow

What We’re Building

A Retrieval-Augmented Generation (RAG) pipeline in n8n: documents are embedded and stored in a vector database. When a user asks a question, n8n retrieves relevant chunks and feeds them as context to an LLM — producing grounded, accurate answers instead of hallucinations.

This guide covers two workflows:

  1. Ingestion workflow — loads documents into a vector store
  2. Query workflow — retrieves context and answers questions
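Conceptually, the query path boils down to: embed the question, find the nearest stored chunks, and hand them to the LLM as context. A toy JavaScript sketch (plain cosine similarity over hypothetical pre-computed vectors — not n8n code) illustrates the retrieval half:

```javascript
// Toy retrieval: cosine similarity over pre-computed embedding vectors.
// In the real pipeline, OpenAI produces the vectors and Pinecone/Qdrant
// performs the nearest-neighbor search at scale.
const cosine = (a, b) => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
};

// Hypothetical store: each chunk carries its (tiny, made-up) embedding.
const store = [
  { text: 'Invoices are due within 30 days.', vector: [0.9, 0.1, 0.0] },
  { text: 'Our office is in Berlin.',         vector: [0.1, 0.9, 0.2] },
  { text: 'Support is available 24/7.',       vector: [0.0, 0.2, 0.9] },
];

function retrieve(queryVector, topK = 2) {
  return store
    .map(chunk => ({ ...chunk, score: cosine(queryVector, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// A question like "when are invoices due?" embeds close to the first chunk:
const hits = retrieve([0.85, 0.15, 0.05]);
const context = hits.map(h => h.text).join('\n');
console.log(context);
```

The vector store node does exactly this, just with 1536-dimensional vectors and an index built for millions of chunks.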

Architecture Overview

INGESTION:
Read File (PDF/text)

Text Splitter (chunk the document)

Embeddings Model (OpenAI text-embedding-3-small)

Vector Store (Pinecone / Qdrant / in-memory)

QUERY:
Webhook: POST /webhook/rag-query

AI Agent Node
  ├── LLM: OpenAI Chat Model (gpt-4o-mini)
  ├── Memory: Window Buffer Memory
  └── Tool: Vector Store Retriever

Respond to Webhook

Part 1: Document Ingestion Workflow

Step 1: Trigger

For one-off ingestion, use a Manual Trigger node. For scheduled ingestion (e.g., sync a folder daily), use a Schedule Trigger.

Step 2: Read Binary File

Add a Read/Write Files from Disk node (or HTTP Request to fetch a URL):

  • File path: /data/my-document.pdf
  • Output: binary data

For multiple files in a folder, use Local File Trigger with watch mode.

Step 3: Extract Text

Add an Extract from File node:

  • Operation: Extract text from PDF (or Markdown, HTML, etc.)
  • Input: the binary data from the previous step

For plain text files, you can skip this step and pass the text directly.

Step 4: Text Splitter

n8n’s Recursive Character Text Splitter sub-node splits documents into chunks automatically. If you want full control over chunk size and metadata, use a Code node instead:
    // Code node: split text into chunks
    const text = $input.item.json.text;
    const chunkSize = 500;
    const overlap = 50;
    const chunks = [];
    
    for (let i = 0; i < text.length; i += chunkSize - overlap) {
      chunks.push({
        text: text.slice(i, i + chunkSize),
        metadata: {
          source: $input.item.json.source || 'document',
          chunkIndex: chunks.length,
        }
      });
    }
    
    return chunks.map(chunk => ({ json: chunk }));
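Because each loop iteration advances by chunkSize - overlap characters, you can estimate the chunk count (and thus embedding cost) up front:

```javascript
// With chunkSize 500 and overlap 50, the loop advances 450 chars per step,
// so a 9,000-character document produces Math.ceil(9000 / 450) = 20 chunks.
const estimateChunks = (textLength, chunkSize = 500, overlap = 50) =>
  Math.ceil(textLength / (chunkSize - overlap));

console.log(estimateChunks(9000)); // 20
```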

Step 5: Embeddings + Vector Store Insert

Add a Vector Store node (Pinecone, Qdrant, or In-Memory Vector Store):

  • Mode: Insert Documents
  • Embeddings: Add “Embeddings OpenAI” sub-node
    • Model: text-embedding-3-small
    • Credential: your OpenAI API key
  • Document: Connect to the text chunks from the previous step

For Pinecone:

  • Add Pinecone credential (API key + environment)
  • Index name: my-knowledge-base
  • Namespace: docs (optional)

For Qdrant (self-hosted):

  • URL: http://localhost:6333
  • Collection name: my-knowledge-base

Running the Ingestion

Click Test Workflow and watch n8n embed and store each chunk. A 100-page document typically takes around 30 seconds, depending on chunk count and embedding API latency.


Part 2: Query Workflow

Step 1: Webhook Trigger

Add a Webhook node:

  • HTTP Method: POST
  • Path: rag-query
  • Response Mode: Using Respond to Webhook Node

Step 2: AI Agent with Vector Store Tool

Add an AI Agent node:

  1. Chat Model: OpenAI Chat Model — gpt-4o-mini

  2. Memory: Window Buffer Memory (optional — for multi-turn)

  3. Tool: Add “Vector Store Retriever” sub-node

    • Connect to the same vector store used for ingestion
    • Top K: 5 (return top 5 most relevant chunks)
    • Tool name: knowledge_base
    • Tool description: Search the knowledge base for relevant information. Use this tool to answer questions about the documents.
  4. System Prompt:

    You are a helpful assistant with access to a knowledge base.
    When answering questions, always use the knowledge_base tool to retrieve relevant information first.
    Base your answers only on the retrieved context — do not make up information.
    If the knowledge base doesn't contain relevant information, say so clearly.
    Today's date: {{ $now.toFormat('yyyy-MM-dd') }}.
  5. User Message: {{ $json.body.question }}

Step 3: Respond to Webhook

Add a Respond to Webhook node and set the Response Body to JSON:

{
  "answer": "{{ $('AI Agent').item.json.output }}"
}
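One caveat: interpolating the answer directly into a JSON string breaks if the model's output contains quotes or newlines. A safer pattern is to build the body with JSON.stringify (in an n8n Code node you would read the answer from $('AI Agent').item.json.output instead of a literal):

```javascript
// JSON.stringify escapes the quotes and newlines that would break
// a hand-interpolated response body.
const answer = 'He said "hello"\nand left.';
const body = JSON.stringify({ answer });

// The body round-trips cleanly, quotes and all:
console.log(JSON.parse(body).answer === answer); // true
```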

Testing the Query Workflow

curl -X POST http://localhost:5678/webhook/rag-query \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the main topics covered in this document?"}'

In-Memory Vector Store (No External Service)

For testing without Pinecone or Qdrant, use n8n’s built-in In-Memory Vector Store:

  1. Ingestion: add In-Memory Vector Store node with mode Insert
  2. Query: add In-Memory Vector Store node with mode Retrieve
  3. Both workflows must be in the same n8n instance (data is lost on restart)

The In-Memory store is ideal for:

  • Development and prototyping
  • Short-lived RAG sessions
  • Small document sets (< 1,000 chunks)

Advanced: Source Citation

To include source metadata in responses, update the system prompt and response:

System Prompt Addition:

When answering, always cite the source document and chunk index at the end.
Format: [Source: {source}, Chunk: {chunkIndex}]

Post-processing Code Node:

const output = $('AI Agent').item.json.output;
// sourceDocuments may not exist on every node version — check the
// execution data to confirm the field name.
const sourceNodes = $('AI Agent').item.json.sourceDocuments || [];

// Code nodes must return items wrapped in { json: ... }
return [{
  json: {
    answer: output,
    sources: sourceNodes.map(n => n.metadata?.source).filter(Boolean),
  },
}];

Scheduled Re-Ingestion

Keep your vector store up to date by running the ingestion workflow on a schedule:

  1. Replace Manual Trigger with Schedule Trigger
  2. Set interval: daily at 2am
  3. Add a Delete step before insertion (clear stale data):
    • Add Vector Store node with mode Delete
    • Filter: delete by metadata if you track document IDs

Frequently Asked Questions

Which vector store should I choose?

| Store | Best For |
| --- | --- |
| In-Memory | Dev/testing, no infra needed |
| Qdrant | Self-hosted, production, free |
| Pinecone | Managed cloud, scalability |
| Supabase pgvector | Already using Postgres |

Start with In-Memory for testing, then switch to Qdrant or Pinecone for production.

How many chunks should I retrieve (Top K)?

Start with 5. If answers are incomplete, increase to 8–10. If answers contain irrelevant info, decrease to 3. Match your chunk size to your expected answer length.

Can I RAG over a website instead of a PDF?

Yes. Replace Read File with an HTTP Request node to fetch the URL, then use an HTML Extract node to extract the body text. Chain multiple requests with a Loop Over Items node to crawl multiple pages.
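If the HTML Extract node's selectors don't fit your page, a rough Code-node fallback is to strip the tags yourself. This is a naive sketch — it drops script/style blocks and tags but doesn't decode HTML entities:

```javascript
// Naive HTML-to-text fallback: drop <script>/<style> blocks,
// strip remaining tags, collapse whitespace.
function htmlToText(html) {
  return html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

const page =
  '<html><body><h1>Docs</h1><p>RAG over <b>websites</b> works.</p>' +
  '<script>x()</script></body></html>';
console.log(htmlToText(page)); // "Docs RAG over websites works."
```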

My answers are inaccurate — how do I debug?

  1. Add a Code node after the Vector Store Retriever to log what was retrieved
  2. Check if the chunks are too small (increase chunkSize)
  3. Verify the document was embedded correctly by running a test search
  4. Try increasing Top K to give the LLM more context
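For step 1, the debug Code node only needs to summarize whatever the retriever returned so you can eyeball relevance. The text and score field names below are assumptions — inspect your node's actual output in the n8n execution view and adjust:

```javascript
// Debug helper: one line per retrieved chunk, with score and a text preview.
// The `text`/`score` field names are assumptions — adjust to match your
// vector store node's actual output.
function summarizeHits(hits) {
  return hits.map((h, i) =>
    `#${i} score=${(h.score ?? 0).toFixed(3)} ` +
    `text="${(h.text || '').slice(0, 80)}..."`);
}

const sample = [
  { text: 'Invoices are due within 30 days of receipt.', score: 0.91 },
  { text: 'Our office is located in Berlin.', score: 0.42 },
];
console.log(summarizeHits(sample).join('\n')); // one line per chunk
```

If the top-scoring chunks look unrelated to the question, the problem is in ingestion (chunking or embedding), not in the LLM prompt.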

Can I connect this to my chatbot workflow?

Yes — replace the standalone RAG workflow with the AI Agent from your chatbot. Add the Vector Store Retriever as a tool alongside the Window Buffer Memory. The chatbot now has both conversational memory and knowledge retrieval.
