Intermediate n8n Tutorial · 5 min read

Build a RAG Pipeline in n8n with a Vector Database

#n8n #rag #vector-database #pinecone #embeddings #workflow

What We’re Building

A Retrieval-Augmented Generation (RAG) pipeline in n8n: documents are embedded and stored in a vector database. When a user asks a question, n8n retrieves relevant chunks and feeds them as context to an LLM — producing grounded, accurate answers instead of hallucinations.

This guide covers two workflows:

  1. Ingestion workflow — loads documents into a vector store
  2. Query workflow — retrieves context and answers questions
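Conceptually, the query path boils down to: embed the question, find the nearest stored chunks, and hand them to the LLM as context. A toy JavaScript sketch (plain cosine similarity over hypothetical pre-computed vectors — not n8n code) illustrates the retrieval half:

```javascript
// Toy retrieval: cosine similarity over pre-computed embedding vectors.
// In the real pipeline, OpenAI produces the vectors and Pinecone/Qdrant
// performs the nearest-neighbor search at scale.
const cosine = (a, b) => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
};

// Hypothetical store: each chunk carries its (tiny, made-up) embedding.
const store = [
  { text: 'Invoices are due within 30 days.', vector: [0.9, 0.1, 0.0] },
  { text: 'Our office is in Berlin.',         vector: [0.1, 0.9, 0.2] },
  { text: 'Support is available 24/7.',       vector: [0.0, 0.2, 0.9] },
];

function retrieve(queryVector, topK = 2) {
  return store
    .map(chunk => ({ ...chunk, score: cosine(queryVector, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// A question like "when are invoices due?" embeds close to the first chunk:
const hits = retrieve([0.85, 0.15, 0.05]);
const context = hits.map(h => h.text).join('\n');
console.log(context);
```

The vector store node does exactly this, just with 1536-dimensional vectors and an index built for millions of chunks.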

Architecture Overview

INGESTION:
Read File (PDF/text)

Text Splitter (chunk the document)

Embeddings Model (OpenAI text-embedding-3-small)

Vector Store (Pinecone / Qdrant / in-memory)

QUERY:
Webhook: POST /webhook/rag-query

AI Agent Node
  ├── LLM: OpenAI Chat Model (gpt-4o-mini)
  ├── Memory: Window Buffer Memory
  └── Tool: Vector Store Retriever

Respond to Webhook

Part 1: Document Ingestion Workflow

Step 1: Trigger

For one-off ingestion, use a Manual Trigger node. For scheduled ingestion (e.g., sync a folder daily), use a Schedule Trigger.

Step 2: Read Binary File

Add a Read/Write Files from Disk node (or HTTP Request to fetch a URL):

  • File path: /data/my-document.pdf
  • Output: binary data

For multiple files in a folder, use Local File Trigger with watch mode.

Step 3: Extract Text

Add an Extract from File node:

  • Operation: Extract text from PDF (or Markdown, HTML, etc.)
  • Input: the binary data from the previous step

For plain text files, you can skip this step and pass the text directly.

Step 4: Text Splitter

n8n’s Recursive Character Text Splitter sub-node splits documents into chunks automatically. If you want full control over chunk size and metadata, use a Code node instead:
    // Code node: split text into chunks
    const text = $input.item.json.text;
    const chunkSize = 500;
    const overlap = 50;
    const chunks = [];
    
    for (let i = 0; i < text.length; i += chunkSize - overlap) {
      chunks.push({
        text: text.slice(i, i + chunkSize),
        metadata: {
          source: $input.item.json.source || 'document',
          chunkIndex: chunks.length,
        }
      });
    }
    
    return chunks.map(chunk => ({ json: chunk }));
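Because each loop iteration advances by chunkSize - overlap characters, you can estimate the chunk count (and thus embedding cost) up front:

```javascript
// With chunkSize 500 and overlap 50, the loop advances 450 chars per step,
// so a 9,000-character document produces Math.ceil(9000 / 450) = 20 chunks.
const estimateChunks = (textLength, chunkSize = 500, overlap = 50) =>
  Math.ceil(textLength / (chunkSize - overlap));

console.log(estimateChunks(9000)); // 20
```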

Step 5: Embeddings + Vector Store Insert

Add a Vector Store node (Pinecone, Qdrant, or In-Memory Vector Store):

  • Mode: Insert Documents
  • Embeddings: Add “Embeddings OpenAI” sub-node
    • Model: text-embedding-3-small
    • Credential: your OpenAI API key
  • Document: Connect to the text chunks from the previous step

For Pinecone:

  • Add Pinecone credential (API key + environment)
  • Index name: my-knowledge-base
  • Namespace: docs (optional)

For Qdrant (self-hosted):

  • URL: http://localhost:6333
  • Collection name: my-knowledge-base

Running the Ingestion

Click Test Workflow and watch n8n embed and store each chunk. A 100-page document typically takes around 30 seconds, depending on chunk count and embedding API latency.


Part 2: Query Workflow

Step 1: Webhook Trigger

Add a Webhook node:

  • HTTP Method: POST
  • Path: rag-query
  • Response Mode: Using Respond to Webhook Node

Step 2: AI Agent with Vector Store Tool

Add an AI Agent node:

  1. Chat Model: OpenAI Chat Model — gpt-4o-mini

  2. Memory: Window Buffer Memory (optional — for multi-turn)

  3. Tool: Add “Vector Store Retriever” sub-node

    • Connect to the same vector store used for ingestion
    • Top K: 5 (return top 5 most relevant chunks)
    • Tool name: knowledge_base
    • Tool description: Search the knowledge base for relevant information. Use this tool to answer questions about the documents.
  4. System Prompt:

    You are a helpful assistant with access to a knowledge base.
    When answering questions, always use the knowledge_base tool to retrieve relevant information first.
    Base your answers only on the retrieved context — do not make up information.
    If the knowledge base doesn't contain relevant information, say so clearly.
    Today's date: {{ $now.toFormat('yyyy-MM-dd') }}.
  5. User Message: {{ $json.body.question }}

Step 3: Respond to Webhook

Add a Respond to Webhook node and set the Response Body to JSON:

{
  "answer": "{{ $('AI Agent').item.json.output }}"
}
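One caveat: interpolating the answer directly into a JSON string breaks if the model's output contains quotes or newlines. A safer pattern is to build the body with JSON.stringify (in an n8n Code node you would read the answer from $('AI Agent').item.json.output instead of a literal):

```javascript
// JSON.stringify escapes the quotes and newlines that would break
// a hand-interpolated response body.
const answer = 'He said "hello"\nand left.';
const body = JSON.stringify({ answer });

// The body round-trips cleanly, quotes and all:
console.log(JSON.parse(body).answer === answer); // true
```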

Testing the Query Workflow

curl -X POST http://localhost:5678/webhook/rag-query \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the main topics covered in this document?"}'

In-Memory Vector Store (No External Service)

For testing without Pinecone or Qdrant, use n8n’s built-in In-Memory Vector Store:

  1. Ingestion: add In-Memory Vector Store node with mode Insert
  2. Query: add In-Memory Vector Store node with mode Retrieve
  3. Both workflows must be in the same n8n instance (data is lost on restart)

The In-Memory store is ideal for:

  • Development and prototyping
  • Short-lived RAG sessions
  • Small document sets (< 1,000 chunks)

Advanced: Source Citation

To include source metadata in responses, update the system prompt and response:

System Prompt Addition:

When answering, always cite the source document and chunk index at the end.
Format: [Source: {source}, Chunk: {chunkIndex}]

Post-processing Code Node:

const output = $('AI Agent').item.json.output;
// sourceDocuments may not exist on every node version — check the
// execution data to confirm the field name.
const sourceNodes = $('AI Agent').item.json.sourceDocuments || [];

// Code nodes must return items wrapped in { json: ... }
return [{
  json: {
    answer: output,
    sources: sourceNodes.map(n => n.metadata?.source).filter(Boolean),
  },
}];

Scheduled Re-Ingestion

Keep your vector store up to date by running the ingestion workflow on a schedule:

  1. Replace Manual Trigger with Schedule Trigger
  2. Set interval: daily at 2am
  3. Add a Delete step before insertion (clear stale data):
    • Add Vector Store node with mode Delete
    • Filter: delete by metadata if you track document IDs

Frequently Asked Questions

Which vector store should I choose?

| Store | Best For |
| --- | --- |
| In-Memory | Dev/testing, no infra needed |
| Qdrant | Self-hosted, production, free |
| Pinecone | Managed cloud, scalability |
| Supabase pgvector | Already using Postgres |

Start with In-Memory for testing, then switch to Qdrant or Pinecone for production.

How many chunks should I retrieve (Top K)?

Start with 5. If answers are incomplete, increase to 8–10. If answers contain irrelevant info, decrease to 3. Match your chunk size to your expected answer length.

Can I RAG over a website instead of a PDF?

Yes. Replace Read File with an HTTP Request node to fetch the URL, then use an HTML Extract node to extract the body text. Chain multiple requests with a Loop Over Items node to crawl multiple pages.
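If the HTML Extract node's selectors don't fit your page, a rough Code-node fallback is to strip the tags yourself. This is a naive sketch — it drops script/style blocks and tags but doesn't decode HTML entities:

```javascript
// Naive HTML-to-text fallback: drop <script>/<style> blocks,
// strip remaining tags, collapse whitespace.
function htmlToText(html) {
  return html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

const page =
  '<html><body><h1>Docs</h1><p>RAG over <b>websites</b> works.</p>' +
  '<script>x()</script></body></html>';
console.log(htmlToText(page)); // "Docs RAG over websites works."
```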

My answers are inaccurate — how do I debug?

  1. Add a Code node after the Vector Store Retriever to log what was retrieved
  2. Check if the chunks are too small (increase chunkSize)
  3. Verify the document was embedded correctly by running a test search
  4. Try increasing Top K to give the LLM more context
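For step 1, the debug Code node only needs to summarize whatever the retriever returned so you can eyeball relevance. The text and score field names below are assumptions — inspect your node's actual output in the n8n execution view and adjust:

```javascript
// Debug helper: one line per retrieved chunk, with score and a text preview.
// The `text`/`score` field names are assumptions — adjust to match your
// vector store node's actual output.
function summarizeHits(hits) {
  return hits.map((h, i) =>
    `#${i} score=${(h.score ?? 0).toFixed(3)} ` +
    `text="${(h.text || '').slice(0, 80)}..."`);
}

const sample = [
  { text: 'Invoices are due within 30 days of receipt.', score: 0.91 },
  { text: 'Our office is located in Berlin.', score: 0.42 },
];
console.log(summarizeHits(sample).join('\n')); // one line per chunk
```

If the top-scoring chunks look unrelated to the question, the problem is in ingestion (chunking or embedding), not in the LLM prompt.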

Can I connect this to my chatbot workflow?

Yes — replace the standalone RAG workflow with the AI Agent from your chatbot. Add the Vector Store Retriever as a tool alongside the Window Buffer Memory. The chatbot now has both conversational memory and knowledge retrieval.
