How to Build GraphRAG Pipelines with Python: Knowledge Graphs for Smarter Retrieval

Build GraphRAG pipelines in Python using Microsoft GraphRAG and Neo4j. Covers knowledge graph construction, entity extraction, community detection, and retrieval strategies that hit 89–91% accuracy on relational queries where traditional RAG scores only 28–34%.

Why GraphRAG Outperforms Traditional RAG on Complex Queries

Traditional RAG works well enough when someone asks a straightforward factual question and the answer lives inside a single text chunk. Embed the query, retrieve the top-k nearest neighbors, feed them to an LLM, and you're done. But the moment a question requires understanding relationships between entities — "Which suppliers overlap across our top three at-risk product lines?" or "How are these regulatory changes connected to our pending contracts?" — flat vector similarity falls apart. The retrieval layer has no concept of structure. It fetches text that looks similar, not text that is logically connected.
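To make the flat-retrieval step concrete, here's a minimal, dependency-free sketch of what "embed the query, retrieve the top-k nearest neighbors" amounts to. The embeddings are hand-made toy vectors, a stand-in for what a real model like text-embedding-3-small would produce:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], chunk_vecs: list[list[float]],
          chunks: list[str], k: int = 2) -> list[str]:
    """Traditional RAG retrieval: nearest chunks by embedding distance only."""
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [text for text, _ in scored[:k]]

# Toy embeddings -- in practice these come from an embedding model
chunks = ["Acme supplies widgets", "Beta supplies gears", "Acme partners with Beta"]
vecs = [[1.0, 0.1], [0.2, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.0], vecs, chunks, k=2))
```

Notice there is nothing in this loop that knows "Acme" and "Beta" are related entities; it only compares vectors.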

GraphRAG solves this by replacing (or augmenting) the vector index with a knowledge graph. Entities become nodes, relationships become edges, and retrieval becomes graph traversal. Instead of matching embedding distances, the system follows explicit connections between concepts — enabling multi-hop reasoning that vector search simply can't perform.
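Graph traversal, by contrast, follows explicit edges. A minimal sketch with a plain adjacency dict shows the multi-hop behavior (the entity and relation names here are made up for illustration):

```python
from collections import deque

# Knowledge graph as adjacency list: entity -> [(relation, neighbor), ...]
graph = {
    "Acme": [("SUPPLIES", "ProductX"), ("PARTNERS_WITH", "Beta")],
    "Beta": [("SUPPLIES", "ProductY")],
    "ProductX": [],
    "ProductY": [],
}

def multi_hop(start: str, max_hops: int = 2) -> list[tuple[list[str], str]]:
    """Collect every (relation path, entity) reachable within max_hops."""
    results: list[tuple[list[str], str]] = []
    queue = deque([(start, [], 0)])
    while queue:
        node, path, depth = queue.popleft()
        if depth == max_hops:
            continue
        for rel, neighbor in graph.get(node, []):
            results.append((path + [rel], neighbor))
            queue.append((neighbor, path + [rel], depth + 1))
    return results

# Two hops from "Acme" reaches Beta's products -- a connection that
# flat vector similarity has no way to represent.
print(multi_hop("Acme"))
```

The two-hop path `PARTNERS_WITH -> SUPPLIES` is exactly the kind of "which suppliers overlap" reasoning that flat retrieval misses.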

The performance gap isn't subtle, either.

In a Lettria/AWS benchmark, GraphRAG achieved 80% accuracy compared to 50.83% for traditional RAG on relational queries. Microsoft's enterprise evaluation reported an even starker contrast: 86% vs. 32%. On the RobustQA benchmark, knowledge graph RAG scored 86.31% against a traditional RAG range of 32–75%. And for complex relational queries specifically, the gap widens dramatically — traditional RAG scores 28–34% while GraphRAG reaches 89–91%.

Now, there's a catch. For simple, single-fact lookups, both approaches perform roughly equally at 94–95% accuracy. GraphRAG also incurs approximately 2.4x higher latency on average and significantly higher indexing costs. So it's not a universal replacement — it's a targeted upgrade for use cases where relationships actually matter. In this guide, I'll walk you through exactly how to build GraphRAG pipelines with Python using the two dominant frameworks: Microsoft GraphRAG and Neo4j GraphRAG Python.

How the GraphRAG Pipeline Works

Before we start writing code, it's worth understanding the end-to-end GraphRAG pipeline. The foundational architecture was formalized in Microsoft's paper "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" (arXiv 2404.16130), and it operates in five stages:

  1. Document Ingestion — Raw documents (PDFs, text files, HTML) are chunked into manageable text segments, similar to traditional RAG.
  2. Entity and Relationship Extraction — An LLM processes each chunk to identify named entities (people, organizations, concepts, locations) and the relationships between them. This is the most LLM-intensive step, and honestly, it's where most of your indexing budget goes.
  3. Graph Construction — Extracted entities and relationships are assembled into a knowledge graph. Duplicate entities get merged, and edge weights are assigned based on co-occurrence frequency and relationship strength.
  4. Community Detection — The Leiden algorithm partitions the graph into hierarchical communities — clusters of densely connected entities. Each community represents a thematic grouping in the data.
  5. Community Summarization — An LLM generates natural-language summaries for each community at multiple hierarchy levels, creating a multi-resolution map of the entire dataset.
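The shape of those five stages can be sketched end-to-end in a few lines. This toy version stubs out the LLM-driven steps: a naive capitalized-word "NER" stands in for LLM entity extraction, and connected components stand in for Leiden clustering. Real pipelines use an LLM for stages 2 and 5 and a proper Leiden implementation for stage 4:

```python
# Stage 1: chunking (toy fixed-size splitter)
def chunk(text: str, size: int = 80) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

# Stage 2: extraction -- stand-in for an LLM call
def extract_entities(chunk_text: str) -> list[str]:
    return [w.strip(".,") for w in chunk_text.split() if w[0].isupper()]

# Stage 3: graph construction -- co-occurring entities share an edge
def build_graph(chunks: list[str]) -> dict[str, set[str]]:
    graph: dict[str, set[str]] = {}
    for c in chunks:
        ents = extract_entities(c)
        for e in ents:
            graph.setdefault(e, set()).update(x for x in ents if x != e)
    return graph

# Stage 4: community detection -- connected components as a crude Leiden stand-in
def communities(graph: dict[str, set[str]]) -> list[set[str]]:
    seen: set[str] = set()
    comms: list[set[str]] = []
    for node in graph:
        if node in seen:
            continue
        comp, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(graph[n] - comp)
        seen |= comp
        comms.append(comp)
    return comms

# Stage 5: summarization -- in a real pipeline, an LLM writes a report per community
docs = "Acme partners with Beta. Gamma acquired Delta."
comms = communities(build_graph(chunk(docs, size=25)))
for c in comms:
    print(f"Community of {len(c)} entities: {sorted(c)}")
```

Even this toy version surfaces the key property: entities that never co-occur end up in separate communities, which is what makes community-level summaries a multi-resolution map of the corpus.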

At query time, GraphRAG supports three search modes:

  • Local Search — For entity-focused queries. The system identifies relevant entities, traverses their local subgraph neighborhood, and combines graph context with vector similarity to build the answer. Best for questions like "What are the key partnerships of Company X?"
  • Global Search — For holistic, dataset-wide questions. Uses a map-reduce pattern over community summaries at the appropriate hierarchy level. Best for questions like "What are the major themes across all the research papers?"
  • DRIFT Search — A hybrid that combines local and global strategies, dynamically choosing the right approach based on query characteristics. Think of it as the "auto" mode.

Setting Up Microsoft GraphRAG with Python

Microsoft's graphrag package (currently at v3.0.2) provides a CLI-driven pipeline that handles the entire indexing and querying workflow. It's the most mature GraphRAG implementation available, and it works well for batch-processing document collections.

Installation and Project Initialization

# Install the graphrag package
pip install graphrag

# Create a new project directory
mkdir graphrag-project && cd graphrag-project

# Initialize the GraphRAG project structure
graphrag init

# This creates:
# - settings.yaml    (pipeline configuration)
# - .env             (API keys)
# - input/           (place your documents here)

After initialization, drop your source documents in the input/ directory. GraphRAG supports plain text out of the box and can be configured for other formats. Then add your LLM API key to the .env file:

# .env
GRAPHRAG_API_KEY=your-openai-api-key-here

Configuring the Pipeline

The settings.yaml file controls every aspect of the pipeline. Here's a practical configuration that balances cost and quality:

# settings.yaml - Key configuration sections

llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat
  model: gpt-4o-mini          # Use gpt-4o for higher quality, gpt-4o-mini for lower cost
  max_tokens: 4000
  temperature: 0.0

embeddings:
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: text-embedding-3-small

chunks:
  size: 1200
  overlap: 200

entity_extraction:
  max_gleanings: 1             # Number of additional extraction passes per chunk
  prompt: null                 # Use default prompt, or provide a custom one

community_reports:
  max_length: 2000

cluster_graph:
  max_cluster_size: 10

local_search:
  max_tokens: 12000

global_search:
  max_tokens: 12000

Running the Indexing Pipeline

# Run the full indexing pipeline
graphrag index

# This executes the complete pipeline:
# 1. Chunks documents from input/
# 2. Extracts entities and relationships via LLM
# 3. Builds the knowledge graph
# 4. Runs Leiden community detection
# 5. Generates community summaries
# Output is stored in output/ directory

Fair warning — depending on your corpus size, this step can take a while and burn through API credits. For a first test, I'd recommend starting with a small set of 5–10 documents just to get the pipeline running before going all-in on your full dataset.
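To budget before kicking off a full run, a back-of-envelope estimate helps. Everything in this sketch is an assumption you should replace with your own numbers: the per-1k-token prices are placeholders (roughly gpt-4o-scale), the prompt overhead and output size are rough guesses, and it covers only the entity-extraction phase, so treat the result as a lower bound:

```python
def estimate_indexing_cost(
    total_chars: int,
    chunk_size_tokens: int = 1200,
    gleanings: int = 1,
    prompt_overhead_tokens: int = 1500,  # guess: extraction prompt + few-shot examples
    price_per_1k_input: float = 0.0025,  # placeholder rate -- use your model's pricing
    price_per_1k_output: float = 0.01,   # placeholder rate -- use your model's pricing
    output_tokens_per_chunk: int = 700,  # guess: extracted-entities JSON size
) -> float:
    """Rough lower-bound cost for the entity-extraction phase of indexing.

    Ignores community summarization. Assumes ~4 characters per token,
    the usual heuristic for English text.
    """
    total_tokens = total_chars / 4
    n_chunks = max(1, round(total_tokens / chunk_size_tokens))
    passes = 1 + gleanings  # initial extraction + gleaning re-passes
    input_tokens = n_chunks * passes * (chunk_size_tokens + prompt_overhead_tokens)
    output_tokens = n_chunks * passes * output_tokens_per_chunk
    return (input_tokens / 1000) * price_per_1k_input \
         + (output_tokens / 1000) * price_per_1k_output

# e.g. a ~2 MB corpus of plain text
print(f"Estimated extraction cost: ${estimate_indexing_cost(2_000_000):.2f}")
```

Run this against your corpus size before `graphrag index` and you'll know whether you're in coffee-money or budget-meeting territory.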

Querying the Knowledge Graph

# Local search - for entity-specific questions
graphrag query \
    --method local \
    --query "What are the key technical components of the system architecture?"

# Global search - for dataset-wide thematic questions
graphrag query \
    --method global \
    --query "What are the major research themes and how do they interconnect?"

# DRIFT search - hybrid local + global
graphrag query \
    --method drift \
    --query "How does component A relate to the overall system goals?"

Programmatic Usage in Python

For integration into larger applications, you can invoke the GraphRAG pipeline programmatically. This is what you'll likely want in production rather than shelling out to the CLI:

import asyncio
from pathlib import Path
from graphrag.api import build_index, global_search, local_search
from graphrag.config import load_config

async def run_graphrag_pipeline():
    """Run Microsoft GraphRAG indexing and querying programmatically."""

    project_root = Path("./graphrag-project")

    # Load configuration
    config = load_config(project_root)

    # Step 1: Build the index (entity extraction, graph construction, etc.)
    print("Building GraphRAG index...")
    index_result = await build_index(config=config, root=project_root)

    for workflow_result in index_result:
        status = "SUCCESS" if workflow_result.errors is None else "ERROR"
        print(f"  Workflow '{workflow_result.workflow}': {status}")

    # Step 2: Query with local search
    local_result = await local_search(
        config=config,
        root=project_root,
        query="What entities are most central to the dataset?",
    )
    print(f"\nLocal Search Result:\n{local_result.response}")

    # Step 3: Query with global search
    global_result = await global_search(
        config=config,
        root=project_root,
        query="Summarize the main themes across the entire corpus.",
    )
    print(f"\nGlobal Search Result:\n{global_result.response}")

# Execute
asyncio.run(run_graphrag_pipeline())

Building a GraphRAG Pipeline with Neo4j GraphRAG Python

While Microsoft's GraphRAG is great for batch processing, the neo4j-graphrag package (released February 2026) takes a more flexible, production-oriented approach. It stores your knowledge graph in Neo4j, supports multiple LLM providers (OpenAI, Anthropic, Google Gemini, and Ollama for local models), and offers a variety of retrieval strategies you can mix and match.

If you're already running Neo4j in your stack, this is probably the path of least resistance.

Installation and Setup

# Install neo4j-graphrag with OpenAI support
pip install "neo4j-graphrag[openai]"

# For Anthropic Claude support
pip install "neo4j-graphrag[anthropic]"

# For local models via Ollama
pip install "neo4j-graphrag[ollama]"

# Install Neo4j Python driver
pip install neo4j

You'll need a running Neo4j instance. The easiest option for development is Neo4j Desktop or Neo4j AuraDB (there's a free tier available). Make sure the APOC plugin is installed — it's required for some graph operations.

Building the Knowledge Graph with SimpleKGPipeline

The SimpleKGPipeline class handles entity extraction, relationship detection, and graph construction in a single high-level API. It's surprisingly clean for what it does under the hood:

import neo4j
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline

# Connect to Neo4j
neo4j_driver = neo4j.GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=("neo4j", "your-password"),
)

# Configure the LLM for entity extraction
llm = OpenAILLM(
    model_name="gpt-4o-mini",
    model_params={
        "temperature": 0.0,        # Deterministic extraction
        "max_tokens": 2000,
        "response_format": {"type": "json_object"},
    },
)

# Configure embeddings for vector similarity
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Define the entity types and relationship types for your domain
# This guides the LLM to extract structured, domain-relevant information
entity_types = [
    "Person",
    "Organization",
    "Technology",
    "Product",
    "Concept",
    "Location",
    "Event",
]

relation_types = [
    "WORKS_FOR",
    "FOUNDED",
    "USES",
    "COMPETES_WITH",
    "PARTNERS_WITH",
    "LOCATED_IN",
    "DEVELOPED",
    "RELATED_TO",
]

# Build the knowledge graph pipeline
kg_pipeline = SimpleKGPipeline(
    llm=llm,
    driver=neo4j_driver,
    embedder=embeddings,
    entities=entity_types,
    relations=relation_types,
    on_error="IGNORE",           # Skip chunks that fail extraction
    from_pdf=False,              # Set True if ingesting PDFs directly
)

# Ingest documents into the knowledge graph
documents = [
    "OpenAI developed GPT-4, a large language model. Sam Altman is the CEO of OpenAI. "
    "OpenAI partners with Microsoft, which invested $13 billion in the company. "
    "Microsoft is headquartered in Redmond, Washington.",

    "Anthropic was founded by Dario Amodei and Daniela Amodei, former OpenAI researchers. "
    "Anthropic developed Claude, a competing language model. Anthropic is based in "
    "San Francisco and has received investment from Google and Amazon.",

    "Meta AI released LLaMA, an open-source large language model. Mark Zuckerberg leads "
    "Meta. Meta competes with both OpenAI and Anthropic in the AI space. "
    "Meta is headquartered in Menlo Park, California.",
]

import asyncio

async def build_knowledge_graph():
    """Ingest documents and build the knowledge graph."""
    for i, doc in enumerate(documents):
        print(f"Processing document {i + 1}/{len(documents)}...")
        await kg_pipeline.run_async(text=doc)
    print("Knowledge graph construction complete.")

asyncio.run(build_knowledge_graph())

Querying with Different Retriever Strategies

This is where Neo4j GraphRAG really shines. It provides four retriever types, each suited to different query patterns, and picking the right one can make a huge difference in answer quality. One prerequisite: the vector-based retrievers assume a Neo4j vector index (named vector_index below) already exists, and HybridRetriever additionally needs a fulltext index (fulltext_index); the neo4j_graphrag.indexes module provides create_vector_index and create_fulltext_index helpers for setting these up:

from neo4j_graphrag.retrievers import (
    VectorRetriever,
    VectorCypherRetriever,
    HybridRetriever,
    Text2CypherRetriever,
)
from neo4j_graphrag.generation import GraphRAG

# --- Retriever 1: VectorRetriever ---
# Pure vector similarity over node embeddings
# Best for: Simple semantic queries
vector_retriever = VectorRetriever(
    driver=neo4j_driver,
    index_name="vector_index",       # Name of the Neo4j vector index
    embedder=embeddings,
    return_properties=["text", "name"],
)

# --- Retriever 2: VectorCypherRetriever ---
# Vector search + graph traversal via Cypher
# Best for: Questions that need both semantic similarity AND graph structure
vector_cypher_retriever = VectorCypherRetriever(
    driver=neo4j_driver,
    index_name="vector_index",
    embedder=embeddings,
    retrieval_query="""
        MATCH (node)-[r]->(neighbor)
        RETURN node.text AS text,
               node.name AS name,
               type(r) AS relationship,
               neighbor.name AS connected_entity,
               score
        ORDER BY score DESC
        LIMIT 10
    """,
)

# --- Retriever 3: HybridRetriever ---
# Combines vector search with fulltext (keyword) search
# Best for: Queries mixing natural language with exact terms
hybrid_retriever = HybridRetriever(
    driver=neo4j_driver,
    vector_index_name="vector_index",
    fulltext_index_name="fulltext_index",
    embedder=embeddings,
    return_properties=["text", "name"],
)

# --- Retriever 4: Text2CypherRetriever ---
# LLM translates natural language to Cypher queries
# Best for: Complex relational queries requiring precise graph traversal
text2cypher_retriever = Text2CypherRetriever(
    driver=neo4j_driver,
    llm=llm,
    neo4j_schema=None,  # Auto-detected from database
)

# --- Build the GraphRAG generation pipeline ---
graphrag_pipeline = GraphRAG(
    retriever=vector_cypher_retriever,  # Choose your retriever
    llm=llm,
)

# Query the knowledge graph
response = graphrag_pipeline.search(
    query_text="What companies compete with OpenAI and who leads them?",
    retriever_config={"top_k": 5},
    return_context=True,  # needed to inspect retriever_result below
)

print(f"Answer: {response.answer}")
print(f"\nContext used ({len(response.retriever_result.items)} items):")
for item in response.retriever_result.items:
    print(f"  - {item.content[:120]}...")

Swapping LLM Providers

One nice advantage of Neo4j GraphRAG Python is how easily you can swap between LLM providers without changing your pipeline logic. This is particularly handy if you want to benchmark different models or keep your data fully private with local inference:

# Using Anthropic Claude
from neo4j_graphrag.llm import AnthropicLLM

claude_llm = AnthropicLLM(
    model_name="claude-sonnet-4-20250514",
    model_params={"temperature": 0.0, "max_tokens": 2000},
)

# Using Ollama for fully local, private inference
from neo4j_graphrag.llm import OllamaLLM

local_llm = OllamaLLM(
    model_name="llama3.1:8b",
    model_params={"temperature": 0.0},
)

# Swap into the pipeline -- no other code changes needed
graphrag_local = GraphRAG(
    retriever=vector_cypher_retriever,
    llm=local_llm,  # Private, no data leaves your network
)

GraphRAG vs. Vector RAG: When to Use Which

Choosing between GraphRAG and traditional vector RAG isn't an either/or decision — it really depends on your data, your queries, and your constraints. Here's a practical breakdown:

  • Simple factual queries: Vector RAG ~94–95% accuracy, GraphRAG ~94–95% accuracy. Recommendation: Vector RAG (simpler, cheaper).
  • Multi-hop relational queries: Vector RAG 28–34% accuracy, GraphRAG 89–91% accuracy. Recommendation: GraphRAG (decisive advantage).
  • Dataset-wide summarization: Vector RAG poor (limited to retrieved chunks), GraphRAG strong (community summaries). Recommendation: GraphRAG global search.
  • Indexing cost: Vector RAG $2–5 per corpus, GraphRAG $20–500 per corpus. Recommendation: Vector RAG for budget-constrained projects.
  • Query latency: Vector RAG ~200ms baseline, GraphRAG ~2.4x higher (~480ms). Recommendation: Vector RAG for latency-critical apps.
  • Data with rich relationships: Vector RAG loses relationships in embedding, GraphRAG models them explicitly. Recommendation: GraphRAG (organizational data, research, legal).
  • Implementation complexity: Vector RAG low (mature tooling), GraphRAG medium-high (graph DB + LLM extraction). Recommendation: Vector RAG for MVPs, GraphRAG for mature systems.
  • Rapidly changing data: Vector RAG handles incremental updates easily, GraphRAG re-indexing is expensive. Recommendation: Vector RAG, or LazyGraphRAG (0.1% indexing cost).

The practical takeaway: Start with vector RAG. Measure your accuracy on relational and multi-hop queries. If those metrics fall below your threshold, introduce GraphRAG for that specific query class. Many production systems I've seen run both in parallel — vector RAG for simple lookups and GraphRAG for complex relational questions — with a query router deciding which path to take.

Reducing Costs with LazyGraphRAG

Let's be real — the biggest objection to GraphRAG adoption is cost. Full indexing requires LLM calls for every text chunk (entity extraction) and every community (summarization), which can add up to $20–500 depending on corpus size. That's a tough pill to swallow when you're not yet sure GraphRAG will improve your results enough to justify it.

Microsoft addressed this directly with LazyGraphRAG, which reduces indexing costs to 0.1% of the original — making it comparable to standard vector RAG indexing at $2–5.

How does it pull this off? By deferring the expensive LLM-based summarization to query time. During indexing, it uses lightweight NLP (NER models, co-occurrence statistics) to build the graph structure without any LLM calls. At query time, it dynamically generates summaries for just the relevant portion of the graph. You pay LLM costs only for queries actually asked, rather than pre-computing summaries for the entire graph.
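The deferral idea is easy to see in miniature. This toy sketch indexes with zero LLM calls (a regex-based capitalized-bigram "NER" as the lightweight extractor), then at query time pulls only the slice of the corpus touched by the queried entities — in a real system, the single LLM summarization call happens over just that slice:

```python
import re
from collections import defaultdict

DOCS = [
    "Acme Corp partners with Beta Labs on battery research.",
    "Beta Labs licensed its design to Gamma Inc.",
    "Delta Co operates in an unrelated market.",
]

# Index time: no LLM. Naive NER (capitalized bigrams) + an entity -> doc-id index.
def cheap_entities(text: str) -> list[str]:
    return re.findall(r"[A-Z][a-z]+ [A-Z][a-z]+", text)

index: dict[str, set[int]] = defaultdict(set)
for i, doc in enumerate(DOCS):
    for ent in cheap_entities(doc):
        index[ent].add(i)

# Query time: gather only the docs mentioning the queried entities.
# This is the slice a single LLM summarization call would run over.
def lazy_context(query: str) -> list[str]:
    hit_docs: set[int] = set()
    for ent, doc_ids in index.items():
        if ent.lower() in query.lower():
            hit_docs |= doc_ids
    return [DOCS[i] for i in sorted(hit_docs)]

context = lazy_context("How is Beta Labs connected to Gamma Inc?")
print(context)  # Delta Co's document is never touched, so it costs nothing
```

The economics follow directly: summarization cost scales with queries asked, not corpus size.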

For teams evaluating GraphRAG, this is a game changer. You can index your full corpus at near-vector-RAG prices, then pay per-query costs only when relational retrieval is actually needed.

Production Considerations

Getting a GraphRAG pipeline working in a notebook is the easy part. Running one in production requires attention to several things that'll bite you if you don't plan for them.

Entity Resolution and Graph Quality

The quality of your knowledge graph directly determines the quality of your retrieval — garbage in, garbage out. LLM-based entity extraction is imperfect. It will produce duplicate entities ("OpenAI", "Open AI", "openai"), miss implicit relationships, and occasionally hallucinate connections. In production, you'll want an entity resolution layer that merges duplicates, normalizes entity names, and validates extracted relationships against your domain ontology. This isn't glamorous work, but it's often the difference between a demo that impresses and a system that actually works reliably.

Incremental Updates

Real-world document collections change over time. Full re-indexing on every update is prohibitively expensive. Design your pipeline for incremental ingestion: extract entities from new documents only, merge them into the existing graph, and re-run community detection on the affected subgraph. Both Microsoft GraphRAG and Neo4j support this pattern, though it requires careful implementation to avoid graph inconsistencies.
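The merge step of that pattern looks roughly like this. It's a sketch over a plain adjacency dict, not either library's API — the function name and structure are illustrative — but it captures the invariant that matters: only the touched nodes' communities need re-detection:

```python
# Existing graph: entity -> set of related entities
graph: dict[str, set[str]] = {
    "OpenAI": {"Microsoft"},
    "Microsoft": {"OpenAI"},
}

def merge_new_document(graph: dict[str, set[str]],
                       new_edges: list[tuple[str, str]]) -> set[str]:
    """Merge edges extracted from one new document; return the touched nodes.

    Community detection only needs to re-run around the returned nodes --
    the rest of the graph is untouched.
    """
    touched: set[str] = set()
    for a, b in new_edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
        touched |= {a, b}
    return touched

# A new document mentions an Anthropic <-> Google relationship
dirty = merge_new_document(graph, [("Anthropic", "Google")])
print(f"Re-run community detection only around: {sorted(dirty)}")
print(f"Untouched: {sorted(set(graph) - dirty)}")
```

The tricky production parts are the ones this sketch skips: resolving whether a "new" entity is actually a duplicate of an existing node, and deciding how far out from the dirty set community re-detection must propagate.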

Hybrid Architectures

The strongest production systems I've come across combine vector search and graph search behind a query router. A lightweight classifier (or even a rules-based heuristic) analyzes incoming queries and routes them to the appropriate retrieval path:

from enum import Enum

class QueryType(Enum):
    SIMPLE_FACTUAL = "simple"
    RELATIONAL = "relational"
    GLOBAL_SUMMARY = "global"

def classify_query(query: str) -> QueryType:
    """Route queries to the appropriate retrieval strategy.

    In production, replace this with an LLM-based classifier
    or a fine-tuned small model for higher accuracy.
    """
    relational_signals = [
        "related to", "connected", "relationship",
        "between", "compared to", "influence",
        "partner", "competitor", "linked",
        "how does", "what connects",
    ]
    global_signals = [
        "overall", "summarize", "main themes",
        "across all", "general trends", "big picture",
    ]

    query_lower = query.lower()

    if any(signal in query_lower for signal in global_signals):
        return QueryType.GLOBAL_SUMMARY
    if any(signal in query_lower for signal in relational_signals):
        return QueryType.RELATIONAL
    return QueryType.SIMPLE_FACTUAL


def route_and_retrieve(query: str, vector_rag, graph_rag) -> str:
    """Execute the query through the appropriate pipeline."""
    query_type = classify_query(query)

    if query_type == QueryType.SIMPLE_FACTUAL:
        # Fast, cheap vector retrieval
        return vector_rag.search(query)

    elif query_type == QueryType.RELATIONAL:
        # Graph traversal for relationship-aware answers
        return graph_rag.search(query, method="local")

    elif query_type == QueryType.GLOBAL_SUMMARY:
        # Community-level summarization
        return graph_rag.search(query, method="global")


# Usage
query = "How are OpenAI and Anthropic connected through their founding team?"
result = route_and_retrieve(query, vector_rag=my_vector_rag, graph_rag=my_graph_rag)
print(result)

Monitoring and Evaluation

GraphRAG adds new dimensions to your monitoring stack. Beyond standard RAG metrics (context precision, faithfulness, answer relevancy), you'll want to track graph-specific metrics: entity extraction precision and recall, graph density, community modularity scores, and the ratio of queries routed to graph vs. vector retrieval. The GraphRAG-Bench benchmark, accepted at ICLR 2026, provides standardized evaluation datasets and metrics specifically designed for knowledge graph RAG systems — worth checking out if you're serious about measuring performance.
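Entity extraction precision and recall are straightforward to compute once you have a small hand-labeled gold set — a minimal sketch:

```python
def extraction_metrics(predicted: set[str], gold: set[str]) -> dict[str, float]:
    """Precision/recall/F1 for extracted entities against a hand-labeled gold set.

    Names are lowercased first so trivial casing mismatches
    ("OpenAI" vs "openai") don't count as errors.
    """
    pred = {e.lower() for e in predicted}
    ref = {e.lower() for e in gold}
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

m = extraction_metrics(
    predicted={"OpenAI", "Open AI", "Microsoft"},  # "Open AI" is an unresolved duplicate
    gold={"openai", "microsoft", "sam altman"},
)
print(m)
```

Note how the unresolved duplicate ("Open AI") drags down precision while the missed entity ("Sam Altman") drags down recall — tracking both over time tells you whether your entity resolution layer is earning its keep.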

Frequently Asked Questions

What is GraphRAG and how does it differ from traditional RAG?

GraphRAG is a retrieval-augmented generation approach that uses a knowledge graph instead of (or alongside) a flat vector index. In traditional RAG, documents are split into chunks, embedded as vectors, and retrieved by cosine similarity. GraphRAG adds a layer of structured understanding: it extracts entities and their relationships from documents, builds a graph where entities are nodes and relationships are edges, then uses graph traversal for retrieval. This enables multi-hop reasoning ("A is connected to B, which is connected to C") that vector similarity simply can't perform. Traditional RAG treats each chunk as independent; GraphRAG understands the connections between them.

When should I use GraphRAG instead of vector RAG?

Use GraphRAG when your queries require understanding relationships between entities — organizational hierarchies, supply chain dependencies, research citation networks, legal contract connections, or any domain where "how are X and Y related?" is a common question. Benchmarks show GraphRAG reaches 89–91% accuracy on complex relational queries where traditional RAG scores only 28–34%. However, for simple factual lookups ("What is the return policy?"), both approaches perform equally at ~95% accuracy. The practical strategy is to start with vector RAG, identify query categories where accuracy is insufficient, and selectively introduce GraphRAG for those categories.

How much does GraphRAG cost compared to traditional RAG?

The primary cost difference is in indexing. Traditional vector RAG indexing (chunking + embedding) typically costs $2–5 per corpus. Full GraphRAG indexing (entity extraction + graph construction + community summarization) costs $20–500 depending on corpus size, because every chunk requires LLM calls for entity extraction and every community requires LLM-generated summaries. However, LazyGraphRAG reduces indexing costs to approximately 0.1% of full GraphRAG — making it comparable to vector RAG at $2–5. Query-time costs are moderately higher due to ~2.4x latency, but the per-query cost difference is small compared to the indexing gap.

Can I use GraphRAG with open-source LLMs?

Yes, and this is a big deal for privacy-sensitive industries. The neo4j-graphrag package natively supports Ollama, which lets you run open-source models like Llama 3.1, Mistral, and Qwen entirely locally. Your data never leaves your network — critical for healthcare, finance, and government use cases. The trade-off is that entity extraction quality depends heavily on model capability; smaller open-source models (7–8B parameters) produce less accurate knowledge graphs than GPT-4o or Claude. For production with open-source models, consider using a larger model (70B+) for the one-time indexing step and a smaller model for query-time generation.

How does GraphRAG handle large document collections?

GraphRAG's hierarchical community structure is specifically designed for scale. The Leiden algorithm partitions the knowledge graph into communities at multiple resolution levels, and community summaries provide compressed representations of large entity clusters. Global search operates over these summaries rather than raw documents, making it efficient even on corpora with millions of entities. For very large collections, the primary bottleneck is the initial indexing cost (LLM calls for entity extraction), which scales linearly with document count. LazyGraphRAG helps here by deferring summarization to query time, and incremental indexing lets you process new documents without re-indexing the entire corpus.
