Knowledge Graphs & Structured Reasoning

Published:

Vector search finds things that sound similar. It retrieves chunks of text whose embeddings land near the query embedding in high-dimensional space. That works well for questions whose answers live inside one or two paragraphs of a document. But some questions can only be answered by following relationships.

"What services depend on the database that went down?" "Which teams own the components affected by this vulnerability?" "Who approved the contract that references clause 7.3?" These questions require traversing connections between entities instead of scanning for paragraphs that contain the right keywords. A vector store will struggle here because the answer is not in any single chunk. It is in the structure between chunks.

This is where knowledge graphs enter the picture. A knowledge graph represents information as entities connected by typed relationships — nodes and edges, subjects and predicates — and lets you reason over that structure directly. For agents, it opens a mode of retrieval and reasoning that vector RAG simply cannot provide.

The Knowledge Graph #

A knowledge graph is a collection of triples: subject → predicate → object. Each triple is a fact. The collection of triples forms a directed, labeled graph where entities are nodes and relationships are edges.

  ┌─────────────┐   depends_on     ┌──────────────┐
  │ API Gateway │────────────────▶│ Auth Service │
  └─────────────┘                  └──────────────┘
        │                                │
        │ depends_on                     │ depends_on
        ▼                                ▼
  ┌──────────────┐               ┌───────────────┐
  │ Order Service│               │ User Database │
  └──────────────┘               └───────────────┘
        │
        │ depends_on
        ▼
  ┌───────────────┐
  │ User Database │
  └───────────────┘

In this small graph, a single traversal answers "what breaks if User Database goes down?": walk backward from User Database through depends_on edges and collect every service that directly or transitively relies on it. Embedding similarity alone will not produce that answer from flat text, because the reasoning requires combining multiple facts that may live in completely separate documents.

# A triple store at its simplest
triples = [
    ("API Gateway", "depends_on", "Auth Service"),
    ("API Gateway", "depends_on", "Order Service"),
    ("Auth Service", "depends_on", "User Database"),
    ("Order Service", "depends_on", "User Database"),
    ("Auth Service", "owned_by", "Platform Team"),
    ("Order Service", "owned_by", "Commerce Team"),
]

def query_impact(target: str) -> list[str]:
    """Find all services affected if target goes down."""
    affected = set()
    frontier = {target}

    while frontier:
        current = frontier.pop()
        # Find everything that depends on current
        dependents = {t[0] for t in triples if t[1] == "depends_on" and t[2] == current}
        new = dependents - affected
        affected.update(new)
        frontier.update(new)

    return sorted(affected)

This is intentionally simple. Real knowledge graphs use proper graph databases (Neo4j, Amazon Neptune, or open-source options like Apache Jena) and query languages like Cypher or SPARQL. But the core idea is the same: facts are structured as triples, and you answer questions by traversing edges.
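
The traversal idea also works over a precomputed index, which avoids rescanning the whole triple list for every node visited. The following is a minimal sketch under that assumption; build_reverse_index and impacted_by are illustrative names, not part of any library, and the sample triples repeat the ones above.

```python
from collections import defaultdict, deque

triples = [
    ("API Gateway", "depends_on", "Auth Service"),
    ("API Gateway", "depends_on", "Order Service"),
    ("Auth Service", "depends_on", "User Database"),
    ("Order Service", "depends_on", "User Database"),
    ("Auth Service", "owned_by", "Platform Team"),
    ("Order Service", "owned_by", "Commerce Team"),
]

def build_reverse_index(triples, predicate):
    """Map each object to the set of subjects that point at it via `predicate`."""
    index = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            index[obj].add(subject)
    return index

def impacted_by(target, reverse_index):
    """Breadth-first walk backward along edges; returns all transitive dependents."""
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in reverse_index.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)

reverse_deps = build_reverse_index(triples, "depends_on")
print(impacted_by("User Database", reverse_deps))
# ['API Gateway', 'Auth Service', 'Order Service']
```

The index is built once per predicate, so each lookup during the walk is a dictionary access rather than a scan.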

Vector RAG Weakness #

Vector RAG has a fundamental mismatch with relational questions. Consider a corporate knowledge base with thousands of documents about organizational structure, project ownership, and approval chains. A user asks: "Who has authority to approve purchases over $50K for the infrastructure team?"

The vector store might retrieve chunks that mention "$50K threshold" or "infrastructure team budget" or "approval authority." But the actual answer requires combining:

  1. The infrastructure team's cost center
  2. The approval matrix that maps cost centers to approvers at various thresholds
  3. The current roster of people in those approver roles

These facts might live in three separate documents. Worse, the approval matrix might be a table that got chunked awkwardly — the row for $50K and the column for infrastructure might be in different chunks. The vector store retrieves similar text, not connected facts.

A knowledge graph represents this differently:

  Infrastructure Team ──belongs_to──▶ Cost Center CC-4401
  Cost Center CC-4401 ──approval_threshold_50k──▶ VP Engineering Role
  VP Engineering Role ──held_by──▶ Jane Smith

One traversal, three hops, definitive answer. No embedding similarity involved.
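
As a rough illustration of that three-hop lookup, the sketch below indexes the facts by (subject, predicate) and follows a fixed sequence of predicates. It assumes the cost center is stored under a single canonical node name; build_index and follow_path are hypothetical helpers introduced only for this example.

```python
from collections import defaultdict

facts = [
    ("Infrastructure Team", "belongs_to", "Cost Center CC-4401"),
    ("Cost Center CC-4401", "approval_threshold_50k", "VP Engineering Role"),
    ("VP Engineering Role", "held_by", "Jane Smith"),
]

def build_index(triples):
    """Index triples as (subject, predicate) -> set of objects."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    return index

def follow_path(index, start, predicates):
    """Follow a fixed sequence of predicates from `start`; return the end nodes."""
    current = {start}
    for predicate in predicates:
        current = {
            obj
            for node in current
            for obj in index.get((node, predicate), ())
        }
    return sorted(current)

index = build_index(facts)
approvers = follow_path(
    index,
    "Infrastructure Team",
    ["belongs_to", "approval_threshold_50k", "held_by"],
)
print(approvers)  # ['Jane Smith']
```

If any hop has no matching edge, the result is an empty list, which is a useful signal that the graph is missing a fact rather than a reason to guess.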

Building a Knowledge Graph from Unstructured Text #

Most organizations do not have a pre-built knowledge graph sitting around. Their knowledge lives in documents, wikis, Slack messages, and databases. The first challenge is extraction — turning unstructured text into structured triples.

This is where an LLM becomes the extraction engine rather than the answer engine.

def extract_triples(text: str) -> list[dict]:
    prompt = f"""Extract factual relationships from the following text.
Return each relationship as a JSON object with "subject", "predicate", and "object" fields.
Only extract relationships that are explicitly stated or strongly implied.
Use consistent entity names (e.g., always "Auth Service", never a mix of "the auth service" and "Auth Service").

Text:
{text}

Return a JSON array of triples:
"""
    response = call_model(prompt, temperature=0.0)
    return parse_json_array(response.text)


def build_graph(documents: list[str]) -> list[dict]:
    all_triples = []

    for doc in documents:
        chunks = split_into_chunks(doc, max_tokens=1000)
        for chunk in chunks:
            triples = extract_triples(chunk)
            all_triples.extend(triples)

    # Deduplicate and resolve entity references
    resolved = resolve_entities(all_triples)
    return resolved
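
The helper parse_json_array is not defined in the snippet above. One possible sketch is shown below, assuming the model reply may wrap the JSON array in surrounding prose and may contain incomplete objects; the validation rules here are an assumption, not a fixed contract.

```python
import json

TRIPLE_KEYS = ("subject", "predicate", "object")

def parse_json_array(raw: str) -> list[dict]:
    """Pull a JSON array out of a model reply and keep only well-formed triples."""
    start = raw.find("[")
    end = raw.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []

    triples = []
    for item in data:
        if not isinstance(item, dict) or not set(TRIPLE_KEYS).issubset(item):
            continue  # skip anything that is not a complete triple object
        values = [item[key] for key in TRIPLE_KEYS]
        if all(isinstance(v, str) and v.strip() for v in values):
            triples.append({key: item[key].strip() for key in TRIPLE_KEYS})
    return triples

reply = (
    "Here are the triples:\n"
    '[{"subject": "Auth Service", "predicate": "depends_on", "object": "User Database"},'
    ' {"subject": "Orphan"}]'
)
print(parse_json_array(reply))
# [{'subject': 'Auth Service', 'predicate': 'depends_on', 'object': 'User Database'}]
```

Returning an empty list on malformed output keeps build_graph running; a production pipeline would probably also log the failure or retry the extraction.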

The extraction step has two hard sub-problems:

Entity resolution. The same entity appears under different names — "AWS," "Amazon Web Services," "the cloud provider." You need to canonicalize these to a single node. This is a classic NLP problem that LLMs handle reasonably well with explicit instructions or a second pass.

Predicate normalization. "reports to," "is managed by," "works under" might all mean the same relationship. Without normalization, your graph has many parallel edges that should be one, and traversal queries miss connections. A controlled vocabulary of predicates — defined upfront — helps enormously.

PREDICATE_SCHEMA = {
    "reports_to": "Person → Person (direct manager relationship)",
    "member_of": "Person → Team",
    "owns": "Team → Service",
    "depends_on": "Service → Service",
    "approved_by": "Document → Person",
    "calls": "Service → Service (synchronous dependency)",
}

def extract_triples_with_schema(text: str, schema: dict) -> list[dict]:
    predicate_descriptions = "\n".join(
        f"- {k}: {v}" for k, v in schema.items()
    )
    prompt = f"""Extract relationships from the text below.
Use ONLY the following predicate types:

{predicate_descriptions}

If a relationship does not fit any of these predicates, skip it.

Text:
{text}

Return a JSON array of {{"subject": ..., "predicate": ..., "object": ...}} objects:
"""
    response = call_model(prompt, temperature=0.0)
    return parse_json_array(response.text)

Constraining extraction to a predefined schema dramatically improves graph quality. You get fewer garbage triples, the graph is queryable with predictable traversal patterns, and entity resolution becomes easier because you know the types of entities each predicate connects.
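
The resolve_entities step called in build_graph is left undefined above. A minimal sketch of one way to do it follows, using hand-written alias tables for both entities and predicates. The tables, the runs_on predicate, and the rule that out-of-vocabulary predicates are dropped are all assumptions made for illustration; a real system might build the alias tables with an LLM pass or string-similarity clustering.

```python
# Hypothetical alias tables: lowercase surface form -> canonical name.
ENTITY_ALIASES = {
    "aws": "Amazon Web Services",
    "amazon web services": "Amazon Web Services",
    "the cloud provider": "Amazon Web Services",
}
PREDICATE_ALIASES = {
    "reports to": "reports_to",
    "is managed by": "reports_to",
    "works under": "reports_to",
    "runs on": "runs_on",
    "hosted on": "runs_on",
}

def canonical_entity(name: str) -> str:
    """Collapse case and whitespace, then map known aliases to one node name."""
    key = " ".join(name.lower().split())
    return ENTITY_ALIASES.get(key, name.strip())

def canonical_predicate(predicate: str):
    """Return the controlled-vocabulary predicate, or None if it is unknown."""
    key = " ".join(predicate.lower().replace("_", " ").split())
    return PREDICATE_ALIASES.get(key)

def resolve_entities(triples: list[dict]) -> list[dict]:
    """Canonicalize names, drop unknown predicates, and remove duplicates."""
    seen = set()
    resolved = []
    for triple in triples:
        predicate = canonical_predicate(triple["predicate"])
        if predicate is None:
            continue
        key = (
            canonical_entity(triple["subject"]),
            predicate,
            canonical_entity(triple["object"]),
        )
        if key not in seen:
            seen.add(key)
            resolved.append(dict(zip(("subject", "predicate", "object"), key)))
    return resolved

raw = [
    {"subject": "Alice", "predicate": "reports to", "object": "Bob"},
    {"subject": "Alice", "predicate": "works under", "object": "Bob"},
    {"subject": "Billing Service", "predicate": "hosted on", "object": "AWS"},
    {"subject": "Billing Service", "predicate": "runs_on", "object": "the cloud provider"},
    {"subject": "Alice", "predicate": "likes", "object": "coffee"},
]
resolved = resolve_entities(raw)
print(len(resolved))  # 2
```

Five extracted triples collapse to two edges: the parallel "reports to" and "works under" edges merge, the two AWS spellings merge, and the out-of-vocabulary predicate is skipped.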

Graph RAG: Combining Graphs and Vector Retrieval #

The most powerful retrieval architecture uses both vectors and graphs. This is GraphRAG, or hybrid retrieval: use vector search to find relevant starting points, then traverse the graph to gather connected context that vector search alone would miss.

  ┌──────────┐     ┌───────────────┐      ┌──────────────┐     ┌──────────┐
  │  Query   │───▶│ Vector search │───▶ │  Graph walk  │───▶│ Generate │
  │          │     │ (find entry   │      │ (expand with │     │  answer  │
  │          │     │  points)      │      │  neighbors)  │     │  (LLM)   │
  └──────────┘     └───────────────┘      └──────────────┘     └──────────┘

The flow:

  1. Embed the query and retrieve the top-k entities or chunks from the vector store
  2. Map those results to nodes in the knowledge graph
  3. Traverse N hops outward from those nodes, collecting related entities and relationships
  4. Assemble the retrieved graph neighborhood into context for the LLM
  5. Generate the answer grounded in both the retrieved text and the graph structure

def graph_rag_query(question: str, top_k: int = 5, hops: int = 2) -> str:
    # Step 1: Vector retrieval to find entry points
    query_vector = embedding_model.encode(question)
    initial_results = vector_store.search(query_vector, limit=top_k)

    # Step 2: Map results to graph entities
    entry_entities = []
    for result in initial_results:
        entities = entity_linker.link(result.text)
        entry_entities.extend(entities)

    # Step 3: Graph traversal — expand neighborhood
    graph_context = set()
    frontier = set(entry_entities)

    for _ in range(hops):
        next_frontier = set()
        for entity in frontier:
            neighbors = graph_db.get_neighbors(entity)
            for neighbor, predicate in neighbors:
                triple_str = f"{entity} --{predicate}--> {neighbor}"
                graph_context.add(triple_str)
                next_frontier.add(neighbor)
        frontier = next_frontier

    # Step 4: Assemble context
    text_context = "\n\n".join(r.text for r in initial_results)
    graph_context_str = "\n".join(sorted(graph_context))

    context = f"""Relevant text passages:
{text_context}

Related facts from knowledge graph:
{graph_context_str}
"""

    # Step 5: Generate
    prompt = f"""Answer the question using the provided context.
If the context does not contain enough information, say so.

Context:
{context}

Question: {question}
"""
    return call_model(prompt, temperature=0.0).text

This hybrid approach gets the best of both worlds. Vector search catches the textual matches: paragraphs that describe things in natural language. Graph traversal catches the structural connections: facts that link entities across documents. Together, they give the LLM a richer, more complete picture than either retrieval method provides alone.
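
The graph-expansion step can be tried in isolation. The sketch below is a self-contained version of step 3 over an in-memory adjacency map, with a visited set so that shared neighbors and cycles are expanded only once. It follows outgoing edges only, and build_adjacency and expand_neighborhood are illustrative names rather than a real graph database API.

```python
from collections import defaultdict

triples = [
    ("API Gateway", "depends_on", "Auth Service"),
    ("API Gateway", "depends_on", "Order Service"),
    ("Auth Service", "depends_on", "User Database"),
    ("Order Service", "depends_on", "User Database"),
    ("Auth Service", "owned_by", "Platform Team"),
    ("Order Service", "owned_by", "Commerce Team"),
]

def build_adjacency(triples):
    """Map each subject to its outgoing (neighbor, predicate) pairs."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((obj, predicate))
    return adjacency

def expand_neighborhood(adjacency, entry_entities, hops):
    """Collect fact strings reachable within `hops` outgoing edges of the entry nodes."""
    visited = set(entry_entities)
    frontier = set(entry_entities)
    facts = set()
    for _ in range(hops):
        next_frontier = set()
        for entity in frontier:
            for neighbor, predicate in adjacency.get(entity, ()):
                facts.add(f"{entity} --{predicate}--> {neighbor}")
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return sorted(facts)

adjacency = build_adjacency(triples)
one_hop = expand_neighborhood(adjacency, ["API Gateway"], hops=1)
two_hop = expand_neighborhood(adjacency, ["API Gateway"], hops=2)
print(len(one_hop), len(two_hop))  # 2 6
```

The jump from two facts to six in a single extra hop shows why the hop count needs a cap: neighborhoods grow quickly, and the context window does not.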

Community Detection and Hierarchical Summarization #

One of the most powerful techniques in GraphRAG is community detection — using graph algorithms to identify clusters of densely connected entities, then generating summaries for each community at multiple levels of granularity.

  Full Graph
  ┌─────────────────────────────────────────────────────────┐
  │  ┌────────────────┐                ┌──────────────┐     │
  │  │   Community    │                │  Community   │     │
  │  │      A         │                │      B       │     │
  │  │ (Auth Service) │────────────────│  (Payments)  │     │
  │  │                │                │              │     │
  │  └────────────────┘                └──────────────┘     │
  │          │                                 │            │
  │          │                                 │            │
  │   ┌───────────────────┐      ┌────────────────────────┐ │
  │   │    Community      │      │       Community        │ │
  │   │        C          │      │            D           │ │
  │   │ (User Management) │──────│ (Notification Service) │ │
  │   │                   │      │                        │ │
  │   └───────────────────┘      └────────────────────────┘ │
  └─────────────────────────────────────────────────────────┘

The idea: run a community detection algorithm (Leiden, Louvain, or label propagation) on the knowledge graph. Each community is a cluster of entities that are more connected to each other than to the rest of the graph. Then generate a natural-language summary of each community — what entities it contains, what their relationships are, and what themes emerge.

import networkx as nx
from networkx.algorithms.community import louvain_communities

def build_community_summaries(graph: nx.Graph) -> dict[int, str]:
    # Detect communities (fixed seed so community indices stay stable across runs)
    communities = louvain_communities(graph, resolution=1.0, seed=42)

    summaries = {}
    for i, community in enumerate(communities):
        # Get all edges within this community
        subgraph = graph.subgraph(community)
        edges = list(subgraph.edges(data=True))

        # Format triples for summarization
        triples_text = "\n".join(
            f"{u} --{d.get('predicate', 'related_to')}--> {v}"
            for u, v, d in edges
        )

        prompt = f"""Summarize the following group of related entities and relationships.
Describe what this cluster represents, the key entities, and their roles.

Entities: {', '.join(sorted(community))}

Relationships:
{triples_text}

Summary:
"""
        summary = call_model(prompt, temperature=0.0).text
        summaries[i] = summary

    return summaries

At query time, you can match the question against community summaries first — a coarse-grained retrieval that identifies which part of the graph is relevant — then drill into the specific entities and relationships within that community. This works well for broad questions ("What does the payments infrastructure look like?") that do not map to any single entity.
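
The coarse-grained matching step might look like the sketch below. It ranks community summaries by word overlap (Jaccard similarity) with the question. Lexical overlap is only a stand-in chosen to keep the example dependency-free; in practice the summaries would more likely be embedded and searched like any other text. The summaries and the rank_communities helper are hypothetical.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank_communities(question: str, summaries: dict[int, str], top_n: int = 1) -> list[int]:
    """Return the ids of the top_n summaries by Jaccard overlap with the question."""
    question_tokens = tokenize(question)
    scored = []
    for community_id, summary in summaries.items():
        summary_tokens = tokenize(summary)
        union = question_tokens | summary_tokens
        score = len(question_tokens & summary_tokens) / len(union) if union else 0.0
        scored.append((score, community_id))
    # Highest score first; ties broken by community id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [community_id for score, community_id in scored[:top_n] if score > 0]

summaries = {
    0: "Authentication cluster: Auth Service issues tokens and depends on the User Database.",
    1: "Payments infrastructure: the payments gateway, ledger, and fraud checks used by checkout.",
}
best = rank_communities("What does the payments infrastructure look like?", summaries)
print(best)  # [1]
```

Once the relevant community is identified, the entity-level traversal only needs to run inside that subgraph.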

Ontological Reasoning - Types, Hierarchies, and Inference #

A knowledge graph becomes significantly more powerful when you add an ontology — a schema that defines entity types, relationship types, and inheritance hierarchies. With an ontology, the graph can infer facts that are not explicitly stored.

  Ontology (schema):
    Component ──subclass_of──▶ Deployable
    Microservice ──subclass_of──▶ Component
    depends_on: Component × Component → Dependency
    owned_by: Component × Team → Ownership

  Instance data:
    Auth Service ──instance_of──▶ Microservice
    Auth Service ──depends_on──▶ User Database

  Inferred:
    Auth Service ──instance_of──▶ Component (via Microservice → Component)
    Auth Service ──instance_of──▶ Deployable (via Component → Deployable)

This means a query for "all components that depend on User Database" will match Auth Service even though it is only directly typed as a Microservice — the inference engine walks the subclass hierarchy.
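
A small sketch of that inference, assuming single inheritance and an acyclic subclass hierarchy. The "Capacity Report" entity and its edge are hypothetical additions, included only so that the type filter has something to exclude.

```python
SUBCLASS_OF = {"Microservice": "Component", "Component": "Deployable"}
INSTANCE_OF = {
    "Auth Service": "Microservice",
    "Capacity Report": "Document",  # hypothetical non-component entity
}
DEPENDS_ON = [
    ("Auth Service", "User Database"),
    ("Capacity Report", "User Database"),  # hypothetical edge
]

def ancestors(type_name: str) -> list[str]:
    """Return the type followed by every supertype reachable via subclass_of."""
    chain = [type_name]
    # The second condition guards against an accidental cycle in the hierarchy.
    while chain[-1] in SUBCLASS_OF and SUBCLASS_OF[chain[-1]] not in chain:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain

def is_instance(entity: str, type_name: str) -> bool:
    """True if the entity's declared type is, or inherits from, type_name."""
    declared = INSTANCE_OF.get(entity)
    return declared is not None and type_name in ancestors(declared)

def dependents_of_type(target: str, type_name: str) -> list[str]:
    """Entities of the given (possibly inferred) type that depend on target."""
    return sorted(
        subject
        for subject, obj in DEPENDS_ON
        if obj == target and is_instance(subject, type_name)
    )

print(ancestors("Microservice"))
# ['Microservice', 'Component', 'Deployable']
print(dependents_of_type("User Database", "Component"))
# ['Auth Service']
```

Here the inherited types are computed at query time. A triple store with a reasoner may instead materialize the inferred instance_of facts ahead of time, trading storage for faster queries.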

For agents, ontological reasoning unlocks a specific capability: schema-guided question decomposition. When an agent receives a complex question, it can consult the ontology to understand what types of entities and relationships exist, then decompose the question into graph traversals that follow valid paths.

def ontology_guided_query(question: str, ontology: dict) -> str:
    # Show the agent what entity types and relationships exist
    schema_description = format_ontology(ontology)

    # Ask the model to decompose the question into graph operations
    prompt = f"""Given this knowledge graph schema:

{schema_description}

Decompose the following question into a sequence of graph traversal steps.
Each step should specify: starting entity type, relationship to traverse, target entity type.

Question: {question}

Traversal plan:
"""
    plan = call_model(prompt, temperature=0.0).text

    # Execute the plan against the graph
    results = execute_traversal_plan(plan)

    # Synthesize the answer
    return synthesize_answer(question, results)

This is a form of structured reasoning — the agent reasons about the shape of the knowledge, not just its content. It knows that a "person → works_at → company → located_in → city" path is valid, and can plan a traversal before executing it. Without the ontology, the agent would have to guess what relationships exist, or retrieve and parse them from context.
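
Because a model-generated plan can name relationships that do not exist, it helps to check the plan against the schema before executing it. The following sketch assumes each predicate has exactly one domain type and one range type; the SCHEMA contents and validate_plan are hypothetical and simply mirror the person, company, city path mentioned above.

```python
# Hypothetical schema: predicate -> (domain type, range type).
SCHEMA = {
    "works_at": ("Person", "Company"),
    "located_in": ("Company", "City"),
    "member_of": ("Person", "Team"),
}

def validate_plan(start_type: str, predicates: list[str], schema: dict) -> list[str]:
    """Return the entity types visited, or raise ValueError at the first invalid hop."""
    current = start_type
    visited = [current]
    for predicate in predicates:
        if predicate not in schema:
            raise ValueError(f"unknown predicate: {predicate}")
        domain, range_type = schema[predicate]
        if domain != current:
            raise ValueError(f"{predicate} starts at {domain}, not {current}")
        current = range_type
        visited.append(current)
    return visited

print(validate_plan("Person", ["works_at", "located_in"], SCHEMA))
# ['Person', 'Company', 'City']

try:
    validate_plan("Person", ["located_in"], SCHEMA)
except ValueError as error:
    print(error)  # located_in starts at Company, not Person
```

A rejected plan can be fed back to the model along with the error message, which tends to be cheaper than running an invalid query and interpreting an empty result.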

Graph Construction as an Agent Tool #

In a production system, graph construction is never finished. Knowledge changes. New documents arrive. Entities get updated. The knowledge graph needs to stay current, which means graph maintenance becomes a tool the agent can invoke.

graph_tools = [
    {
        "name": "add_entity",
        "description": "Add a new entity node to the knowledge graph",
        "parameters": {
            "entity_id": "Canonical name of the entity",
            "entity_type": "Type from the ontology (Person, Service, etc.)",
            "properties": "Key-value pairs of entity attributes",
        },
    },
    {
        "name": "add_relationship",
        "description": "Add a typed relationship between two entities",
        "parameters": {
            "subject": "Source entity ID",
            "predicate": "Relationship type from the schema",
            "object": "Target entity ID",
        },
    },
    {
        "name": "query_graph",
        "description": "Execute a graph query to find entities or paths",
        "parameters": {
            "cypher_query": "A Cypher query to execute against the graph",
        },
    },
    {
        "name": "get_entity_neighborhood",
        "description": "Get all entities and relationships within N hops of an entity",
        "parameters": {
            "entity_id": "The starting entity",
            "max_hops": "How many relationship hops to traverse (1-3)",
        },
    },
]

An agent equipped with these tools can do something a static RAG pipeline cannot: it can build knowledge as it goes. During a research task, the agent reads documents, extracts entities and relationships, adds them to the graph, and later queries that graph to answer questions that span multiple sources. The graph becomes working memory that accumulates structured understanding over time.

def research_with_graph_building(question: str, sources: list[str]) -> str:
    # Phase 1: Research and extract
    for source in sources:
        content = fetch_document(source)
        triples = extract_triples_with_schema(content, PREDICATE_SCHEMA)

        for triple in triples:
            agent.call_tool("add_entity", {
                "entity_id": triple["subject"],
                "entity_type": infer_type(triple["subject"]),
                "properties": {},
            })
            agent.call_tool("add_entity", {
                "entity_id": triple["object"],
                "entity_type": infer_type(triple["object"]),
                "properties": {},
            })
            agent.call_tool("add_relationship", triple)

    # Phase 2: Query the accumulated graph
    result = agent.call_tool("query_graph", {
        "cypher_query": build_query_from_question(question),
    })

    # Phase 3: Synthesize answer from graph results
    return synthesize_answer(question, result)
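
To make the tool definitions concrete, here is a minimal in-memory backend that three of them could dispatch to. It is a sketch under simplifying assumptions: no persistence, no Cypher support, and a neighborhood lookup that ignores edge direction. InMemoryGraph and call_tool are hypothetical names, and the method parameters mirror the tool schemas so that arguments can be passed through unchanged.

```python
class InMemoryGraph:
    def __init__(self):
        self.entities = {}  # entity_id -> {"type": ..., "properties": {...}}
        self.edges = set()  # (subject, predicate, object) tuples

    def add_entity(self, entity_id, entity_type, properties=None):
        """Idempotent upsert: repeated calls merge properties instead of duplicating nodes."""
        record = self.entities.setdefault(
            entity_id, {"type": entity_type, "properties": {}}
        )
        record["properties"].update(properties or {})
        return record

    def add_relationship(self, subject, predicate, object):
        """Add an edge; both endpoints must already exist as entities."""
        for endpoint in (subject, object):
            if endpoint not in self.entities:
                raise KeyError(f"unknown entity: {endpoint}")
        self.edges.add((subject, predicate, object))

    def get_entity_neighborhood(self, entity_id, max_hops=1):
        """Return edges within max_hops of entity_id, ignoring edge direction."""
        visited = {entity_id}
        frontier = {entity_id}
        found = set()
        for _ in range(max_hops):
            next_frontier = set()
            for s, p, o in self.edges:
                if s in frontier or o in frontier:
                    found.add((s, p, o))
                    for node in (s, o):
                        if node not in visited:
                            visited.add(node)
                            next_frontier.add(node)
            frontier = next_frontier
        return sorted(found)


def call_tool(graph, name, arguments):
    """Dispatch a tool call by name, unpacking the arguments dict as keywords."""
    handlers = {
        "add_entity": graph.add_entity,
        "add_relationship": graph.add_relationship,
        "get_entity_neighborhood": graph.get_entity_neighborhood,
    }
    return handlers[name](**arguments)


graph = InMemoryGraph()
for name in ("API Gateway", "Auth Service", "User Database"):
    call_tool(graph, "add_entity", {"entity_id": name, "entity_type": "Service", "properties": {}})
call_tool(graph, "add_relationship",
          {"subject": "API Gateway", "predicate": "depends_on", "object": "Auth Service"})
call_tool(graph, "add_relationship",
          {"subject": "Auth Service", "predicate": "depends_on", "object": "User Database"})

print(call_tool(graph, "get_entity_neighborhood", {"entity_id": "User Database", "max_hops": 1}))
# [('Auth Service', 'depends_on', 'User Database')]
```

Making add_entity idempotent matters for the research loop above, which calls it once per triple and will see the same entity many times.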

Trade-offs #

Knowledge graphs are not a universal replacement for vector RAG. They solve different problems and have different costs. Here is the decision matrix:

Use vector RAG when:

  • Questions are about content — "what does the documentation say about X?"
  • Answers live within one or two paragraphs of source text
  • The knowledge base is large and flat (thousands of documents without rich inter-document relationships)
  • You need fast, low-maintenance retrieval with minimal upfront work

Use a knowledge graph when:

  • Questions are about relationships — "what connects A to B?"
  • Answers require combining facts from multiple sources
  • The domain has clear entity types and relationship types (org charts, supply chains, service architectures, infrastructure dependencies)
  • You need multi-hop reasoning that follows explicit paths
  • Auditability matters — you want to explain why an answer is correct by showing the path

Use hybrid (GraphRAG) when:

  • Questions mix content and relationship needs
  • You want vector search for discovery and graph traversal for completeness
  • The knowledge base is semi-structured — some documents, some databases, some APIs
  • You are building a system that needs to improve its understanding over time

The cost structure is also different:

  Dimension                       Vector RAG                Knowledge Graph                    Hybrid
  ──────────────────────────────  ────────────────────────  ─────────────────────────────────  ─────────────
  Setup cost                      Low (embed and store)     High (extract, resolve, validate)  Highest
  Query latency                   Fast (one vector search)  Moderate (graph traversal)         Moderate-high
  Maintenance                     Re-embed on change        Update triples on change           Both
  Accuracy on relational queries  Poor                      Excellent                          Excellent
  Accuracy on content queries     Good                      Poor (no full text)                Good
  Hallucination risk              Moderate                  Low (structured facts)             Low

LightRAG and Practical Implementations #

The LightRAG approach offers a pragmatic middle ground. Instead of building a full enterprise knowledge graph upfront, it extracts a lightweight graph on the fly from documents, combining graph structure with vector retrieval in a single pipeline.

The key insight: you do not need a perfectly curated ontology to benefit from graph structure. Even a noisy, automatically extracted graph, with some wrong triples and some duplicate entities, provides signal that pure vector search misses. LLMs tend to tolerate a moderate amount of noise in context, so a few wrong triples mixed with mostly correct ones will often still produce better answers than missing the relational information entirely.

def light_rag_index(documents: list[str]) -> tuple:
    """Index documents with both vector and graph representations."""
    vector_index = VectorStore()
    graph = GraphStore()

    for doc in documents:
        # Standard vector indexing
        chunks = split_into_chunks(doc)
        for chunk in chunks:
            embedding = embedding_model.encode(chunk)
            vector_index.add(chunk, embedding)

        # Lightweight graph extraction (whole document for context)
        triples = extract_triples(doc)
        for triple in triples:
            graph.add_triple(triple)

    return vector_index, graph


def light_rag_query(question: str, vector_index, graph, mode: str = "hybrid") -> str:
    if mode == "local":
        # Graph-only: entity lookup + neighborhood
        entities = extract_entities_from_question(question)
        context = graph.get_neighborhoods(entities, hops=2)

    elif mode == "global":
        # Community summaries for broad questions
        context = graph.get_relevant_communities(question)

    else:  # hybrid
        # Vector retrieval + graph expansion
        vector_results = vector_index.search(question, top_k=5)
        entities = extract_entities_from_results(vector_results)
        graph_context = graph.get_neighborhoods(entities, hops=1)
        context = combine(vector_results, graph_context)

    return generate_answer(question, context)

The three retrieval modes serve different question types. Local mode starts from specific entities and explores their neighborhoods, which suits "tell me about X and its relationships." Global mode uses pre-computed community summaries to answer broad thematic questions. Hybrid mode, as sketched here, pairs vector retrieval with a one-hop graph expansion for questions that need both passage text and relational context.

Graph-Powered Agent Reasoning #

Beyond retrieval, knowledge graphs enable a form of structured reasoning that complements the unstructured reasoning LLMs do natively. The agent can use the graph to:

Validate claims. After generating a response, check whether the stated relationships actually exist in the graph. If the agent claims "Service A depends on Service B," verify by traversing the graph.

Discover paths. When asked "how are X and Y related?", find all paths between the two entities in the graph and present them. This is deterministic — no hallucination possible for the structural part.

Detect contradictions. If a new document states something that contradicts existing graph triples, flag the conflict for human review rather than silently overwriting.

Plan multi-step actions. For an agent managing infrastructure, the dependency graph tells it exactly what downstream services will be affected by a change — no guessing required.

def validate_claim(claim: str, graph) -> list[dict]:
    """Check a natural-language claim against the knowledge graph."""
    # Extract the relationship being claimed
    extracted = extract_triples(claim)

    results = []
    for triple in extracted:
        # Check if this relationship exists in the graph
        exists = graph.has_triple(
            triple["subject"], triple["predicate"], triple["object"]
        )
        # Check for contradicting relationships
        contradictions = graph.find_contradictions(triple)

        results.append({
            "claim": f"{triple['subject']} {triple['predicate']} {triple['object']}",
            "verified": exists,
            "contradictions": contradictions,
        })

    return results
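
The find_contradictions call above is not defined in this article. One simple way to approximate it is to treat certain predicates as single-valued, so that a new triple conflicts with any stored triple that has the same subject and predicate but a different object. The choice of FUNCTIONAL_PREDICATES and the "John Doe" example are assumptions for illustration; real contradiction detection usually needs richer rules such as validity periods or negation.

```python
# Assumption: these predicates allow at most one object per subject.
FUNCTIONAL_PREDICATES = {"reports_to", "held_by"}

def find_contradictions(new_triple, existing_triples):
    """Return stored triples that conflict with new_triple under single-valued predicates."""
    subject, predicate, obj = new_triple
    if predicate not in FUNCTIONAL_PREDICATES:
        return []  # multi-valued predicates cannot conflict under this rule
    return sorted(
        triple
        for triple in existing_triples
        if triple[0] == subject and triple[1] == predicate and triple[2] != obj
    )

existing = {
    ("VP Engineering Role", "held_by", "Jane Smith"),
    ("Auth Service", "depends_on", "User Database"),
}

conflicts = find_contradictions(("VP Engineering Role", "held_by", "John Doe"), existing)
print(conflicts)
# [('VP Engineering Role', 'held_by', 'Jane Smith')]
```

A non-empty result is the cue to flag the new fact for human review instead of writing it to the graph.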

This is the deeper value of knowledge graphs for agents: they provide a ground truth structure that the agent can reason against. The LLM handles natural language understanding, query decomposition, and synthesis. The graph handles factual verification, path finding, and relationship traversal. Each does what it is best at.
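
Path discovery, mentioned earlier in this section, can be sketched as a bounded depth-first search. The version below ignores edge direction when exploring (two services are "related" if they share a dependency in either direction) but keeps the original direction in each reported step. find_paths is an illustrative helper, and max_hops is an assumed safeguard against path explosion in dense graphs.

```python
from collections import defaultdict

def find_paths(triples, start, goal, max_hops=3):
    """Enumerate simple paths between start and goal, up to max_hops edges long."""
    neighbors = defaultdict(list)
    for s, p, o in triples:
        label = f"{s} --{p}--> {o}"
        neighbors[s].append((o, label))
        neighbors[o].append((s, label))  # allow walking the edge backward

    paths = []

    def walk(node, visited, steps):
        if node == goal:
            paths.append(steps)
            return
        if len(steps) == max_hops:
            return
        for next_node, label in neighbors.get(node, ()):
            if next_node not in visited:
                walk(next_node, visited | {next_node}, steps + [label])

    walk(start, {start}, [])
    return sorted(paths)

triples = [
    ("API Gateway", "depends_on", "Auth Service"),
    ("API Gateway", "depends_on", "Order Service"),
    ("Auth Service", "depends_on", "User Database"),
    ("Order Service", "depends_on", "User Database"),
]

for path in find_paths(triples, "Auth Service", "Order Service", max_hops=2):
    print(path)
# ['API Gateway --depends_on--> Auth Service', 'API Gateway --depends_on--> Order Service']
# ['Auth Service --depends_on--> User Database', 'Order Service --depends_on--> User Database']
```

Each returned path is a list of stored facts, so the agent can cite the exact edges behind a claim such as "both services share User Database as a dependency."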

Conclusion #

Knowledge graphs give agents a structured backbone for reasoning about relationships, hierarchies, and multi-hop connections — things that flat vector search handles poorly or not at all.

Key takeaways:

  • Vector RAG retrieves by similarity; knowledge graphs retrieve by structure. They solve fundamentally different retrieval problems and are most powerful when combined.
  • Building a knowledge graph from unstructured text requires entity extraction, entity resolution, and predicate normalization — all tasks that LLMs handle well when guided by a predefined schema.
  • GraphRAG combines vector search (for finding entry points) with graph traversal (for expanding context along relationships), giving the LLM both textual and structural evidence.
  • Community detection over the graph produces hierarchical summaries that handle broad thematic questions no single entity can answer.
  • Ontological reasoning (type hierarchies and inference rules) lets the graph answer questions using facts it has not explicitly stored, by walking inheritance chains.
  • For production agents, graph construction and querying should be exposed as tools, allowing the agent to build structured knowledge incrementally as it processes new information.
  • The decision between vector RAG, graph retrieval, and hybrid depends on whether your questions are about content, relationships, or both — and how much upfront investment you can afford in graph construction and maintenance.