ApeRAG System Architecture

1. What is ApeRAG

ApeRAG is an open, Agentic Graph RAG platform. It's not just a simple vector retrieval system, but a production-ready solution that deeply integrates knowledge graphs, multimodal retrieval, and intelligent agents.
Traditional RAG systems primarily rely on vector similarity search. While they can find semantically related content, they often lack understanding of relationships between knowledge points. ApeRAG's core innovations are:
  • Graph RAG: Automatically extracts entities (people, places, concepts) and relationships from documents to build knowledge graphs, understanding connections between knowledge points
  • Agentic: Built-in intelligent agents that can autonomously plan, invoke tools, and conduct multi-turn conversations for smarter Q&A experiences
  • Open Integration: Exposes capabilities through RESTful API and MCP Protocol, easily integrating with external systems like Dify, Claude, and Cursor

Core Advantages

Compared to traditional RAG solutions, ApeRAG provides:
  • Powerful Document Processing: Supports PDF, Word, Excel and more, handling complex tables, formulas, and images
  • Multiple Retrieval Methods: Vector, full-text, and graph retrieval complement each other
  • Knowledge Relationship Understanding: Understands concept relationships through knowledge graphs, not just text similarity
  • Open Integration Capabilities: RESTful API + MCP protocol, can serve as knowledge backend for Dify, Claude Desktop, Cursor
  • Production-Grade Architecture: Async processing, multi-storage, high concurrency, ready for production

Architecture Overview

2. Layered Architecture

ApeRAG adopts a clear layered design, with each layer serving its specific purpose:
Layer Responsibilities:
  • Client Layer: Multiple access methods - Web UI for management, MCP clients (Dify, Cursor, Claude, etc.) for integration
  • Interface Layer: RESTful API (traditional HTTP interface) and MCP Server (AI tool protocol) provide services in parallel
  • Service Layer: Core business logic, coordinating resources to complete specific functions
  • Task Layer: Handles time-consuming operations (document parsing, index building) to ensure fast API responses
  • Storage Layer: Multiple storage systems, selecting optimal solutions for different data types

3. Document Processing Flow

This is one of ApeRAG's core capabilities. Turning an uploaded PDF into searchable content involves a series of carefully designed processing steps.

3.1 Document Upload and Parsing

When you upload a document, ApeRAG automatically identifies the format and selects the appropriate parser:
What DocRay/MinerU provide:
  • Accurately recognizes complex PDF table structures, preserving table content integrity
  • Extracts LaTeX mathematical formulas, maintaining formula readability
  • Performs OCR on scanned PDFs, supporting mixed Chinese-English text
  • Identifies image regions in documents, supporting image content understanding

3.2 Intelligent Chunking Strategy

After document parsing, content needs to be split into appropriately sized chunks. This step is critical: chunks that are too large hurt retrieval precision, while chunks that are too small lose context.
Chunking Strategy Features:
  • Maintain Semantic Integrity: Avoid breaking sentences in the middle
  • Preserve Title Context: Each chunk knows which section it belongs to
  • Hierarchical Splitting: Split by paragraphs first, then sentences, finally characters
  • Smart Merging: Adjacent small title chunks are merged to avoid information fragmentation
Chunking Parameters:
  • Default chunk size: 1200 tokens (approximately 800-1000 Chinese characters)
  • Overlap size: 100 tokens (ensures context continuity)
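The hierarchical strategy above can be sketched in a few lines of Python. This is a simplified illustration, not ApeRAG's actual chunker: it approximates tokens with characters, uses only two separator levels (paragraphs, then sentences) before falling back to fixed-size slices, and the function names are hypothetical.

```python
def split_hierarchically(text: str, chunk_size: int, separators=("\n\n", ". ")):
    """Split by paragraphs first, then sentences, finally raw character slices."""
    if len(text) <= chunk_size:
        return [text]
    for i, sep in enumerate(separators):
        parts = text.split(sep)
        if len(parts) <= 1:
            continue  # separator not present; try the next, finer one
        chunks, current = [], ""
        for part in parts:
            piece = part + sep
            if current and len(current) + len(piece) > chunk_size:
                chunks.append(current.strip())
                current = ""
            current += piece
        if current.strip():
            chunks.append(current.strip())
        # Re-split any chunk that is still too large using finer separators
        out = []
        for c in chunks:
            if len(c) > chunk_size:
                out.extend(split_hierarchically(c, chunk_size, separators[i + 1:]))
            else:
                out.append(c)
        return out
    # No separator matched at all: fall back to fixed-size slices
    return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]

def with_overlap(chunks, overlap=100):
    """Prepend the tail of the previous chunk so context carries across boundaries."""
    out = [chunks[0]]
    for prev, cur in zip(chunks, chunks[1:]):
        out.append(prev[-overlap:] + cur)
    return out

paragraphs = ["sentence one. " * 8, "sentence two. " * 8, "sentence three. " * 8]
chunks = with_overlap(split_hierarchically("\n\n".join(paragraphs), chunk_size=150),
                      overlap=20)
```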

3.3 Parallel Multi-Index Building

After chunking, multiple indexes are created simultaneously. Each index serves different purposes and complements the others:
| Index Type | Use Case | Storage | Retrieval Method |
| --- | --- | --- | --- |
| Vector Index | Semantic similarity questions, e.g., "how to optimize performance" | Qdrant | Cosine similarity |
| Full-text Index | Exact keyword search, e.g., "PostgreSQL configuration" | Elasticsearch | BM25 algorithm |
| Graph Index | Relationship questions, e.g., "what's the connection between A and B" | PostgreSQL/Neo4j | Graph traversal |
| Summary Index | Quick document overview | PostgreSQL | Vector matching |
| Vision Index | Image content search | Qdrant | Multimodal vector |
Advantages of Parallel Building:
  • Different indexes can be built simultaneously, improving speed
  • Failure of one index doesn't affect others
  • Can enable specific index types on demand
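The failure-isolation property above can be sketched with `asyncio`: each builder runs concurrently, and `return_exceptions=True` keeps one failure from cancelling the rest. The builder functions are hypothetical stand-ins; real ones would write to Qdrant, Elasticsearch, and the graph store.

```python
import asyncio

# Hypothetical builders standing in for the real index writers.
async def build_vector_index(chunks):
    return {"type": "vector", "chunks": len(chunks)}

async def build_fulltext_index(chunks):
    return {"type": "fulltext", "chunks": len(chunks)}

async def build_graph_index(chunks):
    raise RuntimeError("LLM extraction failed")  # simulate one index failing

async def build_all(chunks):
    """Build every index concurrently; one failure does not cancel the others."""
    builders = (build_vector_index, build_fulltext_index, build_graph_index)
    results = await asyncio.gather(*(b(chunks) for b in builders),
                                   return_exceptions=True)
    done = [r for r in results if not isinstance(r, Exception)]
    failed = [r for r in results if isinstance(r, Exception)]
    return done, failed

done, failed = asyncio.run(build_all(["chunk-1", "chunk-2"]))
```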

3.4 Knowledge Graph Construction

Graph indexing is ApeRAG's signature feature, extracting structured knowledge from documents.
Key Steps in Graph Construction:
  • Entity Extraction: LLM identifies meaningful entities from document chunks
    • Example: From "Zhang San studies AI at Tsinghua University in Beijing"
    • Entities: Zhang San (person), Beijing (location), Tsinghua University (organization), AI (concept)
  • Relation Extraction: Identifies relationships between entities
    • Example: Zhang San --studies--> AI, Zhang San --attends--> Tsinghua University
  • Entity Merging: The same entity may appear under different surface forms and needs normalization
    • Example: "LightRAG", "light rag", "Light-RAG" → merged into unified entity
  • Connected Components Optimization: Divides graph into independent subgraphs for parallel processing
    • Performance improvement: 2-3x throughput
Why Connected Components Optimization?
Suppose you have 100 documents discussing different topics. Entities about "databases" and entities about "machine learning" have no connections and can be processed independently. The connected components algorithm identifies these independent "knowledge islands" and processes them in parallel, greatly improving speed.
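The entity-merging step described above (folding "LightRAG", "light rag", and "Light-RAG" into one entity) can be sketched as a normalize-and-group pass. This ASCII-only normalization is an illustrative assumption; real merging typically also uses embeddings or an LLM, especially for non-Latin scripts.

```python
import re
from collections import defaultdict

def normalize(name: str) -> str:
    """Canonical key: lowercase, strip punctuation/whitespace (ASCII-only sketch)."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def merge_entities(mentions):
    """Group surface forms that normalize to the same key."""
    groups = defaultdict(list)
    for m in mentions:
        groups[normalize(m)].append(m)
    # Pick the most frequent surface form as the canonical name (first on ties)
    return {max(forms, key=forms.count): forms for forms in groups.values()}

merged = merge_entities(["LightRAG", "light rag", "Light-RAG", "Neo4j"])
```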

3.5 Async Task System

Document processing is time-consuming, so ApeRAG uses a "dual-chain architecture" to ensure good user experience:
Benefits of Dual-Chain:
  • Fast Frontend Response: API returns within 100ms after user uploads, no need to wait for processing
  • Async Backend Processing: Real processing work happens in background without blocking user operations
  • Auto Retry: System automatically retries if processing fails, ensuring eventual success
  • Status Tracking: Users can check document processing progress anytime
Index State Machine:
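A minimal sketch of such a state machine is below. The state names and transition table are assumptions for illustration, not ApeRAG's actual definitions; the real system likely tracks more states (e.g., for deletion).

```python
from enum import Enum

class IndexState(str, Enum):
    PENDING = "pending"    # recorded by the API, not yet scheduled
    CREATING = "creating"  # async task running
    ACTIVE = "active"      # index ready for retrieval
    FAILED = "failed"      # will be retried automatically

# Assumed transition table: auto-retry re-enters CREATING from FAILED,
# and a document update re-indexes from ACTIVE.
TRANSITIONS = {
    IndexState.PENDING: {IndexState.CREATING},
    IndexState.CREATING: {IndexState.ACTIVE, IndexState.FAILED},
    IndexState.FAILED: {IndexState.CREATING},
    IndexState.ACTIVE: {IndexState.CREATING},
}

def advance(state: IndexState, target: IndexState) -> IndexState:
    """Move to `target` only if the transition is legal."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

state = advance(IndexState.PENDING, IndexState.CREATING)
```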

4. Retrieval and Q&A Flow

Once indexed, users can ask questions. ApeRAG's retrieval system intelligently selects appropriate retrieval strategies.

4.1 Hybrid Retrieval System

Different types of questions suit different retrieval methods. ApeRAG uses multiple retrieval strategies simultaneously and fuses results:
Retrieval Strategy Explanation:
  • Vector Retrieval: For semantically similar questions
    • Q: "How to improve system performance?"
    • Finds: "Optimize database queries", "Use caching", etc.
  • Full-text Retrieval: For exact keyword matching
    • Q: "Where is PostgreSQL configuration file?"
    • Finds paragraphs containing exactly "PostgreSQL" and "configuration file"
  • Graph Retrieval: For relationship questions
    • Q: "What's the relationship between LightRAG and Neo4j?"
    • Queries connection paths between these two entities in the graph
Result Fusion Strategy:
Results from different retrieval methods need merging. ApeRAG uses a Rerank model to re-score all candidate results:
  • Collect all retrieval results (may have duplicates)
  • Deduplicate, keep most relevant segments
  • Use Rerank model to evaluate relevance of each segment to the question
  • Re-sort by new scores
  • Return Top-K results
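The five fusion steps above reduce to a short function. This is a sketch: `rerank_fn` stands in for the Rerank model (here a toy scorer is passed in), and deduplication is by exact text match rather than segment ID.

```python
def fuse_results(result_lists, rerank_fn, top_k=5):
    """Merge candidates from several retrievers, dedupe, rerank, return top-k."""
    seen, candidates = set(), []
    for results in result_lists:        # collect all retrieval results
        for text in results:
            if text not in seen:        # deduplicate
                seen.add(text)
                candidates.append(text)
    # Re-score every candidate and sort by the new relevance score
    return sorted(candidates, key=rerank_fn, reverse=True)[:top_k]

vector_hits = ["use caching", "optimize database queries"]
fulltext_hits = ["use caching", "tune indexes"]
# Toy scorer: longer snippet = "more relevant"; a real system calls the reranker
top = fuse_results([vector_hits, fulltext_hits], rerank_fn=len, top_k=2)
```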

4.2 Knowledge Graph Query

Graph retrieval has three modes for different types of questions:
| Mode | Use Case | Query Method | Example Question |
| --- | --- | --- | --- |
| local | Query local info about an entity | Vector-match similar entities → get neighbor nodes | "Zhang San's personal info" |
| global | Query overall relationships and patterns | Vector-match similar relations → get connection paths | "What's the company's organizational structure?" |
| hybrid | Comprehensive questions | local + global combined | "Zhang San's role and responsibilities in the company" |
Real Example:
Suppose the knowledge graph contains:
  • Entities: Zhang San (person), Database Team (organization), PostgreSQL (technology)
  • Relations: Zhang San --belongs to--> Database Team, Zhang San --excels at--> PostgreSQL
Question: "What is Zhang San responsible for?"
  • Local Mode:
    • Finds "Zhang San" entity
    • Gets all directly connected nodes
    • Returns: "Zhang San belongs to Database Team, excels at PostgreSQL"
  • Global Mode:
    • Finds related relation patterns: "responsible for", "belongs to"
    • Returns entire team structure and responsibility division
  • Hybrid Mode:
    • Uses both methods above
    • Provides more comprehensive answer
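The local mode's neighbor lookup can be illustrated with a toy triple store built from the example above. `local_query` is a hypothetical helper; real graph retrieval first vector-matches the entity name before traversing.

```python
# Toy knowledge graph as (subject, relation, object) triples
graph = {
    ("Zhang San", "belongs to", "Database Team"),
    ("Zhang San", "excels at", "PostgreSQL"),
    ("Li Si", "belongs to", "Frontend Team"),
}

def local_query(entity, triples):
    """Local mode: return every relation directly connected to the entity."""
    return sorted((s, r, t) for (s, r, t) in triples if entity in (s, t))

facts = local_query("Zhang San", graph)
```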

4.3 Agent Dialogue System

The Agent is ApeRAG's intelligent assistant, able to invoke various tools to answer questions.
Agent Workflow:
  • Receive Question: User sends a question
  • Tool Decision: LLM analyzes question and decides which tools to call
    • Possible tools: search_collection (search knowledge base), web_search (search internet), web_read (read web page), etc.
  • Execute Tools: Agent calls corresponding tools
    • Example: search_collection triggers hybrid retrieval, returns relevant documents
  • Generate Answer: LLM generates answer based on retrieved context
  • Stream Response: Answer pushed to user in real-time via SSE (Server-Sent Events), no need to wait for complete generation
Role of MCP Protocol:
MCP (Model Context Protocol) is a standardized tool protocol that allows AI assistants (like Claude Desktop, Cursor) to easily invoke ApeRAG's capabilities. Through MCP, external AI tools can:
  • List your knowledge bases
  • Search knowledge base content
  • Read web page content
  • Search the internet
Dialogue Example:
User: How does ApeRAG's graph indexing work?

Agent thinks: Need to search knowledge base
↓
Call tool: search_collection(query="graph indexing principles", collection_id="aperag-docs")
↓
Retrieval results: Returns document segments about graph construction, entity extraction, relation extraction
↓
Agent answers: ApeRAG's graph indexing works through the following steps... (generated based on retrieved content)
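The dialogue flow above can be condensed into a toy loop. Everything here is a stand-in: the keyword-based tool decision and string-formatted answer replace real LLM calls, and the one-entry corpus replaces hybrid retrieval.

```python
def search_collection(query: str, collection_id: str) -> str:
    # Stand-in for hybrid retrieval against the knowledge base
    corpus = {
        "aperag-docs": "Graph indexing: entity extraction, relation extraction, "
                       "entity merging, connected-components parallelization."
    }
    return corpus.get(collection_id, "")

TOOLS = {"search_collection": search_collection}

def run_agent(question: str) -> str:
    # Tool decision: a real agent asks the LLM which tool to call;
    # this sketch routes on a keyword instead.
    if "graph" in question.lower():
        context = TOOLS["search_collection"](query=question,
                                             collection_id="aperag-docs")
    else:
        context = ""
    # Answer generation: a real agent streams an LLM completion over the context
    return f"Based on the docs: {context}" if context else "No relevant context found."

answer = run_agent("How does ApeRAG's graph indexing work?")
```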

5. Storage Architecture

ApeRAG adopts a multi-storage architecture, selecting the most appropriate storage solution for different data types.

5.1 Storage Selection Decision

5.2 Data Flow

Different data flows to different storage systems:

5.3 Core Storage Systems

PostgreSQL (Primary Database)
Storage Content:
  • User info, permissions, configurations
  • Collection (knowledge base) metadata
  • Document metadata and index status
  • Conversation history
  • Small-scale knowledge graphs (< 100K entities)
Why Choose:
  • Strong transaction support, ensures data consistency
  • Mature and stable, low operational cost
  • pgvector extension, supports vector storage
  • Can handle small-scale graph data without extra graph database
Qdrant (Vector Database)
Storage Content:
  • Document chunk embedding vectors
  • Entity and relation vector representations
  • Multimodal vectors for images
Why Choose:
  • Optimized specifically for vector retrieval, fast
  • Supports filter conditions, can combine with metadata filtering
  • Supports cluster deployment, horizontally scalable
Elasticsearch (Full-text Search)
Storage Content:
  • Document chunk text content
  • Supports Chinese tokenization (IK Analyzer)
Why Choose:
  • BM25 algorithm works well for keyword search
  • Supports complex queries and aggregations
  • Built-in highlighting
MinIO (Object Storage)
Storage Content:
  • Raw document files (PDF, Word, etc.)
  • Intermediate results after parsing
  • Temporary uploaded files
Why Choose:
  • S3 protocol compatible, can replace with cloud storage
  • Low storage cost
  • Supports large files
Graph Database Choice: PostgreSQL vs Neo4j
ApeRAG supports two graph database solutions:
PostgreSQL (Default, Recommended for Small Scale)
Storage Content:
  • Knowledge graphs (< 100K entities)
  • Graph node and edge relationship data
Recommendation Reasons:
  • No extra deployment, lower operational cost
  • Performance sufficient for most scenarios
  • Complete transaction support, data consistency guaranteed
  • Can share database with other business data
Neo4j (Optional, for Large Scale)
Storage Content:
  • Large-scale knowledge graphs (> 1M entities)
When Needed:
  • Entity count exceeds 100K and PostgreSQL query performance degrades
  • Need complex graph traversal queries (multi-hop relations)
  • Need to use graph algorithms (PageRank, community detection, etc.)
Summary: For most enterprise applications, PostgreSQL is completely sufficient. Only consider Neo4j when knowledge graph scale is very large.
Redis (Cache and Queue)
Storage Content:
  • Celery task queue
  • LLM call cache
  • User session cache
Why Choose:
  • Extremely fast, suitable for high-frequency access
  • Supports multiple data structures
  • Can serve as task queue Broker

6. Technical Highlights

6.1 Stateless LightRAG Refactoring

Background Problem:
The original LightRAG uses global state: all tasks share one instance, which causes data confusion and concurrency conflicts in multi-user, multi-Collection scenarios.
ApeRAG's Solution:
  • Each task creates independent LightRAG instance
  • Isolates different Collection data through workspace parameter
  • Entity naming convention: entity:{name}:{workspace}
  • Relation naming convention: relationship:{src}:{tgt}:{workspace}
This way, different users' graph data won't interfere with each other, truly achieving multi-tenant isolation.
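The naming conventions above translate directly into code. A minimal sketch (`entity_key` and `relation_key` are hypothetical helper names, but the key formats follow the conventions stated above):

```python
def entity_key(name: str, workspace: str) -> str:
    """Namespaced entity key, per the convention entity:{name}:{workspace}."""
    return f"entity:{name}:{workspace}"

def relation_key(src: str, tgt: str, workspace: str) -> str:
    """Namespaced relation key, per relationship:{src}:{tgt}:{workspace}."""
    return f"relationship:{src}:{tgt}:{workspace}"

# The same entity name in two Collections yields two distinct keys,
# so their graph data never collides.
k1 = entity_key("PostgreSQL", "collection-a")
k2 = entity_key("PostgreSQL", "collection-b")
```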

6.2 Dual-Chain Async Architecture

Traditional Approach Problem:
After a user uploads a document, the API has to wait for parsing and index building to complete before returning, which can take several minutes or longer.
Dual-Chain Architecture Advantages:
  • Frontend Chain: API only writes state to database, returns within 100ms
  • Backend Chain: Reconciler periodically detects state changes, schedules async tasks
  • Version Control: Implements idempotency through version and observed_version
  • Auto Retry: Automatically retries failed tasks, ensures eventual consistency
This design is inspired by Kubernetes' Reconciler pattern, very suitable for handling long-running tasks.
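The version/observed_version idempotency mechanism can be sketched as follows. The field names come from the description above; the `IndexSpec` dataclass and `reconcile` signature are illustrative assumptions, not ApeRAG's actual code.

```python
from dataclasses import dataclass

@dataclass
class IndexSpec:
    version: int = 0           # desired state, bumped by the API on upload/update
    observed_version: int = 0  # last version a worker finished processing
    status: str = "pending"

def reconcile(spec: IndexSpec, build_index) -> bool:
    """Schedule work only when desired and observed versions diverge."""
    if spec.observed_version >= spec.version:
        return False           # nothing to do; re-running is a no-op (idempotent)
    build_index()              # the long-running backend work
    spec.observed_version = spec.version
    spec.status = "active"
    return True

spec = IndexSpec(version=1)                          # API bumped the version
ran_first = reconcile(spec, build_index=lambda: None)  # reconciler picks it up
ran_again = reconcile(spec, build_index=lambda: None)  # second pass is a no-op
```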

6.3 Connected Components Concurrency Optimization

Problem:
During knowledge graph construction, similar entities need merging. Serial processing is slow. Full parallelization has lock contention issues.
Solution:
Use connected components algorithm to divide graph into multiple independent subgraphs:
  • Build entity-relation adjacency list
  • BFS traversal to find all connected components
  • Different components have no connections, can be fully processed in parallel
  • Same component processed serially internally (avoid conflicts)
Results:
  • 2-3x performance improvement
  • Zero lock contention
  • Best results for diverse document collections
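The four steps above can be sketched with a textbook BFS over an adjacency list. This is the generic connected-components algorithm, not ApeRAG's implementation; the toy edge list is illustrative.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Group entities into independent subgraphs via BFS over an adjacency list."""
    adj = defaultdict(set)
    for a, b in edges:                 # 1. build the adjacency list
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for node in adj:                   # 2. BFS from every unvisited node
        if node in seen:
            continue
        comp, queue = set(), deque([node])
        while queue:
            n = queue.popleft()
            if n in seen:
                continue
            seen.add(n)
            comp.add(n)
            queue.extend(adj[n] - seen)
        components.append(comp)        # 3. each component can run in parallel
    return components

# Two unrelated topics form two islands that never need to synchronize
edges = [("Postgres", "Index"), ("Index", "BTree"), ("CNN", "Pooling")]
comps = connected_components(edges)
```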

6.4 Provider Abstraction Pattern

ApeRAG supports 100+ LLM providers (OpenAI, Claude, Gemini, Chinese LLM providers, and more). How are they all managed through one interface?
Design Approach:
  • Define unified Provider interface
  • Each provider implements its own Provider
  • Adapt through LiteLLM library
This way, switching models only requires a config change, not a code change. The same pattern also applies to:
  • Embedding Service (supports multiple vector models)
  • Rerank Service (supports multiple reranking models)
  • Web Search Service (DuckDuckGo, JINA, etc.)
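The pattern can be sketched as an abstract interface plus a name-keyed registry. `CompletionProvider`, `EchoProvider`, and the registry are illustrative names, not ApeRAG's actual API; a real implementation would delegate to LiteLLM.

```python
from abc import ABC, abstractmethod

class CompletionProvider(ABC):
    """Hypothetical unified interface every provider implements."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class EchoProvider(CompletionProvider):
    # Trivial stand-in; a real provider would call LiteLLM / the vendor SDK
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

REGISTRY = {"echo": EchoProvider}

def get_provider(name: str) -> CompletionProvider:
    """Switching models is a config change: look the provider up by name."""
    return REGISTRY[name]()

reply = get_provider("echo").complete("hello")
```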

6.5 Multimodal Index Support

Besides text, ApeRAG can also handle images:
Vision Index's Two Paths:
  • Pure Visual Vectors: Use multimodal models (like CLIP) to directly generate image vectors
  • Vision to Text: Use VLM to generate image descriptions + OCR to recognize text → text vectorization
Fusion Strategy:
  • Text and visual retrieval results sorted separately
  • Unified scoring through Rerank model
  • Final merged display

7. Summary

ApeRAG achieves production-grade RAG capabilities through the following design:
Core Advantages:
  • Powerful Document Processing: Supports multiple formats, complex layouts, tables and formulas
  • Knowledge Graph Fusion: Not just vector matching, but understanding knowledge relationships
  • Multiple Retrieval Methods: Vector, full-text, and graph working together
  • Async Architecture: Fast response, background processing, good user experience
  • Production-Grade Design: Multi-storage, high concurrency, easy to scale
Technical Innovations:
  • Stateless LightRAG, true multi-tenant support
  • Dual-chain async architecture, API response < 100ms
  • Connected components concurrency optimization, 2-3x faster graph construction
  • Provider abstraction, supports 100+ LLMs
Use Cases:
  • Enterprise knowledge base search
  • Technical documentation Q&A
  • Customer service bots
  • Research paper analysis
  • Any scenario requiring document understanding and intelligent Q&A
The system's design philosophy is: make complex things simple, make simple things automatic. Users just need to upload documents; everything else is handled automatically by ApeRAG.