🎯 86% retrieval accuracy

Enterprise RAG Knowledge System

Qdrant · Neo4j · Redis · Docling · LangChain · Docker · Python
86% Retrieval Accuracy
22ms Average Latency
45+ Knowledge Collections
3 Database Engines Combined

🔍 Context

Organizational knowledge was scattered across databases, documents, logs, and people's heads. When someone needed an answer about a business process, system configuration, or historical decision, they asked around or dug through files. There was no unified way to query enterprise knowledge programmatically.

The problem was compounded by organizational scale: multiple departments, each with their own documentation conventions (or lack thereof), business rules embedded in code comments and email threads, and tribal knowledge that walked out the door every time someone left the team. Critical information existed, but finding it required knowing who to ask and where to look — knowledge that itself was undocumented.

Previous attempts at knowledge management had failed because they relied on people manually maintaining wikis and document repositories. The moment someone got busy, documentation fell behind, and trust in the knowledge base eroded. Any solution needed to be largely self-maintaining.

⚙️ Approach

Architected a hybrid RAG (Retrieval-Augmented Generation) system combining three complementary data layers, each optimized for different query patterns:

Vector Search Layer (Qdrant): Built 45+ domain-specific collections optimized for semantic similarity search. Each collection is tuned with domain-appropriate chunking strategies — technical documentation uses smaller, precise chunks while business process documents use larger contextual windows. Embeddings are generated using models selected for the specific domain vocabulary.
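The idea of domain-tuned chunking can be sketched in a few lines. This is a minimal illustration, not the production code: the profile values and the simple word-window splitter are assumptions standing in for whatever chunkers the real pipeline uses.

```python
# Illustrative per-domain chunking profiles (values are assumptions, not the
# production settings) and a simple sliding-window chunker over words.
CHUNK_PROFILES = {
    "technical_docs": {"chunk_size": 256, "overlap": 32},      # small, precise chunks
    "business_process": {"chunk_size": 1024, "overlap": 128},  # larger context windows
}

def chunk_text(text: str, domain: str) -> list[str]:
    """Split text into overlapping word-window chunks using the domain's profile."""
    profile = CHUNK_PROFILES[domain]
    size, overlap = profile["chunk_size"], profile["overlap"]
    words = text.split()
    chunks = []
    step = size - overlap
    for start in range(0, len(words), step):
        window = words[start:start + size]
        if window:
            chunks.append(" ".join(window))
        if start + size >= len(words):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from both sides, which matters more for small technical chunks than for large process documents.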

Graph Knowledge Layer (Neo4j): Mapped entity relationships that vector search alone can't capture: which systems connect to which, which business rules affect which processes, who owns which documentation. This enables traversal queries like "show me everything connected to the invoice reconciliation process" that pure vector search would miss.
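A traversal query like the one described might look like the sketch below. The `Process` label, `name` property, and two-hop depth are hypothetical, since the actual Neo4j schema isn't shown here; the commented driver usage follows the official `neo4j` Python driver's session API.

```python
# Sketch of a graph-traversal query for "everything connected to a process".
# Labels, relationship depth, and property names are hypothetical examples.
def neighborhood_query(process_name: str, depth: int = 2) -> tuple[str, dict]:
    """Build a Cypher query returning all nodes within `depth` hops of a process."""
    cypher = (
        f"MATCH (p:Process {{name: $name}})-[*1..{depth}]-(related) "
        "RETURN DISTINCT related"
    )
    return cypher, {"name": process_name}

# With the official neo4j driver this would run roughly as:
#   with driver.session() as session:
#       cypher, params = neighborhood_query("invoice reconciliation")
#       records = session.run(cypher, params)
```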

Caching & Queue Layer (Redis): Implemented intelligent caching for frequent queries and queue-based processing for ingestion pipelines. Hot queries return in under 5ms from cache, while the queue system ensures document ingestion doesn't block retrieval operations.
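The caching layer follows the standard cache-aside pattern. The sketch below uses a plain dict as a stand-in for the Redis connection so it stays self-contained; the key scheme and TTL handling are illustrative assumptions, not the deployed configuration.

```python
import hashlib
import json

class QueryCache:
    """Cache-aside wrapper: serve hot queries from cache, fall back to retrieval.
    A dict stands in for Redis here; in production this would be a redis.Redis
    client using get/setex so entries expire on their own."""

    def __init__(self, retrieve_fn):
        self._store = {}              # stand-in for the Redis connection
        self._retrieve = retrieve_fn  # the full RAG retrieval path

    def _key(self, query: str) -> str:
        return "rag:" + hashlib.sha256(query.encode()).hexdigest()

    def get(self, query: str):
        key = self._key(query)
        cached = self._store.get(key)
        if cached is not None:
            return json.loads(cached)          # cache hit: fast path
        result = self._retrieve(query)         # cache miss: run retrieval
        self._store[key] = json.dumps(result)  # with Redis: setex(key, ttl, ...)
        return result
```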

Ingestion Pipeline (Docling): Built automated document processing pipelines using Docling for intelligent document parsing — handling PDFs, spreadsheets, code files, and database schema exports. The pipeline extracts structured content, generates embeddings, and updates both vector and graph stores automatically when source documents change.
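The parse → chunk → embed → upsert flow can be sketched as a pipeline skeleton. The stage callables here are injected placeholders: in the real system the parser would wrap Docling and the sink would write to Qdrant and Neo4j, none of which is reproduced in this illustration.

```python
# Pipeline skeleton: parse -> chunk -> embed -> upsert. Stages are injected so
# each remains independently testable; the callables are hypothetical stand-ins.
from dataclasses import dataclass
from typing import Callable

@dataclass
class IngestJob:
    path: str
    domain: str

def ingest(job: IngestJob,
           parse: Callable[[str], str],
           chunk: Callable[[str], list[str]],
           embed: Callable[[str], list[float]],
           upsert: Callable[[str, list[float]], None]) -> int:
    """Run one document through the pipeline; returns the number of chunks stored."""
    text = parse(job.path)              # e.g. Docling PDF/spreadsheet parsing
    pieces = chunk(text)                # domain-appropriate chunking
    for piece in pieces:
        upsert(piece, embed(piece))     # vector store (and graph) updates
    return len(pieces)
```

In the queue-based setup described above, each `IngestJob` would arrive as a Redis queue message, so a burst of new documents never blocks the retrieval path.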

🚀 Impact

  • 86% retrieval accuracy at 22ms average latency — enterprise knowledge accessible in near-real-time with high precision
  • 45+ knowledge collections spanning operations, finance, sales, technical documentation, HR policies, and system architecture
  • Self-updating pipeline that ingests new documents automatically — eliminating the "stale documentation" problem that killed previous knowledge management attempts
  • Enabled natural-language querying of enterprise data for non-technical stakeholders — sales managers can query system capabilities, finance teams can look up process documentation
  • Reduced new-employee onboarding time for system-specific knowledge acquisition by providing instant, accurate answers to common questions
  • Foundation for AI agent context — the RAG system directly feeds the agentic AI development engine, providing agents with accurate, up-to-date enterprise context

🏗️ Key Technical Decisions

Hybrid Architecture (Vector + Graph) over Pure Vector Search

Pure vector search excels at finding semantically similar content but fails at relationship traversal. By adding Neo4j as a graph layer, the system can answer "what depends on X" and "what's connected to Y" queries that are critical for enterprise knowledge navigation. The two systems complement each other — vector search finds relevant documents, graph search finds related context.
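One common way to merge ranked hits from the vector layer and the graph layer is reciprocal rank fusion. The write-up doesn't say which fusion method this system uses, so treat this as one plausible sketch rather than the implemented approach.

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists (e.g. vector hits and graph hits) by RRF score.
    A document ranked highly in several lists accumulates a higher score."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

A document surfaced by both layers outranks one surfaced by either alone, which matches the intent: vector search proposes relevant documents, graph search corroborates them with related context.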

Domain-Specific Collections over Monolithic Index

Instead of one massive vector index, created 45+ specialized collections with domain-tuned chunking strategies, embedding models, and retrieval parameters. A technical API documentation collection uses different chunk sizes and overlap settings than an HR policy collection. This specialization improved retrieval accuracy by ~15% compared to a unified index approach.

Automated Ingestion over Manual Curation

Designed the system to automatically detect and process new documents rather than requiring manual uploads. This was a deliberate choice based on the failure of previous wiki-based knowledge management — any system that depends on humans remembering to update it will eventually fall behind.

Docker-Orchestrated Microservices

Each component (Qdrant, Neo4j, Redis, ingestion workers) runs as an independent Docker service with health checks, resource limits, and automatic restart policies. This isolation means a spike in ingestion load doesn't degrade retrieval performance, and any component can be updated independently.
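A compose file for this layout might look roughly like the fragment below. Image tags, ports, memory limits, the health-check command, and the `./ingest` build context are all illustrative assumptions, not the deployed configuration.

```yaml
# Illustrative service layout -- values are assumptions, not production config.
services:
  qdrant:
    image: qdrant/qdrant:latest
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 4g          # cap so ingestion spikes can't starve retrieval
  neo4j:
    image: neo4j:5
    restart: unless-stopped
  redis:
    image: redis:7
    restart: unless-stopped
  ingest-worker:
    build: ./ingest           # hypothetical build context for the Docling workers
    restart: unless-stopped
    depends_on: [qdrant, neo4j, redis]
```

Because the workers are a separate service from the stores, an ingestion backlog consumes the workers' resources, not the retrieval path's, and each image can be upgraded independently.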

💡 Lessons Learned

01. Chunking strategy matters more than model selection. Spent significant time testing embedding models but discovered that how you split documents into chunks has a larger impact on retrieval quality than which embedding model you use. Domain-appropriate chunking was the single biggest accuracy lever.
02. Knowledge systems need to be self-maintaining or they die. Every previous knowledge management initiative in the organization had failed because it depended on people manually updating documentation. The automated ingestion pipeline was the design decision that made this system sustainable long-term.
03. Start with retrieval quality, then optimize for speed. The initial system was accurate but slow (~200ms). Optimization through caching, index tuning, and query preprocessing brought latency down to 22ms. If I'd optimized for speed first, I would have compromised on the architectural decisions that enable high accuracy.