KahaPilot: Private documentation layer powering AI assistants and coding agents.
Designed & Built · 2025 – Present
A documentation-grounded AI coding assistant for Fenergo (Fen-X) integration work, built end-to-end as an internal tool for Kaha Management. LLMs hallucinate on specialised vendor APIs because their training data rarely covers them. KahaPilot fixes this by hosting a private semantic search index of developer documentation, curated internal stackoverflow knowledge, integration patterns & templates, and more, locally on each developer's machine, and exposes it to Claude Code and Claude Desktop over the Model Context Protocol.
This system is a component in functional knowledge uses as well as agentic coding systems.
What it does
When a developer asks Claude to write Fenergo code, Claude searches the local index first and grounds every response in authoritative documentation. The corpus covers ~2,900 Developer Hub pages, hundreds of Swagger / OpenAPI specifications, and ~5,000 quality-curated answers from the internal Stack Overflow for Teams instance, the most valuable institutional knowledge a new joiner could get. Retrieval runs entirely on the developer's laptop, with no external API calls and no data leaving the organisation.
Functional consultants use it the same way through Claude Desktop. When they need a specific piece of information (internal company knowledge or a Fen-X documentation lookup), Claude answers from the grounded corpus and surfaces direct links to the relevant sources alongside the response, for verification and further reading.
Architecture
A small Node.js MCP wrapper fronts grounded-docs, an extended build of the open-source arabold/docs-mcp-server. The wrapper re-exposes its tools with Fenergo-scoped descriptions, enriches results with canonical source URLs, and emits OpenInference TOOL spans asynchronously so telemetry never blocks retrieval. The whole stack ships as a single Docker Compose project that a developer brings up with one command. Claude Code Hooks are also used to improve MCP tool calls in CC uses.
Indexing & distribution
Documentation is chunked along Markdown structure (headings, code fences, table boundaries) so each chunk holds one self-contained idea: a single endpoint, a concept, a Q&A pair. Chunks are embedded locally with snowflake-arctic-embed2, a model that performs strongly on technical and code retrieval, and persisted into a SQLite vector store. The pre-indexed store ships as a compressed archive in Git LFS, and a scheduled job rebuilds it weekly from the latest Fenergo documentation, Stack Overflow exports, and Swagger specs. Developer machines simply git pull to refresh their embeddings, with first grounded query under five minutes from a clean clone, and no local re-indexing required.
Telemetry & observability
A companion Phoenix OSS deployment captures every MCP tool call, giving engineering leads visibility into which docs are retrieved, retrieval latency, token consumption, and per-user activity. Phoenix is hosted by a small Aspire project in C# and deployed to Azure Container Apps for centralised, team-wide telemetry. Phoenix is also surfaced back into Claude Desktop as an MCP server, so leads can query traces in natural language.