A proof of concept that became a blueprint. An AI agent that answers staff questions instantly, escalates what it doesn't know, and learns from every conversation.
A friend of mine manages a large team running examinations at a school in London. His problem was simple: people kept messaging him. After hours. On weekends. During work hours. Asking questions that were already documented.
The questions needed immediate answers. If someone running an exam doesn't know the procedure, the test gets held up. So he had to respond. Every time. Even when the answer was sitting in a handbook somewhere.
I spun up a proof of concept in about 30 minutes. Tested it with him over a call. Then sent him the code so he could run it locally and improve it further.
This was my first real AI agent. Not the overhyped "autonomous agent" that runs models in the cloud and costs a fortune. A practical agent: grounded in a knowledge base, able to escalate when it doesn't know, and smart enough to remember who's asking. It set me on the path to understanding what agents should actually be.
Documentation exists. People don't read it. They ask instead. And they ask the same person, repeatedly, at the worst times.
"What's the procedure for X?" "Where do I find Y?" "What do I do if Z happens?" The same questions, from different people, at all hours. Each one urgent. Each one already answered somewhere.
Exam invigilators can't wait for a reply. Tests are happening. Students are waiting. If the answer doesn't come in seconds, something goes wrong. You can't just "reply later".
Handbooks exist. Procedure guides exist. But finding the right answer takes longer than just asking. And some things aren't documented at all. They're just in someone's head.
Different people need different levels of detail. A new joiner needs step-by-step guidance. An experienced invigilator just needs a quick reminder. One-size answers don't fit all.
Not a chatbot that makes things up. An agent grounded in actual documentation, with a clear escalation path when the answer isn't there.
The building blocks:
- Workflow orchestration
- Language model
- Knowledge base storage
- Chat interface (Web, Slack, Teams or custom)
- Escalation channel (Slack, Teams, SMS or email)
- Conversation history
Fast, cheap, and capable. For a knowledge base lookup with clear sources, you don't need the biggest model. You need one that's fast enough to feel instant and cheap enough to run constantly.
He wanted to run this locally. Self-hosted database means no ongoing cloud costs, no data leaving the premises, and full control over what goes in and comes out.
When the agent doesn't know, it asks the designated human via their preferred channel. The human responds. That response gets saved to the knowledge base for next time.
The agent remembers who's asking. Is this person new or experienced? Technically minded or not? The answer style adapts based on past interactions.
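Per-user memory doesn't need to be elaborate. A minimal sketch of the idea, assuming a keyed question history and a coarse experience heuristic (the class, threshold, and style labels are all hypothetical, not from the original build):

```python
from collections import defaultdict


class UserMemory:
    """Tracks each user's past questions so answer style can adapt."""

    def __init__(self):
        # user_id -> list of past questions asked by that user
        self.history = defaultdict(list)

    def record(self, user_id, question):
        self.history[user_id].append(question)

    def answer_style(self, user_id):
        # Hypothetical heuristic: someone with little history gets
        # step-by-step detail; a frequent asker gets a quick reminder.
        return "detailed" if len(self.history[user_id]) < 3 else "brief"


memory = UserMemory()
memory.record("alice", "What's the procedure for late arrivals?")
print(memory.answer_style("alice"))  # a first-time asker gets the detailed style
```

In the real workflow this state lives in the orchestration layer's memory store rather than in process memory, but the shape is the same: history keyed by user, feeding the prompt.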
A single workflow that handles most questions instantly and learns from the ones it can't.
Someone sends a question through the chat interface. Could be "What's the procedure for late arrivals?" or "Where's the emergency contact list?" The agent gets the question plus the person's conversation history.
The agent's first action is always to search the knowledge base. It's explicitly instructed: do not answer from general knowledge. Only use what's in the documentation. The vector store returns the top 5 most relevant chunks.
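The retrieval step can be sketched with a word-overlap scorer standing in for the vector store. A real deployment would embed the question and rank chunks by cosine similarity; this toy version (function name and sample chunks are illustrative) just shows the shape of "search first, return top k":

```python
def top_chunks(question, chunks, k=5):
    """Return the k chunks sharing the most words with the question.

    Stand-in for a vector store: production code would embed the question
    and rank stored chunks by cosine similarity instead of word overlap.
    """
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(c.lower().split())), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop chunks with zero overlap: no match is better than a bad match.
    return [c for score, c in scored[:k] if score > 0]


kb = [
    "Late arrivals: admit the student and note the time on the register.",
    "Fire alarm: evacuate via the nearest exit and keep papers secure.",
    "Toilet breaks: one student at a time, escorted by an invigilator.",
]
print(top_chunks("What is the procedure for late arrivals?", kb, k=2))
```

The zero-overlap filter matters: an empty result is the signal that triggers escalation rather than a made-up answer.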
If the knowledge base has a good answer, the agent responds immediately with the answer and source. If not, it tells the user it's checking with a human, then triggers escalation. The designated expert gets a notification with the question and context, and their reply goes straight back to the user.
When the human responds via Telegram, that answer gets saved. Next time someone asks the same question, the agent knows. The knowledge base grows with every escalation. Over time, fewer and fewer questions need human input.
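The answer-or-escalate loop, including the feedback step, fits in a few lines. A sketch under stated assumptions: the relevance threshold, the scoring, and `notify_expert` (a stand-in for the Telegram round trip) are hypothetical:

```python
def search(question, kb):
    """Score chunks by shared words; stand-in for vector similarity."""
    q = set(question.lower().split())
    return sorted(((len(q & set(c.lower().split())), c) for c in kb), reverse=True)


def handle_question(question, kb, notify_expert, threshold=2):
    """Answer from the KB when a hit clears the threshold; otherwise escalate."""
    hits = search(question, kb)
    if hits and hits[0][0] >= threshold:
        return hits[0][1] + " (source: knowledge base)"
    answer = notify_expert(question)  # the Telegram round trip in the real build
    kb.append(answer)                 # the KB grows with every escalation
    return answer


kb = ["Late arrivals: admit the student and note the time."]
expert = lambda q: "Spare pens are in the exams office cupboard."
print(handle_question("Where are the spare pens?", kb, expert))
print(len(kb))  # now 2: the expert's answer was saved for next time
```

Ask the same question again and it's answered straight from the knowledge base, with no human involved. That's the whole learning loop.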
Four tools, each with a specific purpose. The agent decides which to use based on the question.
- Knowledge base search: the primary tool. Searches the vector store for relevant documentation. Used first for any procedural question.
- Calculator: for explicit math requests. Room capacity calculations, time conversions, anything numeric.
- Reasoning: for complex reasoning. When the agent needs to work through conflicting information or multi-step logic.
- Escalation: sends the question to Telegram when the knowledge base doesn't have the answer.

Two supporting capabilities round it out:
- Memory: conversation history per user. Remembers past questions and adapts answer style to the person.
- General knowledge (optional): only enabled for certain users, with strict limits. Complements the knowledge base when specific context is needed.
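In the real workflow the model itself chooses the tool, but the routing logic can be illustrated with a crude keyword dispatcher. Everything here is a hypothetical stand-in: the tool names are descriptive placeholders, and escalation isn't routed from the question at all (it only fires after a knowledge base miss):

```python
def pick_tool(question):
    """Crude keyword router; the real agent lets the model choose the tool.

    Tool names are descriptive placeholders, not the originals. Escalation
    is absent on purpose: it triggers on a knowledge base miss, not here.
    """
    q = question.lower()
    # Numbers plus an arithmetic operator suggest an explicit math request.
    if any(ch.isdigit() for ch in q) and any(op in q for op in "+-*/"):
        return "calculator"
    # Conflicting information or "why" questions need multi-step reasoning.
    if "why" in q or "conflict" in q:
        return "reasoning"
    # Everything else starts with a documentation lookup.
    return "knowledge_base_search"


print(pick_tool("What is 30 * 4 seats per room?"))           # calculator
print(pick_tool("What's the procedure for late arrivals?"))  # knowledge_base_search
```

Letting the model pick beats hand-written rules like these in practice, but the principle is identical: one default tool, narrow triggers for the others.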
This project put me on the track of agents. But not the way most people sell them.
I'm generally opposed to agents as a solution unless you're running models locally. The hype around autonomous agents that run in the cloud, make their own decisions, and cost unpredictable amounts of money? That's not practical for most use cases.
But this kind of agent? This is useful. It's grounded in a specific knowledge base. It knows when it doesn't know. It escalates to a human and learns from the response. It doesn't hallucinate because it's explicitly told: only answer from the documentation. And critically, it handles urgent queries instantly, freeing up expensive human time. Every question resolved by the agent is one less interruption for someone who charges far more per hour than the API call costs.
Hype agents: autonomous, expensive, unpredictable, prone to hallucination.
Practical agents: grounded, bounded, transparent about their limitations, and improved by human input.
This was a proof of concept from a while back. I'm planning to revisit it with everything I've learnt since. Better prompt engineering. More sophisticated memory. Cleaner architecture. The blueprint is solid. The execution can be even better.
A quick build that revealed bigger patterns about how agents should work.
The agent's system prompt explicitly says: only use the knowledge base. No general knowledge. This all but eliminates hallucination. If it's not documented, the agent says so.
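The grounding lives in a short system prompt plus the way the request is assembled: retrieved chunks travel alongside the question. The wording below is illustrative, not the original prompt, and `build_messages` is a hypothetical helper:

```python
SYSTEM_PROMPT = """You are a support agent for exam staff.
Answer ONLY from the knowledge-base chunks provided in the context.
Never answer from general knowledge.
If the context does not contain the answer, say so and escalate to the
designated human instead of guessing.
Cite the source document for every answer."""


def build_messages(question, chunks):
    """Assemble a grounded chat request: retrieved chunks ride with the question."""
    context = "\n\n".join(chunks)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
```

Putting the refusal-and-escalate instruction in the system prompt, rather than hoping the model volunteers it, is what makes the "I don't know" path reliable.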
The "I don't know" path is as important as the "here's your answer" path. Clear escalation to a human, with context, makes the whole system trustworthy.
Human answers feeding back into the knowledge base means the system improves automatically. Every escalation makes the next one less likely.
Knowing who's asking and their history transforms generic answers into tailored ones. The same question from a newbie and an expert should get different responses.
A fast, simple answer beats a slow, perfect one. GPT-4o-mini responding in 2 seconds is more useful than GPT-4 responding in 10. For this use case, at least.
30 minutes to build. Solves a real problem. 90% of questions handled without human input. Sometimes the quick and dirty solution is actually the right one.
Documentation nobody reads is worthless. An agent that answers instantly, escalates gracefully, and learns continuously? That's useful.