Two Answers, Identical on Screen
A technician needs the torque specification for a specific fastener on a wing panel access door. He opens an AI assistant and types the question. The system returns a confident, well-formatted answer: 35 Nm, with a note about thread engagement requirements.
He asks the same question to a different AI system. That one returns 28 Nm, equally confident, equally well-formatted, with a note about lubrication requirements that the first system didn't mention.
Both answers look identical in form. Both are presented with the same typographic confidence. One is correct. One is hallucinated — synthesized from training data that included maintenance documentation for a different aircraft variant, a different fastener class, or a different revision of the same document. There is nothing in the presentation of either answer that tells the technician which is which.
In consumer AI applications, this problem is an annoyance. In aviation maintenance, it is a different category of problem entirely.
An incorrect torque value applied to a structural fastener is not discovered immediately. It becomes discoverable during the next scheduled inspection — or during a failure event. The distance between the cause (wrong AI answer) and the effect (discoverable consequence) is measured in flight hours, not seconds. This is what makes AI hallucination in maintenance categorically different from hallucination in other applications.
What AI Hallucination Actually Is
Hallucination is not a bug in AI systems. It is a fundamental behavior of the architecture underlying most AI language models. Understanding why it happens is essential to understanding why some AI tools are structurally unsuitable for aviation maintenance environments.
Large language models generate text by predicting the most statistically probable next token given the preceding context. They are trained on vast quantities of text data to become very good at this prediction. The result is output that is fluent, coherent, and confident — regardless of whether the underlying factual content is accurate.
The model does not know what it knows and what it doesn't. It has no built-in mechanism for expressing uncertainty about factual claims. When asked a question whose correct answer is not in its training data, it does not say "I don't know." It generates a plausible-sounding answer based on related patterns in its training corpus.
The model predicts tokens. It does not verify facts. These are fundamentally different operations, and language models only do the first.
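This token-versus-fact distinction can be made concrete with a toy sketch. The scores and tokens below are invented for illustration; a real model scores tens of thousands of candidates with a neural network, but the key point survives the simplification: the selection step is purely statistical, and nothing in it checks whether the resulting sentence is true.

```python
import math

def softmax(scores):
    """Convert raw scores into a probability distribution over tokens."""
    m = max(scores.values())
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}

# Hypothetical scores after the context "Torque the fastener to 35":
candidate_scores = {"Nm": 4.1, "ft-lb": 2.0, "psi": 0.3}
probs = softmax(candidate_scores)

# The model emits the most probable token. Fluent, grammatical, confident.
# Whether 35 Nm is the correct value for this fastener never enters the loop.
next_token = max(probs, key=probs.get)
```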
Why Aviation Maintenance Is Especially Vulnerable
Every application domain has some tolerance for AI hallucination. In consumer search, a hallucinated restaurant recommendation causes a wasted evening. In legal research, a hallucinated case citation causes professional embarrassment. The consequences are proportional to the stakes of the domain.
Aviation maintenance has some of the highest stakes of any technical domain, combined with some of the characteristics that make hallucination most dangerous:
Maintenance procedures involve specific numerical values — torque specs, clearances, fluid quantities, pressure settings — where a plausible-sounding wrong number causes the same physical result as a deliberately wrong number. The AI's confidence in its hallucinated value does not change the physical consequence of acting on it.
Unlike software bugs that produce immediate errors, maintenance errors often manifest at a distance from their cause. A fastener under-torqued by 7 Nm may function normally for hundreds of flight cycles before the failure mode it enables becomes observable. Attribution is difficult. Learning from failure is slow.
Aircraft maintenance data is revised frequently. An AI trained on a corpus that included AMM Revision 12 may return values that were superseded in Revision 14. The answer is not fabricated — it was accurate at some point. But acting on a superseded value is functionally equivalent to acting on a hallucinated one.
In aviation, the distribution of outcomes for technical errors is not symmetrical. The majority of wrong answers have no observable consequence in normal operation. A small proportion result in failures whose consequences are catastrophic. Risk models that rely on average-case outcomes systematically underestimate the tail risk.
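The gap between the average-case view and the tail can be shown with invented numbers. The figures below are illustrative only, not drawn from any incident data; the point is the shape of the distribution, not the magnitudes.

```python
# Illustrative distribution: 10,000 wrong answers, of which 9,990 have no
# observable consequence and 10 lead to a catastrophic-cost event.
outcomes = [0.0] * 9_990 + [1_000_000.0] * 10  # cost per error event

mean_cost = sum(outcomes) / len(outcomes)  # the average-case view
worst_case = max(outcomes)                 # the tail the average hides

# An average-case risk model sees a modest expected cost per error and
# concludes the tool is "mostly right." The rare event dominates actual risk.
```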
The Architectural Solutions — and Their Limits
The primary architectural response to hallucination in high-stakes applications is Retrieval-Augmented Generation (RAG). Instead of relying on a model's training data to answer questions, a RAG system retrieves relevant passages from a defined document corpus and grounds the model's response in those passages. The model summarizes and presents retrieved content rather than generating content from statistical patterns.
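The retrieval-then-ground flow can be sketched in a few lines. Everything here is hypothetical and radically simplified: the corpus entries, document names, and keyword-overlap scoring stand in for a real vector index, but the structure is the same — find a passage, answer from it, carry its citation along.

```python
# Toy corpus: each passage carries the metadata needed for a precise citation.
CORPUS = [
    {"doc": "AMM-EX", "rev": "14", "section": "57-10-01", "page": 12,
     "text": "wing panel access door fastener torque 35 Nm dry thread"},
    {"doc": "AMM-EX", "rev": "14", "section": "12-20-03", "page": 4,
     "text": "hydraulic reservoir fluid quantity 4.5 litres"},
]

def retrieve(question, corpus):
    """Return the passage with the most query-word overlap (toy scoring)."""
    q = set(question.lower().split())
    return max(corpus, key=lambda p: len(q & set(p["text"].split())))

def answer(question):
    """Ground the response in a retrieved passage and cite its exact location."""
    passage = retrieve(question, CORPUS)
    citation = (f'{passage["doc"]} Rev {passage["rev"]}, '
                f'sect. {passage["section"]}, p. {passage["page"]}')
    return passage["text"], citation

text, cite = answer("torque for wing panel access door fastener")
```

The citation travels with the answer because it is read from the retrieved passage's metadata, not generated by the model — which is what makes it verifiable.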
RAG significantly reduces hallucination compared to pure LLM inference. But RAG alone is not sufficient for aviation maintenance environments. The additional requirements are:
- Exact source citation at the page and section level. Knowing that an answer "came from the AMM" is not sufficient. The answer must be traceable to a specific document, a specific revision, a specific section, and a specific page. Without this level of precision, the technician cannot verify the answer against the source — and EASA NPA 2025-07 cannot be satisfied.
- Hard corpus boundaries, with no external data. The retrieval corpus must be defined, bounded, and controlled. A system that supplements retrieval with internet search, general knowledge bases, or other external sources introduces hallucination risk at the boundary between corpus content and external content. The boundary must be enforced architecturally, not by policy.
- Revision control integration. The corpus must reflect the current approved revision of every document it contains. A RAG system operating on a corpus that includes both Revision 12 and Revision 14 of the same document may retrieve content from either, without surfacing the revision conflict to the technician. Revision control is not optional infrastructure — it is a core safety requirement.
- Explicit out-of-scope handling. When the question cannot be answered from the approved corpus, the system must say so — clearly and specifically. A system that generates a plausible-sounding answer when the corpus doesn't contain a reliable answer is performing hallucination from a position of apparent authority. The correct behavior is refusal to answer, with an explanation of why.
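Three of these requirements — revision supersession, a hard corpus boundary, and explicit refusal — can be sketched together. The data, document names, and threshold below are hypothetical; a production system would use proper revision metadata and calibrated retrieval confidence, but the control flow is the point.

```python
def build_index(documents):
    """Keep only the latest revision of each document name.

    Supersession happens at index time, so a superseded revision
    is structurally unreachable at query time.
    """
    index = {}
    for d in documents:
        current = index.get(d["doc"])
        if current is None or d["rev"] > current["rev"]:
            index[d["doc"]] = d
    return list(index.values())

def answer(question, index, min_overlap=2):
    """Answer from the corpus, or refuse explicitly when confidence is low."""
    q = set(question.lower().split())
    scored = [(len(q & set(d["text"].split())), d) for d in index]
    score, best = max(scored, key=lambda t: t[0])
    if score < min_overlap:
        # Out of scope: refuse rather than generate a plausible answer.
        return None, "Not answerable from the approved corpus."
    return best["text"], f'{best["doc"]} Rev {best["rev"]}, p. {best["page"]}'

docs = [
    {"doc": "AMM-EX", "rev": "12", "page": 12, "text": "fastener torque 28 Nm"},
    {"doc": "AMM-EX", "rev": "14", "page": 12, "text": "fastener torque 35 Nm"},
]
index = build_index(docs)  # only Rev 14 survives indexing
```

Note that the refusal path returns no value at all — not a hedged guess. An answer the system cannot ground is an answer it does not give.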
What to Ask Your AI Vendor
Before deploying any AI tool in a maintenance environment, four questions must be answered to your satisfaction. Vague answers are not acceptable answers.
Can the system show the exact source of every answer? Document name, revision number, section, page number — every time, for every answer. Not "it's sourced from your knowledge base." Not "we use RAG." The specific citation, displayed alongside the answer, verifiable by looking at the source document. If the vendor cannot demonstrate this in a live evaluation, the tool does not meet the standard.
What does the system do when the answer is not in the corpus? The correct answer is: the system explicitly declines to answer and tells the user why. If the vendor demonstrates a fallback to general knowledge, web search, or a hedged response that still contains a specific value, the tool has an undefined boundary — and an undefined boundary is a hallucination risk.
What is the process when a document is revised? How does the system ensure that the new revision supersedes the old one in retrieval? Is it possible to retrieve content from a superseded revision? Can you demonstrate revision control in a live environment using two different revisions of the same document?
Is every interaction logged for audit? Every query and every response must be logged with timestamp and user attribution. This is not optional for EASA NPA 2025-07 compliance. The log must be exportable in a format usable by your Quality Management System. Determine data residency, retention period, and access controls before deployment.
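A minimal audit record might look like the sketch below. The field names are assumptions for illustration — the actual schema is whatever your Quality Management System can consume — but one timestamped, user-attributed, machine-readable line per interaction is the shape that makes export straightforward.

```python
import datetime
import json

def log_interaction(user, query, response, citation):
    """Serialize one query/response pair as a JSON line for the audit log."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user,          # attribution: who asked
        "query": query,        # what was asked
        "response": response,  # what the system returned
        "citation": citation,  # where the answer was grounded
    }
    return json.dumps(record)

# Hypothetical interaction:
line = log_interaction("tech-042", "torque for access door fastener",
                       "35 Nm", "AMM-EX Rev 14, p. 12")
```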
DokPath and the Hallucination Problem
DokPath was built as a RAG system that operates exclusively within the organization's approved documentation corpus. There is no external knowledge source. There is no fallback to general training data. The corpus is defined by the organization and controlled by revision — when a new document revision is uploaded, it supersedes the previous one in the retrieval index.
Every answer includes an exact citation: document name, revision number, section, and page number. The citation is not a link to a document — it is the precise location within the document where the answer was found, so the technician can verify it without conducting a new search.
When a question cannot be answered from the approved corpus, DokPath says so. It does not generate a plausible response. It does not hedge with partial information. It states that the question falls outside the approved documentation and directs the technician to the appropriate escalation path.
This behavior is not a feature that was added after the fact. It is the design. The hallucination risk is addressed at the architecture level — not through prompt engineering, not through output filtering, not through policy. The system is structurally incapable of generating an answer that is not grounded in your approved corpus.
Request a technical evaluation