Spec Book Scanner — Architecture and Design Decisions
How the MEP Spec Book Scanner works: why the system prompt is load-bearing, why refused claims are first-class output, and what the tool deliberately does not do.
Published: May 2026 · Last reviewed: May 2026
The MEP Spec Book Scanner is a tool for MEP estimators and construction managers: paste a section of a specification, get a structured list of MEP-relevant clauses with verbatim citations. The design looks simple. The choices that make it trustworthy are not.
This essay documents the architecture decisions — what the tool does, what it refuses to do, and why those refusals are as important as the findings.
The system prompt is the policy engine
Most AI tools treat the system prompt as setup — a few sentences of persona configuration. In the Spec Book Scanner, the system prompt is the entire policy framework. It defines what counts as a valid finding, what gets refused, how citations work, and what the output schema must look like. It is the most load-bearing component in the stack, more important than the choice of model.
The prompt enforces four non-negotiable rules:
- Every finding requires a verbatim excerpt. The model must copy the source text character-for-character into the excerpt field. Paraphrase is not permitted. A summary is not permitted. If the model cannot quote directly, it cannot report the finding.
- Section numbers must match the source. The model is instructed to copy the section number exactly as it appears in the text — not to normalize, infer, or interpolate from its training knowledge of CSI MasterFormat divisions.
- No outside knowledge. The model is explicitly told not to use anything it knows about MEP specifications, codes, or standards beyond what is in the provided text. A finding is only a finding if it is grounded in the excerpt.
- Refuse rather than invent. When the model cannot quote support for a claim it would otherwise make, it must add that claim to the refuse list with a reason code. The refuse list is not an error log — it is required output.
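The four rules above imply an output shape and a mechanical check. What follows is a minimal Python sketch, not the scanner's actual code: the `Finding` and `Refusal` types, their field names, and the `enforce_verbatim` helper are hypothetical names chosen for illustration. It shows how a finding whose excerpt or section number does not appear character-for-character in the pasted text could be moved to the refuse list instead of being reported.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    section: str  # section number, copied exactly as it appears in the source
    excerpt: str  # verbatim quote from the pasted text
    note: str     # what the clause means for MEP scope


@dataclass(frozen=True)
class Refusal:
    claim: str    # the claim that could not be supported by a quote
    reason: str   # reason code, e.g. "no source found"


def enforce_verbatim(
    source: str, findings: list[Finding]
) -> tuple[list[Finding], list[Refusal]]:
    """Keep findings whose excerpt and section number both occur verbatim in source.

    Anything else is converted into a refusal rather than silently dropped.
    """
    kept: list[Finding] = []
    refused: list[Refusal] = []
    for f in findings:
        excerpt_ok = bool(f.excerpt) and f.excerpt in source
        section_ok = bool(f.section) and f.section in source
        if excerpt_ok and section_ok:
            kept.append(f)
        else:
            refused.append(Refusal(claim=f.note, reason="no source found"))
    return kept, refused
```

A check like this does not replace the prompt rules. It only makes one of them testable after the fact: a paraphrased excerpt fails the substring test even when it sounds right.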
These rules are in tension with how language models are trained. Models are trained to be helpful, to fill gaps, to produce complete-looking answers. A model that refuses to report something it "knows" feels broken from a UX perspective. The system prompt has to actively work against the model's defaults — which is why writing it carefully is not optional.
Why citations are non-negotiable
An MEP estimator using this tool will act on its output. They may log a section as affecting scope. They may flag a clause for RFI. They may build a bid narrative from the findings. If a finding is fabricated — if the model reported something that isn't in the spec — the downstream consequence is real: a scope miss, a bid error, a claim dispute.
The citation requirement is not a quality-of-life feature. It is the mechanism by which the user can verify every finding independently. Each reported finding comes with the exact text that triggered it. The user can search their spec for that text and confirm or reject the finding. A finding without a citation cannot be independently verified. A finding that cannot be verified cannot be trusted. A finding that cannot be trusted is useless for professional use.
Cross-reference conflicts are held to an even stricter standard: both sides of a conflict must be quoted. The tool does not report inferred conflicts or "this probably conflicts with" findings. If both excerpts cannot be quoted from the provided text, the potential conflict goes to the refuse list.
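The two-sided rule for conflicts can be sketched the same way. This is an illustrative Python fragment under stated assumptions: the `Conflict` type and the `check_conflict` function are hypothetical names, and the real tool may represent conflicts differently. The point is only that a conflict is emitted when both excerpts are present verbatim, and not otherwise.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Conflict:
    excerpt_a: str    # verbatim quote for one side of the conflict
    excerpt_b: str    # verbatim quote for the other side
    description: str  # how the two clauses disagree


def check_conflict(
    source: str, excerpt_a: str, excerpt_b: str, description: str
) -> Optional[Conflict]:
    """Return a Conflict only if both sides are distinct verbatim quotes from source.

    A None result means the caller should log the claim on the refuse list.
    """
    both_quoted = all(bool(e) and e in source for e in (excerpt_a, excerpt_b))
    if not both_quoted or excerpt_a == excerpt_b:
        return None
    return Conflict(excerpt_a, excerpt_b, description)
```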
Refuse-sets are first-class output
The refuse list is the most honest part of the tool, and also the most counterintuitive to design for. The instinct, for the builder and for the model, is to minimize the refuse list. A shorter refuse list looks like better performance. It is not.
A shorter, fully-cited findings list is more useful than a longer findings list that mixes real citations with fabricated ones. The user who reads a findings list that includes fabricated claims loses the ability to trust any of the findings — because they can't tell which ones are real. The user who reads a findings list where every claim is cited, and a separate refuse list where uncited claims are logged, can act confidently on the findings and know exactly where the tool's coverage ended.
This is the core thesis of The Hive's anti-fabrication architecture: a tool that refuses 5% of the time and is right the other 95% is more useful than a tool that answers 100% of the time and is right 90%. The refusals take the place of answers that would otherwise have been wrong, and a fabricated finding that reaches a bid costs far more than a refusal that prompts a manual check.
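The comparison can be worked through with a small expected-cost calculation. This Python sketch rests on two assumptions that are not in the original text: that the refusing tool is right on everything it does answer, and that a refusal and an error each carry a fixed cost. The cost figures in the usage lines are placeholders for illustration, not measured values.

```python
def expected_cost(
    refusal_rate: float, error_rate: float, refusal_cost: float, error_cost: float
) -> float:
    """Expected cost per query, given how often the tool refuses and how often it is wrong."""
    return refusal_rate * refusal_cost + error_rate * error_cost


# Illustrative costs only: a refusal costs one unit (a manual check),
# an undetected error costs ten units (a scope miss or bid error).
REFUSAL_COST = 1.0
ERROR_COST = 10.0

# Tool that refuses 5% of the time and is right on everything it answers.
refusing_tool = expected_cost(0.05, 0.00, REFUSAL_COST, ERROR_COST)

# Tool that answers 100% of the time and is right 90% of the time.
always_answers = expected_cost(0.00, 0.10, REFUSAL_COST, ERROR_COST)

print(refusing_tool < always_answers)
```

Under these rates the refusing tool comes out ahead whenever 0.05 × refusal cost is less than 0.10 × error cost, that is, whenever a refusal costs less than twice what an error does.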
The refuse list also tells the estimator what to do next. A refusal with reason "no source found" tells them that the model looked for this and didn't find it — worth a manual check. A refusal with reason "outside scope" tells them this tool doesn't handle that class of question (dollar estimation, for example) and they need a different tool. A refusal with reason "unreadable text" tells them the input had formatting or encoding problems and they should paste a cleaner excerpt.
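The three reason codes named above map naturally onto a small lookup. In this Python sketch the enum name, the `NEXT_STEP` table, and the wording of each suggested action are assumptions for illustration. Only the three reason strings come from the text.

```python
from enum import Enum


class RefusalReason(Enum):
    NO_SOURCE_FOUND = "no source found"
    OUTSIDE_SCOPE = "outside scope"
    UNREADABLE_TEXT = "unreadable text"


# Suggested follow-up for the estimator, keyed by reason code.
NEXT_STEP = {
    RefusalReason.NO_SOURCE_FOUND: "Check the spec manually for this item.",
    RefusalReason.OUTSIDE_SCOPE: "Use a different tool for this class of question.",
    RefusalReason.UNREADABLE_TEXT: "Paste a cleaner excerpt and rescan.",
}


def next_step(reason: str) -> str:
    """Translate a reason string into a follow-up action.

    Raises ValueError for a reason code that is not one of the three known values.
    """
    return NEXT_STEP[RefusalReason(reason)]
```

Keeping the set of reason codes closed means an unexpected reason from the model raises an error instead of passing through as free text.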
What Iteration 1 does not do
This is the first shipped iteration. It is deliberately scoped. The known limitations are documented here, not buried.
- No PDF upload. The tool accepts pasted text only, up to 50,000 characters. Full PDF ingestion — with page-level citations tied to PDF page numbers — is the most important feature for Iteration 2. The paste-text mode covers the core architecture demonstration; it does not cover a typical 1,000-page project specification.
- Single-mind, single-pass. The production architecture described in the build spec calls for a multi-mind verification chain: a primary extraction mind, a verification mind that independently confirms citations, and a cross-reference mind. Iteration 1 uses a single model pass constrained by the citation rules in the system prompt. That is stricter than most AI tools, but it is not the full verification chain.
- No scope summary generation. The tool returns structured findings. It does not generate a one-page scope summary narrative from those findings. That feature requires the full verification chain to be in place so the summary can only draw from verified findings.
- No rate limiting or session management. The Iteration 1 deployment is open. Rate limiting by IP is planned before the tool is featured in any high-traffic context.
- No practitioner advisor sign-off. The build spec requires practitioner advisor review before a public launch announcement. This page and the tool are live, but a formal advisor review against real spec books has not yet been completed. The tool is available for evaluation; the formal sign-off process is in progress.
What comes next
Iteration 2 will add PDF upload with page-level citations, with each finding linked to its PDF page number. That is the feature that closes the gap between a paste-text demo and a production tool. The architecture for it is documented in the build spec: PDF ingestion via pdftotext (poppler), chunk-and-embed for cross-reference detection, and progress indication for the scan-in-progress state.
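One way page-level citations could fall out of pdftotext is sketched below in Python. This is an assumption about how Iteration 2 might work, not a description of the build spec: the function names are hypothetical. The sketch relies on pdftotext writing a form feed character at the end of each page when it sends text to standard output, so splitting on that character yields one string per page.

```python
from __future__ import annotations

import subprocess


def split_pages(raw: str) -> list[str]:
    """Split pdftotext output into pages on the form feed that ends each page."""
    pages = raw.split("\f")
    # The form feed after the last page leaves an empty trailing piece.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


def extract_pages(pdf_path: str) -> list[str]:
    """Run pdftotext (poppler) on a PDF and return one string per page.

    Requires the pdftotext binary on PATH. "-" sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return split_pages(result.stdout)


def pages_containing(pages: list[str], excerpt: str) -> list[int]:
    """Return the 1-based page numbers whose text contains the excerpt verbatim."""
    return [n for n, page in enumerate(pages, start=1) if excerpt in page]
```

A plain substring match like this misses excerpts that wrap across lines or span a page break, so a real implementation would need some whitespace handling that still preserves the verbatim guarantee.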
After PDF upload: the multi-mind verification chain, where a second independent model pass confirms that each citation is present at the claimed location in the document. That is the step that catches the failure mode where a single-pass model quotes text confidently but references the wrong section or misremembers where it appeared.
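The property that second pass is meant to confirm can be illustrated with a deterministic check. This Python sketch is a stand-in, not the multi-mind chain itself, which the text describes as an independent model pass. The heading pattern (a line beginning with a number shaped like "23 05 00") and the function names are assumptions for illustration, and real spec books may format section headings differently.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical heading pattern: a line that starts with a number like "23 05 00".
SECTION_HEADING = re.compile(r"^(\d{2} \d{2} \d{2})\b", re.MULTILINE)


def section_at(source: str, offset: int) -> Optional[str]:
    """Return the number of the last section heading at or before offset."""
    current = None
    for match in SECTION_HEADING.finditer(source):
        if match.start() > offset:
            break
        current = match.group(1)
    return current


def citation_confirmed(source: str, section: str, excerpt: str) -> bool:
    """True if the excerpt occurs verbatim somewhere under the claimed section."""
    if not excerpt:
        return False
    start = source.find(excerpt)
    while start != -1:
        if section_at(source, start) == section:
            return True
        start = source.find(excerpt, start + 1)
    return False
```

This catches the case described above: a quote that is real but attributed to the wrong section fails the check even though the substring test alone would pass.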
The connection to the broader architecture
The Spec Book Scanner is one instance of the architecture described in Building an AI Hive Mind That Actually Cares. The specific commitments — primary-source citations, refuse-sets as first-class, multi-mind verification — are not features of this particular tool. They are the consistent architecture across every tool The Hive ships. The scanner makes them visible in a context — specification analysis — where the consequences of fabrication are large enough that the architecture has to be explained, not just implemented.
If you are an MEP estimator or CM professional and you scan a real spec excerpt and get a finding that is wrong (wrong citation, wrong section classification, fabricated claim), that is worth knowing about. Connect via LinkedIn. Every calibration failure becomes a prompt refinement or an architecture change.
For the estimator's-perspective companion piece — this scanner alongside the Sub-Bid Scope-Gap Detector and the Healthcare MEP Code-Cross-Reference Engine as the three capabilities that map to where MEP estimating teams inside CM firms actually lose time — see AI for MEP Estimating Teams Inside CM Firms — Three Capabilities That Actually Save Hours. For the free public preview of this scanner, see the scanner preview.
Dan Cohen
Founder, The Hive
Connect via LinkedIn
Wilmington, Delaware