Scaffolding Matters More Than the Schema
A design and case-study report on Targum, a controlled-vocabulary translation engine for esoteric primary sources
Abstract
We describe Targum, a controlled-vocabulary translation engine for the esoteric and contemplative primary-source corpus, and report case-study findings from three benchmark passages across two scripts and three traditions. Targum is a layered pipeline (reference resolution, multilingual morphology, hard-constraint glossary, retrieval over a curated scholarly corpus, hermeneutic frame controllers, schema-validated generation, drift audit, registry check, citation verification, and editor sign-off) built over the editorial infrastructure of the Hekhal cross-tradition reference (hekhal.org). The architectural design separates the engine (deterministic pipeline) from the scaffolding (per-corpus glossaries, frame controllers, scholarly summaries, and lexicon registries that the engine consumes at generation time).
Across three case-study benchmarks (Ibn Arabi Tarjuman al-Ashwaq XI.13–15; the kuntu kanzan Akbarian hadith; Pseudo-Dionysius Mystical Theology I.1) we find that the engine's distinctive value over standard public-domain translation is realized when the corpus scaffolding is built — and that schema enforcement alone, without a post-LLM existence check against actual infrastructure on disk, produces a failure mode we call vacuous compliance: structurally valid output referencing infrastructure that does not exist. Independent adversarial review of one scaffolded output judged it "editor-grade output... suitable for editor sign-off."
We argue for a design discipline we call registry-grounded translation: every schema-shape commitment must be paired with a post-LLM existence check against the actual infrastructure (frame controllers on disk, lexicon pages on the public site, glossary revisions in version control, citations in a verified manifest). We publish the benchmark specifications, the full run audit packages, the adversarial-review transcript, and a verified-citation manifest excerpt as a public reproducibility surface; the engine source code and the corpus-specific scaffolding remain proprietary editorial work and are available to the journal peer-review process under standard reviewer-disclosure terms.
Keywords: machine translation, digital humanities, esoteric studies, controlled vocabulary, retrieval-augmented generation, Pseudo-Dionysius, Ibn Arabi, large language models, hermeneutic theory.
Key contributions
- A controlled-vocabulary translation engine, Targum, layered over the editorial infrastructure of an active cross-tradition esoteric reference. The architecture is described in the paper at the level required to replicate the design in an independent implementation.
- The articulation and empirical defense of a design discipline we call registry-grounded translation: schema-shape commitments must be paired with post-LLM existence checks against the actual infrastructure on disk. The discipline is formalized as two new pipeline layers (Layer 6.5 registry check, Layer 6.6 citation verification) and is shown to catch a class of failure (vacuous compliance) that schema enforcement alone does not.
- Three case-study benchmarks across two scripts and three traditions (Akbarian Sufi poetry, Akbarian doctrinal hadith, Christian apophatic prose), with per-run audit packages preserving every artifact necessary to inspect-verify the empirical claims. One scaffolded output judged "editor-grade output... suitable for editor sign-off" by an independent cold-context reviewer.
- A practical contribution: a demonstration that the post-LLM verification layer changes what reviewers need in order to check a system's claims, and so challenges the open-source presumption that has dominated digital-humanities methodology work. Reviewer-verifiable claims do not require open-source engine code; they require open audit packages. The audit packages for this paper are linked below.
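The registry-grounded discipline described above can be illustrated with a minimal sketch. This is not the Targum implementation, which is not public. The `Registry` class, the `frame_controller` and `lexicon_refs` field names, and the placeholder identifiers are all hypothetical, chosen only to show how a schema-valid output can still reference infrastructure that does not exist.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical snapshot of the infrastructure that actually exists."""
    frame_controllers: set[str] = field(default_factory=set)
    lexicon_pages: set[str] = field(default_factory=set)


def registry_check(output: dict, registry: Registry) -> list[str]:
    """Return one incident per reference the registry cannot resolve."""
    incidents = []
    frame = output.get("frame_controller")
    if frame is not None and frame not in registry.frame_controllers:
        incidents.append(f"frame controller not found: {frame}")
    for term in output.get("lexicon_refs", []):
        if term not in registry.lexicon_pages:
            incidents.append(f"lexicon page not found: {term}")
    return incidents


# A structurally valid output: every field is present and well-typed,
# so a schema validator alone would accept it. All names are placeholders.
output = {
    "frame_controller": "example-frame",
    "lexicon_refs": ["example-term-1", "example-term-2"],
}

# Before scaffolding: nothing the output points at has been built.
unbuilt = Registry()
# After scaffolding: the referenced controller and pages exist.
built = Registry(
    frame_controllers={"example-frame"},
    lexicon_pages={"example-term-1", "example-term-2"},
)

pre_incidents = registry_check(output, unbuilt)
post_incidents = registry_check(output, built)
print(len(pre_incidents), len(post_incidents))
```

The same output passes or fails depending only on the state of the registry, which is why the check has to run after generation and against the real infrastructure rather than against the schema.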
Results matrix
Each run passes through four post-LLM verification layers: schema validation, drift audit, registry check, and citation verification. The table reports per-run incidents for each layer. In the citations column, V / TM / U = verified-against-printed-source / training-memory / unverified; PD = public-domain.
| Run | Passage | Schema | Drift | Registry | Citations (V / TM / U) | PD-comparison |
|---|---|---|---|---|---|---|
| Test A re-run (post-fix) | Ibn Arabi, Tarjuman XI.13–15 | pass | 5 incidents | 0 | 0 / 4 / 0 | chunk not in PD index |
| Test B re-run (post-fix) | kuntu kanzan hadith | pass (1 retry) | 3 incidents | 2 incidents | 0 / 7 / 0 | no PD English exists |
| Test C pre-scaffolding | Pseudo-Dionysius MT I.1 | pass | clean (vacuous) | 3 incidents | 0 / 3 / 0 | no PD index coverage |
| Test C post-scaffolding | Pseudo-Dionysius MT I.1 | pass | 2 incidents | 0 | 0 / 5 / 0 | 1 entry (Rolt, divergent, J=0.14) |
The paper's central empirical claim lies in the contrast between the two Test C runs. Before scaffolding, schema enforcement alone yields a vacuously compliant output: it passes schema validation and the drift audit comes back clean, yet the registry check flags three references to infrastructure that does not exist. Once the scaffolding is built, the registry incidents drop to zero and the drift audit reports two substantive incidents instead of a vacuous clean result. The paper develops this finding in §6 of the PDF.
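The V / TM / U column of the results matrix can be read as a tally over per-citation classifications. The sketch below shows one way such a tally could be produced. It is an assumption-laden illustration, not the engine's Layer 6.6: the `key` and `provenance` fields, the rule for separating training-memory from unverified citations, and the placeholder references are all hypothetical.

```python
from collections import Counter


def classify(citation: dict, manifest: set[str]) -> str:
    """Classify one citation as V, TM, or U (hypothetical rule)."""
    # V: the citation key appears in the editor-verified manifest.
    if citation["key"] in manifest:
        return "V"
    # TM: assumed here to be self-declared by a provenance field.
    if citation.get("provenance") == "training-memory":
        return "TM"
    # U: anything that is neither verified nor declared.
    return "U"


def tally(citations: list[dict], manifest: set[str]) -> str:
    """Format counts in the 'V / TM / U' shape used by the results matrix."""
    counts = Counter(classify(c, manifest) for c in citations)
    return f'{counts["V"]} / {counts["TM"]} / {counts["U"]}'


# Placeholder citations; none of these correspond to entries in the real runs.
citations = [
    {"key": "placeholder-ref-1", "provenance": "training-memory"},
    {"key": "placeholder-ref-2", "provenance": "training-memory"},
    {"key": "placeholder-ref-3"},
]
manifest = {"placeholder-ref-1"}

summary = tally(citations, manifest)
print(summary)  # 1 / 1 / 1
```

Checking the manifest first means a citation is promoted to V only by an entry an editor has verified, never by anything the model asserts about itself.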
Audit packages — the empirical evidence
Inspect-verifying the paper's empirical claims requires the run audit packages, not the engine source. The bundle below contains the assembled prompts, validated TranslationOutput JSON, drift / registry / citation reports, and (for Test C post-scaffolding) the cold-context adversarial-review transcript. This is the “open audit, closed engine” stance articulated in §9.1 of the paper.
- Test A — Ibn Arabi, Tarjuman al-Ashwaq XI.13–15. Akbarian Sufi poetry; zahir-batin frame; per-corpus glossary calibrated. (audit package)
- Test B — the kuntu kanzan hadith. Doctrinally foundational for Akbarian metaphysics but contested as hadith; the engine's apparatus surfaces the attribution problem. (audit package)
- Test C — Pseudo-Dionysius, Mystical Theology I.1 prayer. Original adversarial run (2026-05-08, pre-scaffolding) and re-run (2026-05-16, post-scaffolding) side-by-side. (audit package)
- Adversarial-review transcript. Cold-context reviewer report on Test C post-scaffolding, conducted by a fresh Anthropic Claude session with no contextual knowledge of the experimental hypothesis or the project. (transcript)
- Verified-citation manifest excerpt. The Phase-1 state of the editor-verified citation discipline that backs Layer 6.6. (manifest excerpt)
The Targum engine source code, the controlled glossaries, the frame controllers, and the curated scholarly-corpus seeds are not part of the public artifact set. They are proprietary editorial work of the Hekhal Project and are available to the journal’s peer-review process under standard reviewer-disclosure terms by request ([email protected]).
Open audit, closed engine
The design decision to publish full run audit packages while keeping the engine source closed is a deliberate departure from the open-source presumption that runs through much of the digital-humanities methodology literature. Reviewer-verifiable empirical claims do not require open-source engine code; they require open audit packages that fix every artifact upstream of the engine’s decisions (the spec, the assembled prompt, the model identifier, the schema, the retrieved scholarly context) and every artifact downstream of them (the validated output, the drift / registry / citation reports, the PD-comparison entries, the editor-review state). A reader can audit the engine’s compliance with the discipline contract without inspecting the engine’s code.
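A reader who wants to confirm that a package fixes every upstream and downstream artifact named above could start with a completeness check along these lines. This is a hedged sketch: the artifact key names are hypothetical labels for the items listed in the paragraph, and the published packages may organize their contents differently.

```python
from typing import Mapping

# Hypothetical keys for the artifacts upstream of the engine's decisions.
UPSTREAM = (
    "spec",
    "assembled_prompt",
    "model_identifier",
    "schema",
    "retrieved_context",
)
# Hypothetical keys for the artifacts downstream of the engine's decisions.
DOWNSTREAM = (
    "validated_output",
    "drift_report",
    "registry_report",
    "citation_report",
    "pd_comparison",
    "editor_review_state",
)


def missing_artifacts(package: Mapping[str, object]) -> dict[str, list[str]]:
    """List which expected artifacts are absent from an audit package."""
    return {
        "upstream": [name for name in UPSTREAM if name not in package],
        "downstream": [name for name in DOWNSTREAM if name not in package],
    }


# Example: a package that has everything except the editor-review state.
package = {name: "..." for name in UPSTREAM + DOWNSTREAM}
del package["editor_review_state"]

gaps = missing_artifacts(package)
print(gaps)
```

A package with no gaps on either side is the precondition for auditing the discipline contract without access to the engine's code.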
The engine, the glossaries, the frame controllers, and the scholarly-corpus seeds remain closed because they constitute, in aggregate, an editorial moat that the Hekhal Project’s coverage strategy depends on for long-term sustainability. The Hekhal Project is a commercial entity (Lattice DBA, EIN issued 2026-04-02) and the closed-source positioning is what allows the project to commit to ten-year coverage of the long-tail mystical and contemplative corpus rather than ship a one-time methodology demonstration. The paper’s reproducibility surface is built to support peer review, replication of the design in independent implementations, and good-faith inspection of the empirical claims — not to enable cloning of the editorial moat.
How to cite
Couey, Vincent W. 2026. “Scaffolding Matters More Than the Schema: A Design and Case Study Report on Targum, a Controlled-Vocabulary Translation Engine for Esoteric Primary Sources.” Submitted to Aries: Journal for the Study of Western Esotericism. Preprint: hekhal.org/targum-experiments/scaffolding-matters-paper.
Author
Vincent W. Couey is an independent researcher working at the intersection of digital humanities, esoteric primary-source translation, and small-scale computational philology. He founded the Hekhal Project in 2026 to produce a serious, open, cross-tradition reference for the mystical and contemplative primary-source corpus; the Targum translation engine described in this paper is the project’s translation arm. His other current research includes Substrate Geometry (a computational physics program on rigid-body equilibria and shape-classification dynamics) and a computational toxicology program testing architecture-specific QSAR failures on psychedelic-class compounds (first preprint on ChemRxiv; OSF DOI 10.17605/OSF.IO/UWVX4). He works from Toledo, Ohio.
Correspondence: [email protected]
Project: hekhal.org
Targum experiments: hekhal.org/targum-experiments