Deep dive · 01 / 04

Determinism.

Same input, same output. No LLM in the hot path. Every operator has a numeric quality bar that was set before the test ran, not after.

Why this matters

A document pipeline produces trust artifacts. Someone signs the output, files it, discloses it, or ships it to counsel. If the pipeline produces different output on the same input, it is not a pipeline; it is a guess that happens to be right most of the time. That is fine for consumer demos. It is not fine for a bill of materials, a set of discovery productions, or an audit exhibit.

Rule: no LLMs in the hot path

The toolkit does not call an LLM to redact, stamp, sign, fill, classify, or extract. Every one of those operators is deterministic code: regex rules, checksum validation, overlay compositing, AES cryptography, classical OCR backends, and a small set of pre-trained classifiers whose weights are frozen at release.
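To make "regex rules plus checksum validation" concrete, here is a minimal sketch of a payment-card redaction rule. It assumes the rule is a digit-run regex whose matches are masked only if they also pass the Luhn checksum; the pattern, the mask string, and the function names are hypothetical, not the toolkit's actual API. Nothing in it samples, calls a network, or reads a clock, so the same text always yields the same redaction.

```python
import re

# Hypothetical candidate pattern: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn (mod 10) checksum."""
    total = 0
    # Walk right to left, doubling every second digit and folding results above 9.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_cards(text: str, mask: str = "[REDACTED]") -> str:
    """Mask regex candidates that also pass Luhn; leave every other digit run untouched."""

    def _replace(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group(0))
        return mask if luhn_valid(digits) else match.group(0)

    return CARD_CANDIDATE.sub(_replace, text)


print(redact_cards("Card 4111 1111 1111 1111 on file"))   # Card [REDACTED] on file
print(redact_cards("Order 4111 1111 1111 1112 shipped"))  # unchanged: fails Luhn
```

The checksum step is what keeps the rule from masking every long number (order IDs, account references) while still being pure, repeatable code.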

Where a model is present (OCR, the page classifier), it is evaluated on a fixed, held-out corpus against a quality bar registered before the evaluation runs. If the model regresses below its bar in CI, the release is blocked.

Rule: quality bars are set before the test

Each capability ships with a config file that declares its quality dimensions and the numeric bar on each dimension. Bars are checked in to the repo, reviewed in PRs, and are part of the release gate. We do not pick the bar after we see the number.

{
  "capability": "bates_numbering",
  "dimensions": {
    "sequential_integrity":   { "bar": 1.00 },
    "prefix_consistency":     { "bar": 1.00 },
    "padding_width":          { "bar": 1.00 },
    "page_range_application": { "bar": 1.00 }
  },
  "pass_rule": "all"
}
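The four dimensions in this config are named but not defined here. One plausible reading, written as a minimal sketch: labels are a prefix plus a zero-padded page number, and each dimension is scored as the fraction of labels (or adjacent label pairs) that satisfy it. `BatesSpec`, `bates_labels`, `score_dimensions`, and these exact definitions are assumptions for illustration, not the toolkit's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BatesSpec:
    prefix: str  # hypothetical production prefix, e.g. "ACME"
    start: int   # first Bates number in the production
    width: int   # zero-padding width of the numeric part


def bates_labels(spec: BatesSpec, page_count: int) -> list[str]:
    """Generate one label per page: prefix followed by the zero-padded page number."""
    return [f"{spec.prefix}{n:0{spec.width}d}" for n in range(spec.start, spec.start + page_count)]


def _number_part(label: str, prefix: str) -> Optional[str]:
    """Return the numeric tail of a label, or None if the prefix or digits are wrong."""
    if not label.startswith(prefix):
        return None
    tail = label[len(prefix):]
    return tail if tail.isdigit() else None


def score_dimensions(spec: BatesSpec, labels: list[str], expected_pages: int) -> dict[str, float]:
    """Score each dimension in [0, 1]; the config sets a bar of 1.00 on all four."""
    nums = [_number_part(label, spec.prefix) for label in labels]
    total = max(len(labels), 1)
    pairs = list(zip(nums, nums[1:]))
    # Adjacent pairs must both parse and differ by exactly one.
    sequential = sum(1 for a, b in pairs if a is not None and b is not None and int(b) == int(a) + 1)
    return {
        "sequential_integrity": sequential / len(pairs) if pairs else 1.0,
        "prefix_consistency": sum(label.startswith(spec.prefix) for label in labels) / total,
        "padding_width": sum(1 for s in nums if s is not None and len(s) == spec.width) / total,
        "page_range_application": 1.0 if len(labels) == expected_pages else 0.0,
    }
```

Under this reading, a production that jumps from `ACME000001` to `ACME000003` scores 0.0 on sequential_integrity while the other three dimensions stay at 1.00, so it still misses its bar.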

Rule: every dimension must pass

Capabilities ship under an all-dimensions-pass rule. If seven of eight dimensions hit 1.00 and the eighth sits at 0.98, the operator does not ship. We do not average away a miss on a load-bearing property.
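A minimal sketch of the all-dimensions-pass rule, assuming measured scores arrive as a dict keyed by dimension name. `release_gate` is a hypothetical name, and `CONFIG` mirrors the Bates config shown earlier; in a real gate it would be read from the checked-in file with `json.load`. Treating a missing measurement as a failure is also an assumption, chosen so that nothing can pass by omission.

```python
# Mirrors the bates_numbering config; a real gate would load the checked-in JSON file.
CONFIG = {
    "capability": "bates_numbering",
    "dimensions": {name: {"bar": 1.00} for name in (
        "sequential_integrity", "prefix_consistency",
        "padding_width", "page_range_application")},
    "pass_rule": "all",
}


def release_gate(config: dict, measured: dict[str, float]) -> list[str]:
    """Return the dimensions that miss their bar; an empty list means the capability ships."""
    if config["pass_rule"] != "all":
        raise ValueError(f"unsupported pass_rule: {config['pass_rule']!r}")
    failures = []
    for name, dim in config["dimensions"].items():
        score = measured.get(name)
        # A missing measurement fails just like a low one: nothing is averaged or skipped.
        if score is None or score < dim["bar"]:
            failures.append(name)
    return failures


scores = {"sequential_integrity": 1.00, "prefix_consistency": 1.00,
          "padding_width": 0.98, "page_range_application": 1.00}
print(release_gate(CONFIG, scores))  # ['padding_width']
```

Averaging those four scores would give roughly 0.995 and look healthy; the gate still reports `padding_width` and blocks the release.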

What happens when a dimension cannot pass

It is logged as a known limitation on the limitations list on the homepage. Examples today: handwriting OCR quality depends on the backend, and cross-page table joining scored 1.00 on a synthetic corpus but has not been measured on your bank's specific statement template. Both are genuinely open measurements, and both are stated plainly instead of being smoothed over.

Byte-for-byte determinism on the pure paths

Bates numbering, overlay annotations, page operations, compression, and password ops are byte-for-byte deterministic: given the same input file and the same parameters, the output file hashes to the same SHA-256 on every run. That test is part of the suite for each of those operators.
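The hash test can be sketched as follows, assuming an operator is a function from input bytes and parameters to output bytes. `toy_stamp` is a hypothetical stand-in for a real operator and `assert_byte_deterministic` is an illustrative helper, not part of the toolkit. A real PDF operator would also have to avoid the usual sources of drift, such as writing the current time or a randomly generated document ID into the file.

```python
import hashlib
from typing import Callable

Operator = Callable[[bytes, dict], bytes]


def toy_stamp(data: bytes, params: dict) -> bytes:
    """Hypothetical operator: appends a footer derived only from its inputs (no clock, no RNG)."""
    footer = f"\n% {params['prefix']}{params['start']:0{params['width']}d}\n".encode("utf-8")
    return data + footer


def assert_byte_deterministic(op: Operator, data: bytes, params: dict, runs: int = 5) -> str:
    """Run the operator several times and require a single SHA-256 digest across all runs."""
    digests = {hashlib.sha256(op(data, params)).hexdigest() for _ in range(runs)}
    if len(digests) != 1:
        raise AssertionError(f"{len(digests)} distinct output hashes across {runs} runs")
    return digests.pop()


params = {"prefix": "ACME", "start": 1, "width": 6}
digest = assert_byte_deterministic(toy_stamp, b"%PDF-1.7 example bytes", params)
print(digest)  # record this; for these inputs it should never change
```

Running several times in one process only catches in-process drift; comparing against a digest recorded earlier also catches drift across releases.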

What determinism gives you

Reproducibility. You can regenerate a production from the same inputs years later.
Audit. An auditor can rerun a signed job and bit-compare the output.
Regression safety. A CI diff catches an accidental change in an operator's output before it ships.
Privacy. There is no remote model inference to route your document through.
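Regression safety and audit both reduce to comparing fresh output hashes with a recorded set. A minimal sketch, assuming a "golden manifest" that maps a job name to the SHA-256 of its output and is stored as JSON next to the test inputs; the job names and helper functions are hypothetical.

```python
import hashlib
import json


def build_manifest(outputs: dict[str, bytes]) -> dict[str, str]:
    """Map each job name to the SHA-256 hex digest of its output bytes."""
    return {job: hashlib.sha256(blob).hexdigest() for job, blob in sorted(outputs.items())}


def diff_manifests(golden: dict[str, str], current: dict[str, str]) -> list[str]:
    """List jobs whose hash changed, appeared, or disappeared since the golden run."""
    return [job for job in sorted(golden.keys() | current.keys())
            if golden.get(job) != current.get(job)]


# Serializing with sorted keys keeps the manifest file itself byte-stable, so it diffs cleanly.
golden = build_manifest({"prod_001": b"stamped A", "prod_002": b"stamped B"})
golden_text = json.dumps(golden, sort_keys=True, indent=2)

current = build_manifest({"prod_001": b"stamped A", "prod_002": b"stamped B (changed)"})
print(diff_manifests(json.loads(golden_text), current))  # ['prod_002']
```

In CI, a non-empty result would fail the build; for an audit, the same comparison would run against a manifest recorded when the production was originally made.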