Document toolkit · Austin, TX

Lyte Lab

Local-first document tools. Measured. Deterministic. No LLM in the hot path.

Capabilities

12 shipped operators

01 · PDF parse, edit, and redact (0.96)
Overlay edit

Additive overlay architecture. Source pixels are preserved, edits are composited on top, and glyphs under redactions are purged from the content stream.
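
A minimal sketch of the difference between covering text and purging it. It assumes a toy page model where each run of glyphs carries a bounding box. `Rect`, `GlyphRun`, and `redact` are hypothetical names, not the toolkit's API. Runs that touch a redaction region are dropped from the text layer, and the region itself becomes an opaque box in the overlay:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Rect") -> bool:
        # Two axis-aligned boxes overlap unless one lies entirely beside the other.
        return not (self.x1 <= other.x0 or other.x1 <= self.x0
                    or self.y1 <= other.y0 or other.y1 <= self.y0)


@dataclass(frozen=True)
class GlyphRun:
    text: str
    bbox: Rect


def redact(runs, regions):
    """Return (kept_runs, overlay_boxes).

    Purge, don't cover: any run touching a redaction region is removed from the
    text layer, so it cannot be copied or extracted later. The regions are
    returned as boxes to paint on top. A production redactor would split runs
    at glyph level instead of dropping whole runs.
    """
    kept = [r for r in runs if not any(r.bbox.intersects(g) for g in regions)]
    return kept, list(regions)


page = [
    GlyphRun("Name: Jane Doe", Rect(72, 700, 200, 712)),
    GlyphRun("SSN 123-45-6789", Rect(72, 680, 200, 692)),
]
kept, boxes = redact(page, [Rect(70, 678, 210, 694)])
print([r.text for r in kept])  # ['Name: Jane Doe']
```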

02 · OCR with handwriting support (0.99)
Swappable backend

Turns scanned PDFs into searchable PDFs and auto-detects scanned versus born-digital input. Swappable backend: OcrMac, EasyOCR, Tesseract, or a handwriting model.
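
The scanned-versus-born-digital check could work along these lines. This is a sketch only: the page statistics and thresholds below are assumptions for illustration, not the toolkit's actual detector. Pages that are almost entirely image and carry almost no extractable text go to the OCR backend. Every other page keeps its existing text layer:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageStats:
    text_chars: int        # characters recoverable from the existing text layer
    image_coverage: float  # fraction of the page area covered by raster images, 0..1


def is_scanned(page: PageStats, min_chars: int = 20, min_coverage: float = 0.8) -> bool:
    # Hypothetical thresholds: a page with almost no text layer that is mostly
    # one big image is treated as a scan.
    return page.text_chars < min_chars and page.image_coverage >= min_coverage


def route(pages):
    """Send scans to the OCR backend; keep born-digital text as-is."""
    return ["ocr" if is_scanned(p) else "text-layer" for p in pages]


print(route([PageStats(0, 0.98), PageStats(1840, 0.05), PageStats(3, 0.10)]))
# ['ocr', 'text-layer', 'text-layer']
```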

03 · Format conversion (5 formats)
Round-trip

PDF ↔ CSV ↔ Markdown ↔ HTML ↔ DOCX. An optional round-trip self-test lets you confirm fidelity before shipping a deliverable.
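
A round-trip self-test can be as simple as converting out and back, then comparing the result to the input. This sketch does that for CSV → Markdown → CSV using only the standard library. The function names are hypothetical. The Markdown writer deliberately skips pipe escaping, which is exactly the kind of loss the self-test is meant to catch:

```python
import csv
import io


def csv_to_markdown(text: str) -> str:
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def markdown_to_rows(md: str) -> list:
    rows = []
    for i, line in enumerate(md.splitlines()):
        if i == 1:
            continue  # skip the | --- | separator row
        cells = line.strip().strip("|").split("|")
        rows.append([c.strip() for c in cells])
    return rows


def round_trip_ok(csv_text: str) -> bool:
    """True if CSV -> Markdown -> rows reproduces the original rows exactly."""
    original = list(csv.reader(io.StringIO(csv_text)))
    return markdown_to_rows(csv_to_markdown(csv_text)) == original


print(round_trip_ok("date,amount\n2024-01-02,12.50\n"))  # True
print(round_trip_ok("note\nA|B split\n"))               # False: the pipe becomes a column break
```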

04 · Tables & cross-page joining (1.00)
Statement rejoin

Extracts every table to one CSV, with a heuristic rejoiner for tables that span page breaks. Useful for bank statements, P&Ls, and audit exhibits.
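
A heuristic rejoiner can be quite small. In this sketch, the rule itself is an assumption and not the toolkit's actual heuristic. A fragment that starts with the same column count as the preceding table is treated as a continuation, and a header row repeated at the top of a new page is dropped:

```python
from typing import List

Row = List[str]


def rejoin(fragments: List[List[Row]]) -> List[List[Row]]:
    """Merge table fragments (in reading order) that continue across a page break.

    Hypothetical heuristic: a fragment whose first row has the same column count
    as the table before it is treated as a continuation, and a repeated header
    row at the top of the continuation is dropped.
    """
    tables: List[List[Row]] = []
    for frag in fragments:
        if not frag:
            continue
        if tables and len(frag[0]) == len(tables[-1][0]):
            prev = tables[-1]
            body = frag[1:] if frag[0] == prev[0] else frag
            prev.extend(list(r) for r in body)
        else:
            tables.append([list(r) for r in frag])
    return tables


page1 = [["Date", "Description", "Amount"], ["01/02", "Coffee", "3.50"]]
page2 = [["Date", "Description", "Amount"], ["01/03", "Rent", "1200.00"]]
page3 = [["Category", "Total"], ["Food", "3.50"]]
for table in rejoin([page1, page2, page3]):
    print(len(table), "rows")
# 3 rows
# 2 rows
```

Column count alone will also merge two unrelated tables that happen to share a width. This is one reason a synthetic-corpus score does not transfer directly to real statements.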

05 · Bates numbering (1.000)
Legal discovery

Sequential legal-discovery stamping across a set of PDFs with a configurable prefix, start number, and padding. Returns the stamped files plus a production log.
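
Most of Bates numbering is careful bookkeeping: one sequential number per page, carried across documents without gaps. The following is a minimal sketch of the numbering and the production log, assuming page counts are already known. `bates_plan` and `BatesEntry` are hypothetical names. Stamping the numbers onto pages is left out:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BatesEntry:
    filename: str
    first: str
    last: str
    pages: int


def bates_plan(docs: List[Tuple[str, int]], prefix: str = "ABC",
               start: int = 1, padding: int = 6) -> List[BatesEntry]:
    """Assign one number per page across all documents, in order, with no gaps."""
    log: List[BatesEntry] = []
    n = start
    for name, pages in docs:
        if pages < 1:
            raise ValueError(f"{name}: expected at least one page")
        first = f"{prefix}{n:0{padding}d}"
        last = f"{prefix}{n + pages - 1:0{padding}d}"
        log.append(BatesEntry(name, first, last, pages))
        n += pages
    return log


for entry in bates_plan([("complaint.pdf", 3), ("exhibit_a.pdf", 2)], prefix="ACME"):
    print(entry.filename, entry.first, entry.last)
# complaint.pdf ACME000001 ACME000003
# exhibit_a.pdf ACME000004 ACME000005
```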

06 · Bulk form fill (1.000)
AcroForm + coord

One template PDF + one CSV = N filled PDFs. Supports AcroForm fields, plus coordinate-mode overlays for templates without form fields. Exactly one output PDF per input row.
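
The one-output-per-row contract is easy to state in code. This sketch only builds the fill plan, assuming each CSV header matches an AcroForm field name. The output filename pattern and `plan_fills` itself are hypothetical. Writing the values into the template is left to the PDF library that performs the fill:

```python
import csv
import io
from typing import Dict, List, Tuple


def plan_fills(csv_text: str, name_field: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return one (output_filename, field_values) pair per CSV data row."""
    jobs = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        # Fall back to the row number if the naming column is blank.
        stem = (row.get(name_field) or "").strip() or f"row{i}"
        jobs.append((f"{i:04d}_{stem}.pdf", dict(row)))
    return jobs


data = "client_name,dob\nJane Doe,1990-01-01\n,1985-06-30\n"
for filename, fields in plan_fills(data, "client_name"):
    print(filename, fields["dob"])
# 0001_Jane Doe.pdf 1990-01-01
# 0002_row2.pdf 1985-06-30
```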

07 · PII detection (28 / 28)
Regex + Luhn

Detects names, emails, phone numbers, SSNs, and account numbers using regex rules plus Luhn checksum validation. Zero LLM. Output feeds straight into redact.
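
Here is a minimal sketch of the regex-plus-checksum idea. It uses simplified patterns for emails, SSNs, and card-style account numbers. Names and phone numbers are left out, and none of these patterns are the detector's actual rules. The Luhn check prevents an arbitrary 16-digit string from being reported as a card number:

```python
import re

PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_ok(number: str) -> bool:
    """Luhn checksum: from the right, double every second digit, sum, mod 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) > 1 and total % 10 == 0


def detect(text: str):
    hits = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # Digit runs that fail the checksum are treated as noise, not card numbers.
            if kind == "card" and not luhn_ok(m.group()):
                continue
            hits.append((kind, m.group(), m.start(), m.end()))
    return hits


sample = "Reach jane.doe@example.com re SSN 123-45-6789, card 4111 1111 1111 1111."
print([kind for kind, *_ in detect(sample)])  # ['email', 'ssn', 'card']
```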

08 · Digital signatures (1.000)
Native · PAdES

Native tamper-detection signing (5 credits) or Adobe-compatible PAdES via a customer-supplied cert (15 credits). Both verifiable offline.
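
Offline tamper detection in PDF signatures rests on a digest over fixed byte ranges of the file. The sketch below shows the standard `/ByteRange` mechanism that PAdES signatures use. Whether the native signer lays out its signature the same way is an assumption. The step that actually signs and verifies the digest (a CMS signature over it) is omitted:

```python
import hashlib
from typing import Tuple


def byte_range_digest(pdf: bytes, byte_range: Tuple[int, int, int, int]) -> str:
    """SHA-256 over the two spans named by a signature's /ByteRange [a b c d].

    The spans are pdf[a:a+b] and pdf[c:c+d]; the gap between them holds the
    signature value itself, so it is excluded from the hash.
    """
    a, b, c, d = byte_range
    h = hashlib.sha256()
    h.update(pdf[a:a + b])
    h.update(pdf[c:c + d])
    return h.hexdigest()


head = b"%PDF-1.7 ... Total due: 100.00 ... /Contents "
slot = b"<" + b"0" * 32 + b">"  # placeholder later filled with the signature
tail = b" ... %%EOF"
byte_range = (0, len(head), len(head) + len(slot), len(tail))

signed = byte_range_digest(head + slot + tail, byte_range)
# Filling the signature slot does not change the digest ...
filled = head + b"<" + b"ab" * 16 + b">" + tail
print(byte_range_digest(filled, byte_range) == signed)    # True
# ... but editing any covered byte does.
tampered = head.replace(b"100.00", b"900.00") + slot + tail
print(byte_range_digest(tampered, byte_range) == signed)  # False
```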

09 · Password operations (AES-256)
Encrypt · decrypt

AES-256 encrypt, decrypt, set permissions. Change owner/user passwords, restrict printing and copying, re-encrypt with a new key on request.
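
Restricting printing and copying ends up as a bit field in the encrypted PDF's `/P` entry. The following is a minimal sketch of computing that value from the user-access permission bits defined in the PDF specification. `permissions_value` is a hypothetical helper. AES-256 key derivation and encryption are handled by the PDF library and are not shown:

```python
# Bit positions are 1-indexed in the PDF specification's permissions table.
PRINT = 1 << 2            # bit 3: print
MODIFY = 1 << 3           # bit 4: modify contents
COPY = 1 << 4             # bit 5: copy or extract text and graphics
PRINT_HIGH_RES = 1 << 11  # bit 12: faithful (high-quality) print


def permissions_value(allow_print=True, allow_copy=True, allow_modify=True) -> int:
    # Start with every permission granted: bits 1-2 must be 0, reserved bits are 1.
    p = 0xFFFFFFFC
    if not allow_print:
        p &= ~(PRINT | PRINT_HIGH_RES)
    if not allow_copy:
        p &= ~COPY
    if not allow_modify:
        p &= ~MODIFY
    # /P is stored as a signed 32-bit integer.
    return p - (1 << 32) if p & 0x80000000 else p


print(permissions_value())                                    # -4
print(permissions_value(allow_print=False, allow_copy=False))  # -2072
```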

10 · Overlay annotations (1.000)
Watermark

Headers, footers, watermarks. Positioned by page, rotated, or tiled. Applied as an overlay so source pixels and signatures stay intact.
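
Tiling a watermark comes down to emitting one transformed copy per grid cell. This sketch assumes the watermark has already been added to the page as a Form XObject named `/WM`, which is a hypothetical resource name. `q`, `Q`, `cm`, and `Do` are standard PDF content-stream operators. Appending these strings as an additional content stream is one way to leave the original stream untouched:

```python
import math
from typing import List


def tile_ops(page_w: float, page_h: float, step_x: float, step_y: float,
             angle_deg: float, xobject: str = "/WM") -> List[str]:
    """Content-stream snippets that draw a watermark XObject in a rotated grid.

    Each tile is wrapped in q ... Q so its transform does not leak into the
    page, and `cm` applies rotation by angle_deg plus a translation to the tile.
    """
    if step_x <= 0 or step_y <= 0:
        raise ValueError("tile steps must be positive")
    t = math.radians(angle_deg)
    a, b, c, d = math.cos(t), math.sin(t), -math.sin(t), math.cos(t)
    ops = []
    y = 0.0
    while y < page_h:
        x = 0.0
        while x < page_w:
            ops.append(f"q {a:.4f} {b:.4f} {c:.4f} {d:.4f} {x:.2f} {y:.2f} cm {xobject} Do Q")
            x += step_x
        y += step_y
    return ops


ops = tile_ops(612, 792, 200, 200, 45)  # US Letter, 45-degree tiles
print(len(ops))  # 16
print(ops[0])    # q 0.7071 0.7071 -0.7071 0.7071 0.00 0.00 cm /WM Do Q
```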

11 · Compression + PDF/A (PDF/A)
Archival

Lossless image recompression and optional conversion to PDF/A-2b for archival compliance. Returns before/after byte sizes in the response.
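
For Flate-compressed image streams, lossless recompression means decoding the stream and re-deflating it at a higher level, then keeping the result only if it is smaller. Below is a standard-library sketch that returns the same kind of before/after byte counts the operator reports. `recompress_flate` is a hypothetical name, and the sketch covers neither JPEG/DCT images nor the PDF/A conversion step:

```python
import zlib
from typing import Dict, Tuple


def recompress_flate(stream: bytes, level: int = 9) -> Tuple[bytes, Dict[str, int]]:
    """Re-deflate a FlateDecode stream at a higher level, never making it bigger."""
    raw = zlib.decompress(stream)
    candidate = zlib.compress(raw, level)
    # Lossless check: the new stream must decode to exactly the same bytes.
    if zlib.decompress(candidate) != raw:
        raise RuntimeError("recompression changed the decoded data")
    best = candidate if len(candidate) < len(stream) else stream
    return best, {"before": len(stream), "after": len(best)}


original = zlib.compress(b"scanline " * 5000, 1)  # stand-in for a weakly compressed image
_, sizes = recompress_flate(original)
print(sizes["after"] <= sizes["before"])  # True
```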

12 · Page classifier + batch rename (14/14)
Mixed folder

Drop in a mixed folder of invoices, receipts, contracts, and statements. The tool classifies each page, then splits, sorts, and renames the output using a template you supply.
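
Once each page has a label, renaming on a user-supplied template can be done with ordinary string formatting. This sketch assumes the classifier emits a `doctype` plus extracted fields such as `date`. These field names and `rename_all` are hypothetical. A per-type counter keeps pages of the same type from colliding:

```python
import re
from collections import Counter
from typing import Dict, List


def rename_all(pages: List[Dict[str, str]],
               template: str = "{doctype}_{date}_{seq:03d}.pdf") -> List[str]:
    seq: Counter = Counter()
    names = []
    for page in pages:
        # Number pages per document type: invoice 001, 002, ...; receipt 001, ...
        seq[page["doctype"]] += 1
        name = template.format(**page, seq=seq[page["doctype"]])
        # Replace characters that are invalid in common filesystems.
        names.append(re.sub(r'[\\/:*?"<>|]', "_", name))
    return names


pages = [
    {"doctype": "invoice", "date": "2024-03-01"},
    {"doctype": "receipt", "date": "2024/03/02"},
    {"doctype": "invoice", "date": "2024-03-05"},
]
print(rename_all(pages))
# ['invoice_2024-03-01_001.pdf', 'receipt_2024_03_02_001.pdf', 'invoice_2024-03-05_002.pdf']
```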
Pre-registered quality bars, logged test counts

Every operator ships with a numeric quality bar

Each capability has a pre-registered threshold per quality dimension and a test battery. No capability ships until every dimension crosses its bar. Scores are model-free where possible; where a model exists (OCR, classifier), the quality bar is set against a held-out corpus.
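
The ship rule is an all-or-nothing comparison against the pre-registered bars. Here is a minimal sketch with hypothetical dimension names and thresholds. A dimension with no recorded score counts as a miss rather than a silent pass:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QualityGate:
    bars: Dict[str, float]  # pre-registered minimum score per quality dimension

    def failures(self, scores: Dict[str, float]) -> List[str]:
        # Missing scores default to -inf, so they always fail their bar.
        return [dim for dim, bar in self.bars.items()
                if scores.get(dim, float("-inf")) < bar]

    def shippable(self, scores: Dict[str, float]) -> bool:
        return not self.failures(scores)


gate = QualityGate({"fidelity": 0.95, "stream_preserved": 1.0})
print(gate.shippable({"fidelity": 0.96, "stream_preserved": 1.0}))  # True
print(gate.failures({"fidelity": 0.94}))  # ['fidelity', 'stream_preserved']
```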

Capability Scorecard

Capability                   Fidelity score       Tests  Bar     Notes
PDF edit (overlay)           0.96                 25+    0.95    Source content stream preserved
OCR                          0.99 / 0.81 / 1.00   18+    varies  Three dimensions: text, layout, handwriting
Bates numbering              1.000                37     1.00    Sequential integrity required
PII detector                 1.000                60     1.00    28 / 28 seeded matches
Bulk form fill               1.000                25     1.00    AcroForm + coordinate mode
Table joining                1.00                 27     0.90    Synthetic corpus; real-world drift expected
Compression + PDF/A          1.00                 24     0.95    6 dimensions passed, 2 skipped (gs optional)
Page classifier              1.000                53     0.85    14-document corpus, confusion matrix clean
Overlay annotations          1.000                51     1.00    9 dimensions
Page operations              1.000                21     1.00    Merge / split / rotate / reorder
Bookmark generator           1.000                19     0.90    7 dimensions
Password operations          1.000                23     1.00    AES-256 encrypt / decrypt / permissions
Digital signature (native)   1.000                6      1.00    Tamper-detection round-trip

Quality dimensions are capability-specific and pre-registered in each operator's config. Every bar is a pass/fail gate: if any dimension misses its bar, the operator does not ship. Test counts above are the battery sizes at the release cut; the full suite runs in CI on every push.

release: decomposition risk score = 0.0 · shippable: true · toolkit: 12 operators · 412 tests
What the toolkit does not do

What we are measured against, and what is still outstanding

The capability table above is what we will stand behind. This list is what we will not pretend is done. If one of these matters to your workflow, ask before signing up.

  • No HIPAA / SOC 2 yet: offline by design, but third-party audits are not complete.
  • Docling cold-start ~1 GB: the parse pipeline pulls a layout model on first use.
  • Table joining is synthetic-scored: the rejoiner hits 1.00 on our corpus; real-world drift is expected.
  • PAdES requires BYOC cert: for Adobe-validated signatures, supply a qualified cert (15 credits per signature).
  • No e-signature workflow UI: signatures are cryptographic, not DocuSign-style routing.
Free tier · credit packs · self-host

Free for light operations. Credits for heavy ones. Self-host when you need it.

Light operations (parse, edit, convert, merge, split, diff, search, classify) are free with a per-IP rate limit. Heavy operations (OCR, PII, tables, Bates, bulk-fill, redact, sign) run on credits so you only pay for what you use.

Tier 01

Free

$0
  • Parse, edit, convert, merge, split, diff, search, classify
  • 20 requests / minute per IP
  • Files up to 25 MB
  • No account required
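
If you script against the free tier, a small client-side throttle keeps a batch under the 20 requests / minute limit instead of relying on rejected calls. Here is a sliding-window sketch. `Throttle` is a hypothetical helper, not part of any Lyte Lab SDK. The clock and sleep functions are injectable so the example can run on a simulated clock:

```python
import time
from collections import deque


class Throttle:
    """Allow at most `limit` calls in any `window`-second span."""

    def __init__(self, limit=20, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit, self.window = limit, window
        self.clock, self.sleep = clock, sleep
        self.sent = deque()  # timestamps of recent requests, oldest first

    def wait(self):
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.limit:
            # Sleep until the oldest request leaves the window, then reuse its slot.
            self.sleep(self.window - (now - self.sent[0]))
            now = self.clock()
            self.sent.popleft()
        self.sent.append(now)


# Simulated clock: 21 calls in a row, the 21st has to wait a full minute.
fake_now = [0.0]
throttle = Throttle(clock=lambda: fake_now[0],
                    sleep=lambda s: fake_now.__setitem__(0, fake_now[0] + s))
for _ in range(21):
    throttle.wait()  # call the API after this returns
print(fake_now[0])  # 60.0
```

In real use, create `Throttle()` with the defaults and call `wait()` before each request.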
Tier 03

2,000 credits

$30 · best per-credit rate
  • Same ops as starter
  • Lowest per-credit rate
  • Credits never expire
  • Bearer-token API access
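
At the 2,000-credit pack price, the per-operation cost is simple arithmetic. The sketch uses the signing prices listed in the capabilities section (5 credits native, 15 credits PAdES). Credit costs for the other heavy operations are not listed on this page, so they are left out:

```python
from decimal import Decimal

PACK_PRICE_USD = Decimal("30")
PACK_CREDITS = 2000
PER_CREDIT = PACK_PRICE_USD / PACK_CREDITS  # Decimal('0.015')


def cost_usd(credits: int) -> Decimal:
    return PER_CREDIT * credits


print(cost_usd(5))         # 0.075  (native tamper-detection signature)
print(cost_usd(15))        # 0.225  (PAdES signature with your own cert)
print(PACK_CREDITS // 15)  # 133 PAdES signatures per pack
```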

Enterprise self-host

Docker + Helm deploy on your cluster. OIDC / SAML, role-based access, full audit log, air-gapped mode. Annual seat licensing.


  • $12,000 / yr · 5 seats · annual · signed license file
  • $25,000 / yr · unlimited seats · annual · air-gapped install supported

See the full breakdown on the pricing page.
