Lyte Lab
Local-first document tools. Measured. Deterministic. No LLM in the hot path.
Capabilities
PDF parse, edit, and redact
Additive overlay architecture. Source pixels are preserved, edits are composited on top, and glyphs under redactions are purged from the content stream.
Overlay edit
OCR with handwriting support
Scanned PDFs to searchable PDFs. Auto-detects scanned vs born-digital. Swappable backend: OcrMac, EasyOCR, Tesseract, or a handwriting model.
Swappable backend
Format conversion
PDF ↔ CSV ↔ Markdown ↔ HTML ↔ DOCX. Round-trip self-test optional so you can confirm fidelity before shipping a deliverable.
Round-trip
Tables & cross-page joining
Extract every table to one CSV, with a heuristic rejoiner for tables that span page breaks. Useful for bank statements, P&Ls, audit exhibits.
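As a rough illustration of what a page-break rejoiner can look like, here is a minimal sketch. It is not Lyte Lab's actual heuristic: the rule used here (a fragment with the same column count as the previous one is treated as a continuation, and a repeated header row is dropped) and the names `rejoin` and `to_csv` are assumptions made for this example.

```python
import csv
import io


def rejoin(fragments):
    """Merge table fragments given in reading order.

    Each fragment is a list of rows, each row a list of strings.
    Toy heuristic (an assumption for this sketch): a fragment whose
    column count matches the previous table continues that table.
    """
    merged = []
    for fragment in fragments:
        if not fragment:
            continue  # skip empty fragments
        if merged and len(fragment[0]) == len(merged[-1][0]):
            previous = merged[-1]
            # Drop the first row when it repeats the previous table's header.
            rows = fragment[1:] if fragment[0] == previous[0] else fragment
            previous.extend(rows)
        else:
            merged.append([list(row) for row in fragment])
    return merged


def to_csv(table):
    """Serialize one merged table to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(table)
    return buffer.getvalue()


page_1 = [["Date", "Amount"], ["01/02", "10.00"]]
page_2 = [["Date", "Amount"], ["01/03", "5.00"]]
statement = rejoin([page_1, page_2])[0]
print(to_csv(statement))
```

A column-count rule like this will wrongly merge unrelated tables that happen to share a width, which is one reason real-world drift is expected.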
Statement rejoin
Bates numbering
Sequential legal-discovery stamping across a set of PDFs with configurable prefix, start number, and padding. Returns stamped file plus a production log.
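To make the prefix, start number, and padding options concrete, the sketch below assigns labels across a set of files and builds a simple production log. It only computes labels and does not stamp PDFs. The function names, the `BatesEntry` record, and the `LYTE` prefix are hypothetical and not the tool's real interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatesEntry:
    """One production-log row: the label range assigned to a file."""
    filename: str
    first_label: str
    last_label: str


def bates_label(prefix, number, padding):
    """Format one label, e.g. prefix 'LYTE', number 7, padding 6."""
    return f"{prefix}{number:0{padding}d}"


def assign_bates(page_counts, prefix, start=1, padding=6):
    """Assign sequential labels across files.

    page_counts is a list of (filename, page_count) pairs in production
    order. Returns (labels_by_file, production_log).
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    labels = {}
    log = []
    number = start
    for filename, pages in page_counts:
        if pages < 1:
            raise ValueError(f"{filename}: page count must be at least 1")
        file_labels = [bates_label(prefix, number + i, padding) for i in range(pages)]
        labels[filename] = file_labels
        log.append(BatesEntry(filename, file_labels[0], file_labels[-1]))
        number += pages  # numbering continues into the next file
    return labels, log


labels, log = assign_bates([("a.pdf", 2), ("b.pdf", 3)], prefix="LYTE")
for entry in log:
    print(entry.filename, entry.first_label, entry.last_label)
```

Carrying the counter across files is what keeps the sequence gap-free over the whole production set.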
Legal discovery
Bulk form fill
One template PDF + one CSV = N filled PDFs. Supports AcroForm fields and coordinate-mode overlays for non-form templates. One filled PDF out per CSV row in.
AcroForm + coord
PII detection
Names, emails, phone numbers, SSNs, account numbers. Regex rules plus Luhn validation. Zero LLM. Output feeds straight into redact.
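A minimal sketch of the regex-plus-Luhn idea follows. The patterns are deliberately simplified and are assumptions for this example, not the shipped rule set. The Luhn checksum is the standard one used by payment card numbers, so the sketch assumes the account numbers of interest are card-style digit runs of 13 to 19 digits.

```python
import re

# Simplified illustrative patterns (assumptions, not the production rules).
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, each optionally followed by one space or hyphen.
ACCOUNT_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_pii(text):
    """Return (kind, start, end, match) tuples sorted by position."""
    hits = []
    for match in EMAIL_RE.finditer(text):
        hits.append(("email", match.start(), match.end(), match.group()))
    for match in SSN_RE.finditer(text):
        hits.append(("ssn", match.start(), match.end(), match.group()))
    for match in ACCOUNT_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_ok(digits):  # the regex proposes, the checksum confirms
            hits.append(("account_number", match.start(), match.end(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


sample = "Mail jane@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
for kind, start, end, value in find_pii(sample):
    print(kind, start, end, value)
```

The character offsets in each hit are what a redaction step would need to locate the matching glyphs.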
Regex + Luhn
Digital signatures
Native tamper-detection signing (5 credits) or Adobe-compatible PAdES via a customer-supplied cert (15 credits). Both verifiable offline.
Native · PAdES
Password operations
AES-256 encrypt, decrypt, set permissions. Change owner/user passwords, restrict printing and copying, re-encrypt with a new key on request.
Encrypt · decrypt
Overlay annotations
Headers, footers, watermarks. Positioned by page, rotated, or tiled. Applied as an overlay so source pixels and signatures stay intact.
Watermark
Compression + PDF/A
Lossless image recompression and optional conversion to PDF/A-2b for archival compliance. Returns before/after byte size in the response.
Archival
Page classifier + batch rename
Drop a mixed folder of invoices, receipts, contracts, and statements. Tool classifies each page, splits, sorts, and renames on a template you supply.
Mixed folder
Every operator ships with a numeric quality bar
Each capability has a pre-registered threshold per quality dimension and a test battery. No capability ships until every dimension crosses its bar. Scores are model-free where possible; where a model exists (OCR, classifier), the quality bar is set against a held-out corpus.
Capability Scorecard
| Capability | Fidelity score | Tests | Bar | Notes |
|---|---|---|---|---|
| PDF edit (overlay) | 0.96 | 25+ | 0.95 | Source content stream preserved |
| OCR | 0.99 / 0.81 / 1.00 | 18+ | varies | Three dimensions: text, layout, handwriting |
| Bates numbering | 1.000 | 37 | 1.00 | Sequential integrity required |
| PII detector | 1.000 | 60 | 1.00 | 28 / 28 seeded matches |
| Bulk form fill | 1.000 | 25 | 1.00 | AcroForm + coordinate mode |
| Table joining | 1.00 | 27 | 0.90 | Synthetic corpus; real-world drift expected |
| Compression + PDF/A | 1.00 | 24 | 0.95 | 6 dimensions passed, 2 skipped (gs optional) |
| Page classifier | 1.000 | 53 | 0.85 | 14-document corpus, confusion matrix clean |
| Overlay annotations | 1.000 | 51 | 1.00 | 9 dimensions |
| Page operations | 1.000 | 21 | 1.00 | Merge / split / rotate / reorder |
| Bookmark generator | 1.000 | 19 | 0.90 | 7 dimensions |
| Password operations | 1.000 | 23 | 1.00 | AES-256 encrypt / decrypt / permissions |
| Digital signature (native) | 1.000 | 6 | 1.00 | Tamper-detection round-trip |
Quality dimensions are capability-specific and pre-registered in each operator's config. Every bar is a pass/fail gate: if any dimension misses its bar, the operator does not ship. Test counts above are the battery sizes at cut; the full suite runs in CI on every push.
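The gate described above can be sketched in a few lines. This is an illustration only: the `Dimension` record, the `gate` function, and the dimension names and numbers in the usage example are hypothetical and are not taken from any operator's config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """One pre-registered quality dimension with its measured score."""
    name: str
    score: float
    bar: float


def gate(dimensions):
    """Return (ships, failed_names). Ships only if every score meets its bar."""
    if not dimensions:
        raise ValueError("at least one quality dimension must be registered")
    failed = [d.name for d in dimensions if d.score < d.bar]
    return (not failed, failed)


# Hypothetical dimensions and values, for illustration only.
ships, failed = gate([
    Dimension("example_dimension_a", score=0.96, bar=0.95),
    Dimension("example_dimension_b", score=0.88, bar=0.90),
])
print(ships, failed)
```

Because the check is an all-or-nothing conjunction, one missed dimension blocks the release however high the other scores are.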
What we are measured against, and what is still outstanding
The capability table above is what we will stand behind. This list is what we will not pretend is done. If one of these matters to your workflow, ask before signing up.
- No HIPAA / SOC 2 yet: Offline by design, but third-party audits are not complete.
- Docling cold-start ~1 GB: The parse pipeline pulls a layout model on first use.
- Table joining is synthetic-scored: Rejoiner hits 1.00 on our corpus; real-world drift is expected.
- PAdES requires BYOC cert: For Adobe-validated signatures, supply a qualified cert (15 credits / sign).
- No e-signature workflow UI: Signatures are cryptographic, not DocuSign-style routing.
Free for light operations. Credits for heavy ones. Self-host when you need it.
Light operations (parse, edit, convert, merge, split, diff, search, classify) are free with a per-IP rate limit. Heavy operations (OCR, PII, tables, Bates, bulk-fill, redact, sign) run on credits so you only pay for what you use.
Free
- Parse, edit, convert, merge, split, diff, search, classify
- 20 requests / minute per IP
- Files up to 25 MB
- No account required
500 credits
- OCR, PII, tables, Bates, bulk fill, redact, sign
- ~1 credit per page on most ops
- Credits never expire
- Priority queue for heavy jobs
2,000 credits
- Same ops as the 500-credit starter tier
- Lowest per-credit rate
- Credits never expire
- Bearer-token API access
Enterprise self-host
Docker + Helm deploy on your cluster. OIDC / SAML, role-based access, full audit log, air-gapped mode. Annual seat licensing.
See the full breakdown on the pricing page.
Austin, TX.