DocumentIntelligence

Status: Testing
Stack: Claude Vision, Zod, Supabase

PDF parsing pipeline that extracts structured data from invoices, contracts, and compliance documents.

PDFs are converted to images and sent to Claude Vision for extraction. A Zod schema defines the expected output structure for each document type (invoice: vendor, date, line items, total; contract: parties, terms, dates, clauses). Claude returns the extracted data as JSON, which is then validated against the schema. A failed validation triggers a re-extraction with more specific prompting.
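A minimal sketch of the schema-plus-retry loop described above, assuming the `zod` package. The field names, the date format, the three-attempt limit, and the `extract` callback are all illustrative assumptions, not the project's actual code. `extract` stands in for whatever function sends the page images to Claude Vision and returns its text reply.

```typescript
import { z } from "zod";

// Illustrative schemas for two of the document types; field names are assumptions.
const InvoiceSchema = z.object({
  vendor: z.string().min(1),
  date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/, "expected YYYY-MM-DD"),
  lineItems: z
    .array(z.object({ description: z.string(), amount: z.number() }))
    .min(1),
  total: z.number().nonnegative(),
});

const ContractSchema = z.object({
  parties: z.array(z.string().min(1)).min(2),
  terms: z.string(),
  dates: z.object({ effective: z.string(), expiry: z.string().optional() }),
  clauses: z.array(z.object({ type: z.string(), text: z.string() })),
});

// One schema per document type, chosen before extraction.
const schemas = { invoice: InvoiceSchema, contract: ContractSchema };

// Hypothetical stand-in for the Claude Vision call: prompt in, raw text out.
type Extractor = (prompt: string) => Promise<string>;

async function extractWithRetry<S extends z.ZodTypeAny>(
  schema: S,
  docType: string,
  extract: Extractor,
  maxAttempts = 3,
): Promise<z.infer<S>> {
  let prompt = `Extract the ${docType} fields and reply with JSON only.`;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await extract(prompt);
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      prompt = `Your previous reply was not valid JSON. Return only the ${docType} JSON object.`;
      continue;
    }
    const result = schema.safeParse(parsed);
    if (result.success) return result.data;
    // Turn each validation issue into a targeted instruction for the next attempt.
    const problems = result.error.issues
      .map((issue) => `${issue.path.join(".") || "(root)"}: ${issue.message}`)
      .join("; ");
    prompt = `Re-extract the ${docType}. These fields failed validation: ${problems}`;
  }
  throw new Error(`${docType} extraction failed after ${maxAttempts} attempts`);
}
```

A call would look like `extractWithRetry(schemas.invoice, "invoice", callClaude)`, where `callClaude` is a hypothetical wrapper around the vision request. Feeding the Zod issue paths back into the prompt is one way to make the retry "more specific". The real pipeline may phrase its follow-up prompts differently.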


Current test accuracy across 12 document types: invoices (97%), contracts (89%), compliance reports (91%), receipts (95%), insurance forms (87%), employment contracts (90%), NDAs (93%), purchase orders (96%), bank statements (94%), tax returns (88%), medical records (82%), and lease agreements (85%). The main failure mode is handwritten annotations, because Claude Vision struggles with poor handwriting.
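The per-type figures above can be summarised with a little arithmetic. This sketch computes an unweighted mean and lists the types that fall under a cutoff. The 90% cutoff is an illustrative assumption, and so is the idea of using it to route documents to closer review. The text also does not say how "accuracy" is measured (for example, per field or per document).

```typescript
// Accuracy per document type as reported in testing (percent).
const accuracyByType: Record<string, number> = {
  invoices: 97,
  contracts: 89,
  "compliance reports": 91,
  receipts: 95,
  "insurance forms": 87,
  "employment contracts": 90,
  NDAs: 93,
  "purchase orders": 96,
  "bank statements": 94,
  "tax returns": 88,
  "medical records": 82,
  "lease agreements": 85,
};

// Unweighted mean: every document type counts equally, whatever its volume.
function meanAccuracy(scores: Record<string, number>): number {
  const types = Object.keys(scores);
  const sum = types.reduce((acc, t) => acc + scores[t], 0);
  return sum / types.length;
}

// Document types scoring below a cutoff, in the order listed.
function typesBelow(scores: Record<string, number>, cutoff: number): string[] {
  return Object.keys(scores).filter((t) => scores[t] < cutoff);
}
```

`meanAccuracy(accuracyByType)` is 1087 / 12, about 90.6%. `typesBelow(accuracyByType, 90)` returns contracts, insurance forms, tax returns, medical records, and lease agreements. A volume-weighted mean would differ, since a client processing mostly invoices sees something closer to the invoice figure.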

Being developed both for the AI Compliance Engine (Irvo) and as a standalone integration. Client use case: a property management company that processes 500 lease agreements a year. Each document currently takes 2 hours; the target is 5 minutes with human review.
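A back-of-the-envelope check on the lease-agreement target. It assumes the 5-minute figure covers the whole per-document effort, including human review:

```latex
\begin{aligned}
\text{current effort} &= 500 \times 2\ \text{h} = 1000\ \text{h/year} \\
\text{target effort}  &= 500 \times 5\ \text{min} = 2500\ \text{min} \approx 41.7\ \text{h/year} \\
\text{time saved}     &\approx 1000 - 41.7 \approx 958\ \text{h/year} \quad (\approx 96\%\ \text{reduction})
\end{aligned}
```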

Accounting firm

Processing 200 invoices per week from many vendors, each with its own format. The pipeline extracts vendor, date, line items, VAT, and total, and feeds the results directly into Xero or QuickBooks.
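A cross-field rule fits the accounting use case well: the extracted line items plus VAT should add up to the extracted total. This minimal sketch uses Zod's `.refine`. It assumes line-item amounts are net of VAT and allows a one-cent rounding tolerance; both are assumptions. The mapping into Xero or QuickBooks is not shown.

```typescript
import { z } from "zod";

// Rounding tolerance in currency units (one cent); an assumption.
const TOLERANCE = 0.01;

const InvoiceSchema = z
  .object({
    vendor: z.string().min(1),
    date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
    // Assumed to be net (ex-VAT) amounts.
    lineItems: z
      .array(z.object({ description: z.string(), amount: z.number() }))
      .min(1),
    vat: z.number().nonnegative(),
    total: z.number().nonnegative(),
  })
  // Cross-field rule: reject invoices whose figures do not reconcile,
  // so they can be re-extracted or sent for human review.
  .refine(
    (inv) => {
      const net = inv.lineItems.reduce((sum, item) => sum + item.amount, 0);
      return Math.abs(net + inv.vat - inv.total) <= TOLERANCE;
    },
    { message: "line items plus VAT do not match the total", path: ["total"] },
  );
```

Putting the issue on the `total` path means the retry prompt can name a specific field, though the mismatch could equally come from a misread line item or VAT amount.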

Legal firm

Reviewing contracts for key clauses (termination, liability, IP ownership). The pipeline highlights the relevant sections and flags missing standard clauses, saving 30 minutes per contract review.
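The missing-clause check can be as simple as comparing the clause categories found during extraction against a required list. In this sketch the `ExtractedClause` shape and the category labels are hypothetical. Treating termination, liability, and IP ownership as the required set is an assumption drawn from the examples above, and a real checklist would likely vary by contract type.

```typescript
// Hypothetical shape of one clause returned by the extraction step.
interface ExtractedClause {
  type: string; // category label, e.g. "termination"
  page: number; // used to highlight the section for the reviewer
  text: string;
}

// Required categories; this list is an assumption based on the examples above.
const REQUIRED_CLAUSES = ["termination", "liability", "ip_ownership"];

interface ClauseReport {
  found: Record<string, ExtractedClause[]>; // clauses grouped by category
  missing: string[]; // required categories with no matching clause
}

function reviewClauses(
  clauses: ExtractedClause[],
  required: string[] = REQUIRED_CLAUSES,
): ClauseReport {
  const found: Record<string, ExtractedClause[]> = {};
  for (const clause of clauses) {
    if (!Object.prototype.hasOwnProperty.call(found, clause.type)) {
      found[clause.type] = [];
    }
    found[clause.type].push(clause);
  }
  const missing = required.filter(
    (type) => !Object.prototype.hasOwnProperty.call(found, type),
  );
  return { found, missing };
}
```

The `page` field on each grouped clause is what a reviewer UI could use to highlight sections. `missing` is the list to flag.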

Insurance broker

Extracting policy details from renewal documents. Coverage amounts, exclusions, and premium changes are pulled into a comparison spreadsheet automatically.
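The renewal comparison can be sketched as a diffing step. Given the terms extracted from the current policy and the renewal, it emits one row per premium, coverage, and exclusion change, then serialises the rows as CSV that any spreadsheet can open. The `PolicyTerms` shape and the column layout are assumptions for illustration.

```typescript
// Hypothetical shape of the terms extracted from one policy document.
interface PolicyTerms {
  premium: number;
  coverage: Record<string, number>; // coverage type -> insured amount
  exclusions: string[];
}

interface ComparisonRow {
  item: string;
  current: string;
  renewal: string;
  change: string;
}

// Signed percentage change, e.g. "+10.0%".
function pctChange(from: number, to: number): string {
  if (from === 0) return "n/a";
  const pct = ((to - from) / from) * 100;
  return `${pct >= 0 ? "+" : ""}${pct.toFixed(1)}%`;
}

function comparePolicies(current: PolicyTerms, renewal: PolicyTerms): ComparisonRow[] {
  const rows: ComparisonRow[] = [
    {
      item: "Premium",
      current: current.premium.toFixed(2),
      renewal: renewal.premium.toFixed(2),
      change: pctChange(current.premium, renewal.premium),
    },
  ];
  // Coverage types from both documents, current policy first.
  const types = Object.keys(current.coverage).concat(
    Object.keys(renewal.coverage).filter((t) => !(t in current.coverage)),
  );
  for (const type of types) {
    const a: number | undefined = current.coverage[type];
    const b: number | undefined = renewal.coverage[type];
    rows.push({
      item: `Coverage: ${type}`,
      current: a === undefined ? "-" : String(a),
      renewal: b === undefined ? "-" : String(b),
      change: a === undefined ? "added" : b === undefined ? "removed" : pctChange(a, b),
    });
  }
  for (const e of renewal.exclusions) {
    if (current.exclusions.indexOf(e) === -1) {
      rows.push({ item: `Exclusion: ${e}`, current: "-", renewal: "yes", change: "new" });
    }
  }
  for (const e of current.exclusions) {
    if (renewal.exclusions.indexOf(e) === -1) {
      rows.push({ item: `Exclusion: ${e}`, current: "yes", renewal: "-", change: "dropped" });
    }
  }
  return rows;
}

// Minimal CSV serialisation: quote fields containing commas, quotes, or newlines.
function toCsv(rows: ComparisonRow[]): string {
  const esc = (s: string) => (/[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s);
  const lines = rows.map((r) => [r.item, r.current, r.renewal, r.change].map(esc).join(","));
  return ["item,current,renewal,change"].concat(lines).join("\n");
}
```

Matching exclusions by exact string is the weakest part of this sketch. Reworded exclusions would show up as one "dropped" and one "new" row, so a real pipeline would probably need fuzzier matching or a review step.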
