Case Studies › Document Processing

Document Processing Pipeline

How a 30-person operations team replaced manual invoice handling with a local AI pipeline.

30-person operations team | Mac Studio M3 Ultra 512GB | Hermes document pipeline

The problem

A growing operations team receives 50-200 business documents per day — invoices, purchase orders, renewal notices, contracts, and quotes. They arrive as PDF attachments via email, shared drives, and portal downloads.

Step	Who does it	Time per document	Daily volume
Open and read the PDF	Admin assistant	2-3 min	50-200
Identify document type	Admin assistant	30 sec	50-200
Extract key fields (invoice number, amounts, dates, parties)	Data entry clerk	3-5 min	50-200
Check for missing or inconsistent information	Senior admin	2-3 min	50-200
File in correct folder, log in spreadsheet	Admin assistant	1-2 min	50-200
Route for approval	Team lead	1 min	20-50

At 100 documents/day, that's 10-14 hours of human time per day spent on document triage and data extraction. Not analysis. Not decision-making. Just reading, classifying, and re-keying information that's already written down.

The cost:

1.5-2 FTE just for document processing
24-48 hour lag between receiving a document and acting on it
Human error rate of 3-8% on data entry (wrong amounts, missed fields, misfiled documents)
Zero audit trail — once it's in the spreadsheet, the original PDF is "somewhere in the shared drive"

What they tried first:

OpenAI API for document extraction. Worked technically, but the company's information security policy changed: financial documents can't leave the building. API bills were also climbing past £1,800/month.
Off-the-shelf OCR software. Read the text but didn't understand document structure. Could extract "Total: £14,400" but couldn't distinguish between an invoice total, a quote estimate, or a contract value.
Hiring more admin staff. Possible but expensive, and the work is repetitive — not what skilled operations people should be doing.

The Foundry setup

Foundry was installed on a Mac Studio (M3 Ultra, 512GB RAM) already in the office. The machine was being used for video editing — it had the capacity but wasn't doing anything AI-related.

What was configured:

Local model — a 30B-parameter model running via llama.cpp, optimised for document understanding. Runs entirely on-device. No document ever leaves the Mac Studio.
Hermes document pipeline — a watched-folder workflow: documents dropped into a secure intake folder, system classifies each document, extracts structured fields, flags missing or inconsistent information, preserves the original PDF untouched alongside the extracted data, all outputs marked "requires human review" before action.
Observability dashboard — llm_stats shows model health and memory usage, documents processed/queued/flagged, processing time per document, any errors or anomalies.

What was NOT configured: No outbound internet access for document processing. No automatic payments, approvals, or system-of-record updates. No cloud API calls — everything runs locally.

What it looks like running

Before: Document arrives at 9:07 AM

An invoice lands in the intake folder. It's a 3-page PDF from Acme Marine Ltd — an invoice for managed local inference setup and workflow integration.

At 9:07:03 AM — Foundry picks it up
The Hermes pipeline detects the new file, assigns a document ID, and queues it for processing.

At 9:07:05 AM — Classification complete

Document ID: 2e8aeb98-01df-4af7-9564-6bc9af91c6ed Document type: invoice Source: watched folder Status: processing

At 9:07:08 AM — Field extraction complete

Field	Extracted value
Customer	Acme Marine Ltd
Customer contact	Jane Smith, jane@acme.test
Invoice number	INV-2026-001
Purchase order	PO-77
Project	Foundry Pilot (FND-001)
Subtotal	£12,000
Tax	£2,400
Total	£14,400
Issue date	2026-04-29
Due date	2026-05-29
Payment terms	Net 30
Line item 1	Managed local inference setup — £9,000
Line item 2	Hermes workflow integration — £3,000

At 9:07:09 AM — Consistency check
The system cross-references extracted fields: Invoice total matches subtotal + tax ✅, Due date is 30 days from issue date (matches payment terms) ✅, Purchase order number present ✅, No missing required fields ✅

At 9:07:10 AM — Filed and logged
Original PDF preserved with file hash for integrity, extracted data saved as structured JSON, entry logged in the pipeline database, document status: awaiting approval.

Total processing time: 7 seconds. A human reviewer sees the extracted data and original PDF side by side, confirms accuracy, and approves. That takes 15-20 seconds — skimming, not reading from scratch.

The numbers

Metric	Before (manual)	After (Foundry)	Change
Time per document	8-13 min	20-30 sec (review only)	95% reduction
Documents/day capacity	100-120	500+	5x throughput
Processing lag	24-48 hours	Under 1 minute	Instant
Data entry errors	3-8%	<0.5% (model reads, human confirms)	90%+ reduction
FTE required	1.5-2.0	0.3 (review queue only)	1.2-1.7 FTE freed
Monthly API cost	£1,800 (OpenAI)	£0 (local)	£21,600/year saved
Audit trail	None (spreadsheet + shared drive)	Full provenance (original hash, extraction log, review approval)	Complete
Data leaves building?	Yes (OpenAI API)	No (local only)	Compliant

Annual savings: £21,600 in API costs + £35,000-50,000 in freed staff time = £56,000-71,600/year.

Hardware cost: £0 (existing Mac Studio). Foundry setup: £999 + £99/month = £2,187 first year.

ROI: 25-32x in year one.

What stayed cloud

Not everything moved local. The team still uses cloud services for:

Email delivery — documents arrive via email, processed locally after download
Cloud storage backup — encrypted backups of processed data (not the processing itself)
Web search and research — when the team needs to look something up, that still goes to cloud APIs
Large model inference for complex reasoning — occasional tasks that need a frontier model still use OpenAI, but the volume dropped 90%+

The point isn't "everything local." It's "the right workloads local, with a clear line between what stays cloud and what doesn't."

What it doesn't do

Does not make decisions. It extracts, classifies, and flags. A human approves every action.
Does not send emails or update systems of record automatically. All outputs are drafts for human review.
Does not handle every document type perfectly. Complex multi-page contracts with unusual structures may need manual review. The system flags these rather than guessing.
Does not replace the operations team. It removes the data-entry grind so they can focus on exceptions, relationships, and actual operations work.

The team's experience

"Before Foundry, I spent my morning opening invoices. Now I spend my morning reviewing extracted data that's already 95% correct, and I have time to actually chase the late payers and talk to suppliers." Operations admin, 6 weeks after deployment

"We were going to hire another admin person. We didn't need to. The pipeline handles the volume we had and the growth we're planning for." Operations lead

"The audit trail alone justified it. When finance asked 'where did this number come from,' we could show them the original PDF, the extraction, and who approved it. That used to take an hour of folder-hunting." Team lead

Is this right for you?

This setup works well for teams that:

Process 50+ structured documents per day (invoices, POs, contracts, quotes, renewals)
Have data sovereignty or compliance requirements that prevent cloud API usage
Want to reduce data-entry overhead without replacing their entire systems stack
Already have or are considering Apple Silicon hardware (Mac Studio, Mac Pro)

It's not a fit if you:

Need real-time inference at high concurrency (>50 simultaneous requests)
Process primarily unstructured media (images, audio, video) — different stack needed
Want a fully managed cloud SaaS — Foundry is local-first by design
Have no hardware and don't want to acquire any

Want to see it running on your documents? Book a Foundry Fit Review →

Technical details (for evaluators)

Hardware: Mac Studio M3 Ultra, 512GB unified memory, 1TB SSD
Model: Qwen3-Coder-30B, Q5_K_M quantization, running via llama.cpp on port 8080
Memory footprint: ~40GB resident (of 512GB available)
Processing speed: 3-8 seconds per typical business PDF (2-5 pages)
Pipeline: Hermes watched-folder → classify → extract → validate → file → queue for review
Observability: llm_stats dashboard showing model health, memory pressure, throughput, and error rates
No-cloud posture: All processing local. No outbound API calls during document processing.
Original preservation: Source PDFs retain file hashes. Extracted data is stored separately as JSON. Working copies are clearly distinct from originals.

← Back to all case studies | Next: Conveyancing Intake →