Document Processing Pipeline
How a 30-person operations team replaced manual invoice handling with a local AI pipeline.
The problem
A growing operations team receives 50-200 business documents per day — invoices, purchase orders, renewal notices, contracts, and quotes. They arrive as PDF attachments via email, shared drives, and portal downloads.
| Step | Who does it | Time per document | Daily volume |
|---|---|---|---|
| Open and read the PDF | Admin assistant | 2-3 min | 50-200 |
| Identify document type | Admin assistant | 30 sec | 50-200 |
| Extract key fields (invoice number, amounts, dates, parties) | Data entry clerk | 3-5 min | 50-200 |
| Check for missing or inconsistent information | Senior admin | 2-3 min | 50-200 |
| File in correct folder, log in spreadsheet | Admin assistant | 1-2 min | 50-200 |
| Route for approval | Team lead | 1 min | 20-50 |
At 100 documents/day, that's 10-14 hours of human time per day spent on document triage and data extraction. Not analysis. Not decision-making. Just reading, classifying, and re-keying information that's already written down.
The cost:
- 1.5-2 FTE just for document processing
- 24-48 hour lag between receiving a document and acting on it
- Human error rate of 3-8% on data entry (wrong amounts, missed fields, misfiled documents)
- Zero audit trail — once it's in the spreadsheet, the original PDF is "somewhere in the shared drive"
What they tried first:
- OpenAI API for document extraction. Worked technically, but the company's information security policy changed: financial documents can't leave the building. API bills were also climbing past £1,800/month.
- Off-the-shelf OCR software. Read the text but didn't understand document structure. Could extract "Total: £14,400" but couldn't distinguish between an invoice total, a quote estimate, or a contract value.
- Hiring more admin staff. Possible but expensive, and the work is repetitive — not what skilled operations people should be doing.
The Foundry setup
Foundry was installed on a Mac Studio (M3 Ultra, 512GB RAM) already in the office. The machine was being used for video editing — it had the capacity but wasn't doing anything AI-related.
What was configured:
- Local model — a 30B-parameter model running via llama.cpp, optimised for document understanding. Runs entirely on-device. No document ever leaves the Mac Studio.
- Hermes document pipeline — a watched-folder workflow: documents dropped into a secure intake folder, system classifies each document, extracts structured fields, flags missing or inconsistent information, preserves the original PDF untouched alongside the extracted data, all outputs marked "requires human review" before action.
- Observability dashboard — llm_stats shows model health and memory usage, documents processed/queued/flagged, processing time per document, any errors or anomalies.
What was NOT configured: No outbound internet access for document processing. No automatic payments, approvals, or system-of-record updates. No cloud API calls — everything runs locally.
What it looks like running
Before: Document arrives at 9:07 AM
An invoice lands in the intake folder. It's a 3-page PDF from Acme Marine Ltd — an invoice for managed local inference setup and workflow integration.
At 9:07:03 AM — Foundry picks it up
The Hermes pipeline detects the new file, assigns a document ID, and queues it for processing.
At 9:07:05 AM — Classification complete
At 9:07:08 AM — Field extraction complete
| Field | Extracted value |
|---|---|
| Customer | Acme Marine Ltd |
| Customer contact | Jane Smith, jane@acme.test |
| Invoice number | INV-2026-001 |
| Purchase order | PO-77 |
| Project | Foundry Pilot (FND-001) |
| Subtotal | £12,000 |
| Tax | £2,400 |
| Total | £14,400 |
| Issue date | 2026-04-29 |
| Due date | 2026-05-29 |
| Payment terms | Net 30 |
| Line item 1 | Managed local inference setup — £9,000 |
| Line item 2 | Hermes workflow integration — £3,000 |
At 9:07:09 AM — Consistency check
The system cross-references extracted fields: Invoice total matches subtotal + tax ✅, Due date is 30 days from issue date (matches payment terms) ✅, Purchase order number present ✅, No missing required fields ✅
At 9:07:10 AM — Filed and logged
Original PDF preserved with file hash for integrity, extracted data saved as structured JSON, entry logged in the pipeline database, document status: awaiting approval.
Total processing time: 7 seconds. A human reviewer sees the extracted data and original PDF side by side, confirms accuracy, and approves. That takes 15-20 seconds — skimming, not reading from scratch.
The numbers
| Metric | Before (manual) | After (Foundry) | Change |
|---|---|---|---|
| Time per document | 8-13 min | 20-30 sec (review only) | 95% reduction |
| Documents/day capacity | 100-120 | 500+ | 5x throughput |
| Processing lag | 24-48 hours | Under 1 minute | Instant |
| Data entry errors | 3-8% | <0.5% (model reads, human confirms) | 90%+ reduction |
| FTE required | 1.5-2.0 | 0.3 (review queue only) | 1.2-1.7 FTE freed |
| Monthly API cost | £1,800 (OpenAI) | £0 (local) | £21,600/year saved |
| Audit trail | None (spreadsheet + shared drive) | Full provenance (original hash, extraction log, review approval) | Complete |
| Data leaves building? | Yes (OpenAI API) | No (local only) | Compliant |
Annual savings: £21,600 in API costs + £35,000-50,000 in freed staff time = £56,000-71,600/year.
Hardware cost: £0 (existing Mac Studio). Foundry setup: £999 + £99/month = £2,187 first year.
ROI: 25-32x in year one.
What stayed cloud
Not everything moved local. The team still uses cloud services for:
- Email delivery — documents arrive via email, processed locally after download
- Cloud storage backup — encrypted backups of processed data (not the processing itself)
- Web search and research — when the team needs to look something up, that still goes to cloud APIs
- Large model inference for complex reasoning — occasional tasks that need a frontier model still use OpenAI, but the volume dropped 90%+
The point isn't "everything local." It's "the right workloads local, with a clear line between what stays cloud and what doesn't."
What it doesn't do
- Does not make decisions. It extracts, classifies, and flags. A human approves every action.
- Does not send emails or update systems of record automatically. All outputs are drafts for human review.
- Does not handle every document type perfectly. Complex multi-page contracts with unusual structures may need manual review. The system flags these rather than guessing.
- Does not replace the operations team. It removes the data-entry grind so they can focus on exceptions, relationships, and actual operations work.
The team's experience
"Before Foundry, I spent my morning opening invoices. Now I spend my morning reviewing extracted data that's already 95% correct, and I have time to actually chase the late payers and talk to suppliers." Operations admin, 6 weeks after deployment
"We were going to hire another admin person. We didn't need to. The pipeline handles the volume we had and the growth we're planning for." Operations lead
"The audit trail alone justified it. When finance asked 'where did this number come from,' we could show them the original PDF, the extraction, and who approved it. That used to take an hour of folder-hunting." Team lead
Is this right for you?
This setup works well for teams that:
- Process 50+ structured documents per day (invoices, POs, contracts, quotes, renewals)
- Have data sovereignty or compliance requirements that prevent cloud API usage
- Want to reduce data-entry overhead without replacing their entire systems stack
- Already have or are considering Apple Silicon hardware (Mac Studio, Mac Pro)
It's not a fit if you:
- Need real-time inference at high concurrency (>50 simultaneous requests)
- Process primarily unstructured media (images, audio, video) — different stack needed
- Want a fully managed cloud SaaS — Foundry is local-first by design
- Have no hardware and don't want to acquire any
Technical details (for evaluators)
- Hardware
- Mac Studio M3 Ultra, 512GB unified memory, 1TB SSD
- Model
- Qwen3-Coder-30B, Q5_K_M quantization, running via llama.cpp on port 8080
- Memory footprint
- ~40GB resident (of 512GB available)
- Processing speed
- 3-8 seconds per typical business PDF (2-5 pages)
- Pipeline
- Hermes watched-folder → classify → extract → validate → file → queue for review
- Observability
- llm_stats dashboard showing model health, memory pressure, throughput, and error rates
- No-cloud posture
- All processing local. No outbound API calls during document processing.
- Original preservation
- Source PDFs retain file hashes. Extracted data is stored separately as JSON. Working copies are clearly distinct from originals.