Code Review Pipeline
How a software team cut PR wait times from 4 hours to 20 minutes without sending their codebase to OpenAI.
The company
Atlas Freight Systems builds logistics and fleet management software. 35 developers, two engineering teams (platform and product), and a codebase that's been growing for eight years.
They ship fast — two or three deployments per day. Or they did, until code review became the bottleneck.
The problem
Every pull request needs review before it merges. The team's code review guidelines require:
- At least one senior engineer to review logic, architecture, and edge cases
- Security checks (input validation, auth boundaries, data exposure)
- Test coverage verification
- Style and consistency checks
What actually happens:
A developer finishes a feature at 2 PM. They open a PR. It sits in the queue.
The senior engineer is in back-to-back meetings until 4:30. She picks up the PR at 5, reviews it, leaves three comments, and goes home. The developer sees the comments at 9 AM the next day, makes changes, re-requests review. The senior engineer reviews again at 2 PM.
A one-day feature takes two days to ship because of review latency.
And then there's the quality problem. When the senior engineer finally gets to the PR at 5 PM, she's tired, she's context-switching from three meetings, and she's rushing because she knows the developer is waiting. She misses things:
- A SQL injection vector in a new query (caught in production, three weeks later)
- A missing test case for the empty-input edge case (customer found it)
- An inconsistent error handling pattern that diverged from the codebase standard (technical debt accumulating silently)
The cost:
- Average PR wait time: 3-5 hours
- Average merge-to-deploy time: 1-2 days (review is the bottleneck)
- Senior engineer spends 3-4 hours/day on review — that's half her working day
- Production bugs that a thorough review would have caught: 2-3 per month
- Developer morale: engineers are frustrated by the wait and the rushed reviews
What they tried:
- GitHub Copilot. Good for code completion, but it doesn't review PRs. It suggests code as you type — different problem.
- ChatGPT for code review. It worked — but the company's CTO, David, realised they were sending their entire codebase to OpenAI's API. The logistics algorithms, the fleet routing logic, the customer integration code — all of it going through a third party. Their biggest client, a national retailer, has a clause in their contract: "Supplier codebases containing [client] integration logic must not be processed by third-party AI services."
- Hiring a dedicated reviewer. They hired one. He quit after four months — reviewing other people's code all day is not a fulfilling job.
What Foundry does
Foundry runs on a Mac Studio in the engineering team's office. It's connected to their GitHub via a webhook — when a PR is opened or updated, Foundry gets notified.
It does a first-pass code review. Not a rubber stamp. A real review.
When a PR opens:
- Foundry reads the changes. It understands the diff — not just the lines changed, but the context around them, the files they're in, and how they relate to the rest of the codebase.
- It checks against the team's review guidelines:
- Logic errors or edge cases the developer may have missed
- Security concerns (input validation, auth boundaries, injection risks)
- Test coverage — are the new code paths tested? Are edge cases covered?
- Consistency with existing codebase patterns
- Potential performance issues (N+1 queries, unnecessary allocations, blocking calls)
- It posts a structured review as a comment on the PR:
- Must fix — issues that need to be addressed before merge
- Should consider — suggestions that improve quality but aren't blocking
- Looks good — areas it reviewed and found no issues
- It includes specific line references and suggested fixes
- It flags the PR for human review with a priority level. A PR with no must-fix issues? Quick scan. A PR with three must-fix issues? Needs careful human review.
The senior engineer still reviews every PR. But she's reviewing a PR that's already been through a thorough first pass. She's confirming, not discovering. And she's doing it in 5 minutes instead of 30.
What it looks like day to day
2:15 PM — Developer opens a PR
Sarah pushes a feature: a new endpoint that calculates delivery route optimisation based on traffic data. The PR is 340 lines across 4 files.
2:15 PM — Foundry starts review
2:17 PM — Foundry posts review:
2:20 PM — Developer fixes the SQL injection and adds the test
Sarah sees the review immediately, fixes the issue, adds the empty-data test, and pushes the update.
2:22 PM — Foundry re-reviews the updated PR
2:35 PM — Senior engineer reviews
David opens the PR. Foundry's review is at the top. He reads it, scans the changes, confirms the fix is correct, and approves.
Total time from PR to merge: 20 minutes.
Without Foundry, this PR would have been reviewed at 5 PM the next day — if the senior engineer had time. The SQL injection would have been caught in QA or production, not at 2:17 PM.
The numbers
| Metric | Before | After | Change |
|---|---|---|---|
| Average PR wait time | 3-5 hours | 15-25 minutes | 90% reduction |
| Senior engineer daily review time | 3-4 hours | 45-60 mins | 75% reduction |
| PR merge-to-deploy time | 1-2 days | same day | 50% faster |
| Production bugs caught in review | 60% | 92% | +32 points |
| Security issues reaching production | 1-2/month | 0-1/quarter | 80%+ reduction |
| Codebase sent to third-party AI | Yes (ChatGPT) | No | Fully local |
| Monthly API cost | £800-1,200 (OpenAI) | £0 | £9,600-14,400/year saved |
Annual impact: 600-700 hours of senior engineer time recovered + £10,000+ in API costs + fewer production incidents (each P1 incident costs £5,000-15,000 in response, fix, and client impact).
Foundry cost: £999 setup + £99/month = £2,187 first year. Existing Mac Studio.
What stayed cloud
- GitHub, CI/CD pipeline, deployment infrastructure — all untouched
- Cloud development environments (if used) — Foundry reviews the PR, not the dev environment
- External API calls in the code being reviewed — Foundry reads code, it doesn't execute it
- Developer tools (IDEs, Copilot for code completion) — Foundry does review, not completion
What moved local: the AI that reads your code and identifies issues. That's the part that was sending your proprietary codebase through OpenAI's API.
What it doesn't do
- Does not auto-approve or auto-merge PRs. Every PR still needs a human to say "approved."
- Does not write code. It reviews and suggests fixes, but the developer writes the fix.
- Does not replace the senior engineer. It does the first pass — the systematic, tedious checking that takes time but doesn't require deep architectural judgement. The senior engineer still reviews for architecture, business logic, and things AI can't see.
- Does not send code externally. The review runs on the Mac Studio in your office. Your codebase stays yours.
- Does not catch everything. It's very good at pattern-based issues (security, tests, consistency) and less good at "is this the right architectural approach for our business." That's still the senior engineer's job.
What the team says
"The first week, Foundry caught a SQL injection in a PR that I would have missed at 5 PM on a Friday. I've been reviewing code for twelve years. That stung — but it proved the point." David, CTO
"I used to wait half a day for someone to look at my code. Now it's reviewed before I've finished my coffee. The feedback is specific — line numbers, suggested fixes, not just 'looks fine.'" Sarah, developer
"The national retailer contract clause about third-party AI was the blocker for us using ChatGPT for review. Foundry runs on our hardware. Our code never leaves the building. Procurement is happy, legal is happy, and we're shipping faster." David, CTO
Is this right for your team?
This setup works for:
- Software teams of 10-100 developers doing regular PRs
- Companies with proprietary codebases that can't go through third-party AI APIs
- Teams where senior engineer review time is the deployment bottleneck
- Organisations with security requirements (financial services, healthcare, defence-adjacent, regulated industries)
Not a fit if you:
- Have very low PR volume (<5/week) — human review is fine
- Already have a CI-based code quality pipeline you're happy with
- Are comfortable sending your codebase through cloud AI tools
- Don't have a senior engineer to do the final human review (Foundry augments, it doesn't replace)
Technical details
- Hardware
- Mac Studio M3 Ultra, 512GB unified memory
- Model
- Qwen3-Coder-30B (Q5_K_M) via llama.cpp — specifically tuned for code understanding
- Pipeline
- GitHub webhook → fetch diff → analyse changes → post structured review → developer addresses → re-review on update → flag for human approval
- Review categories
- Security (injection, auth, data exposure), logic (edge cases, error paths, null handling), tests (coverage, edge cases, assertions), consistency (codebase patterns, style, naming), performance (N+1 queries, allocations, blocking calls)
- Integration
- GitHub (via webhook + API), GitLab, Bitbucket, or direct git hooks
- Throughput
- 10-30 seconds per PR depending on diff size
- No-cloud posture
- Code is fetched locally, reviewed locally, and the review is posted via the GitHub API. No code content is sent to any third-party AI service.
- Observability
- llm_stats dashboard showing review volume, issue detection rates, false positive tracking, and model health
- False positive rate
- ~5-8% on suggestions (developer dismisses); near-zero on must-fix items (these are verified security/logic issues)