Case Studies › Code Review

Code Review Pipeline

How a software team cut PR wait times from 4 hours to 20 minutes without sending their codebase to OpenAI.

Atlas Freight Systems | 35 developers | Mac Studio M3 Ultra 512GB

The company

Atlas Freight Systems builds logistics and fleet management software. 35 developers, two engineering teams (platform and product), and a codebase that's been growing for eight years.

They ship fast — two or three deployments per day. Or they did, until code review became the bottleneck.

The problem

Every pull request needs review before it merges. The team's code review guidelines require:

What actually happens:

A developer finishes a feature at 2 PM. They open a PR. It sits in the queue.

The senior engineer is in back-to-back meetings until 4:30. She picks up the PR at 5, reviews it, leaves three comments, and goes home. The developer sees the comments at 9 AM the next day, makes changes, re-requests review. The senior engineer reviews again at 2 PM.

A one-day feature takes two days to ship because of review latency.

And then there's the quality problem. When the senior engineer finally gets to the PR at 5 PM, she's tired, she's context-switching from three meetings, and she's rushing because she knows the developer is waiting. She misses things:

The cost:

What they tried:

What Foundry does

Foundry runs on a Mac Studio in the engineering team's office. It's connected to their GitHub via a webhook — when a PR is opened or updated, Foundry gets notified.

It does a first-pass code review. Not a rubber stamp. A real review.

When a PR opens:

  1. Foundry reads the changes. It understands the diff — not just the lines changed, but the context around them, the files they're in, and how they relate to the rest of the codebase.
  2. It checks against the team's review guidelines:
    • Logic errors or edge cases the developer may have missed
    • Security concerns (input validation, auth boundaries, injection risks)
    • Test coverage — are the new code paths tested? Are edge cases covered?
    • Consistency with existing codebase patterns
    • Potential performance issues (N+1 queries, unnecessary allocations, blocking calls)
  3. It posts a structured review as a comment on the PR:
    • Must fix — issues that need to be addressed before merge
    • Should consider — suggestions that improve quality but aren't blocking
    • Looks good — areas it reviewed and found no issues
    • It includes specific line references and suggested fixes
  4. It flags the PR for human review with a priority level. A PR with no must-fix issues? Quick scan. A PR with three must-fix issues? Needs careful human review.

The senior engineer still reviews every PR. But she's reviewing a PR that's already been through a thorough first pass. She's confirming, not discovering. And she's doing it in 5 minutes instead of 30.

What it looks like day to day

2:15 PM — Developer opens a PR

Sarah pushes a feature: a new endpoint that calculates delivery route optimisation based on traffic data. The PR is 340 lines across 4 files.

2:15 PM — Foundry starts review

2:17 PM — Foundry posts review:

Review summary: 2 issues found, 1 suggestion MUST FIX — SQL injection risk (routes.py, line 47) The `traffic_source` parameter is concatenated directly into the query string. Use parameterised queries instead. # Suggested fix: cursor.execute("SELECT * FROM traffic_data WHERE source = %s", (traffic_source,)) SHOULD CONSIDER — Missing test case (test_routes.py) No test for the empty traffic data scenario. If the API returns an empty response, the optimiser will throw a KeyError on `traffic_data['routes']`. LOOKS GOOD — auth, input validation, error handling, performance Auth boundary is correct. Input validation present. Error handling follows existing pattern. No N+1 queries detected.

2:20 PM — Developer fixes the SQL injection and adds the test

Sarah sees the review immediately, fixes the issue, adds the empty-data test, and pushes the update.

2:22 PM — Foundry re-reviews the updated PR

SQL injection fixed. Parameterised query now in use. Empty data test added and passing. Ready for human review.

2:35 PM — Senior engineer reviews

David opens the PR. Foundry's review is at the top. He reads it, scans the changes, confirms the fix is correct, and approves.

Total time from PR to merge: 20 minutes.

Without Foundry, this PR would have been reviewed at 5 PM the next day — if the senior engineer had time. The SQL injection would have been caught in QA or production, not at 2:17 PM.

The numbers

MetricBeforeAfterChange
Average PR wait time3-5 hours15-25 minutes90% reduction
Senior engineer daily review time3-4 hours45-60 mins75% reduction
PR merge-to-deploy time1-2 dayssame day50% faster
Production bugs caught in review60%92%+32 points
Security issues reaching production1-2/month0-1/quarter80%+ reduction
Codebase sent to third-party AIYes (ChatGPT)NoFully local
Monthly API cost£800-1,200 (OpenAI)£0£9,600-14,400/year saved

Annual impact: 600-700 hours of senior engineer time recovered + £10,000+ in API costs + fewer production incidents (each P1 incident costs £5,000-15,000 in response, fix, and client impact).

Foundry cost: £999 setup + £99/month = £2,187 first year. Existing Mac Studio.

What stayed cloud

What moved local: the AI that reads your code and identifies issues. That's the part that was sending your proprietary codebase through OpenAI's API.

What it doesn't do

What the team says

"The first week, Foundry caught a SQL injection in a PR that I would have missed at 5 PM on a Friday. I've been reviewing code for twelve years. That stung — but it proved the point." David, CTO
"I used to wait half a day for someone to look at my code. Now it's reviewed before I've finished my coffee. The feedback is specific — line numbers, suggested fixes, not just 'looks fine.'" Sarah, developer
"The national retailer contract clause about third-party AI was the blocker for us using ChatGPT for review. Foundry runs on our hardware. Our code never leaves the building. Procurement is happy, legal is happy, and we're shipping faster." David, CTO

Is this right for your team?

This setup works for:

Not a fit if you:

Want to see it review a sample PR? Book a Foundry Fit Review →

Technical details

Hardware
Mac Studio M3 Ultra, 512GB unified memory
Model
Qwen3-Coder-30B (Q5_K_M) via llama.cpp — specifically tuned for code understanding
Pipeline
GitHub webhook → fetch diff → analyse changes → post structured review → developer addresses → re-review on update → flag for human approval
Review categories
Security (injection, auth, data exposure), logic (edge cases, error paths, null handling), tests (coverage, edge cases, assertions), consistency (codebase patterns, style, naming), performance (N+1 queries, allocations, blocking calls)
Integration
GitHub (via webhook + API), GitLab, Bitbucket, or direct git hooks
Throughput
10-30 seconds per PR depending on diff size
No-cloud posture
Code is fetched locally, reviewed locally, and the review is posted via the GitHub API. No code content is sent to any third-party AI service.
Observability
llm_stats dashboard showing review volume, issue detection rates, false positive tracking, and model health
False positive rate
~5-8% on suggestions (developer dismisses); near-zero on must-fix items (these are verified security/logic issues)

← Back to all case studies