Tech · AI & Data

AI that ships.

RAG, agents, evals, and the unglamorous data plumbing underneath. In production, not in a deck.

Typical engagement: 6–14 weeks

What this covers

LLM applications · RAG pipelines · Agents · ML models · Data platforms · MLOps · Evals

What you get

Concrete deliverables.

01 A working pilot in production by week 6. Not a notebook, not a prototype.
02 Evals you trust: task-level, regression-tracked, with a human-graded golden set.
03 A retrieval pipeline with monitoring, rollback, and a written runbook.
04 Cost and latency budgets agreed up front and enforced in CI.
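
What "enforced in CI" can look like, as a minimal sketch rather than our actual tooling: a check that reads per-request records from an eval run and reports any breach of the agreed budget. The `Budget` fields, the `cost_usd` and `latency_ms` record keys, and the choice of mean cost and p95 latency as the two limits are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Hypothetical budget agreed up front; field names are illustrative."""
    max_mean_cost_usd: float
    max_p95_latency_ms: float


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integers only
    return ordered[rank - 1]


def check_budget(runs, budget):
    """Return a list of violations; an empty list means the build may pass."""
    if not runs:
        # Treat a missing eval run as a failure rather than a silent pass.
        return ["no eval runs recorded"]
    mean_cost = sum(r["cost_usd"] for r in runs) / len(runs)
    p95_latency = p95([r["latency_ms"] for r in runs])
    violations = []
    if mean_cost > budget.max_mean_cost_usd:
        violations.append(
            f"mean cost {mean_cost:.4f} USD exceeds {budget.max_mean_cost_usd:.4f} USD"
        )
    if p95_latency > budget.max_p95_latency_ms:
        violations.append(
            f"p95 latency {p95_latency:.0f} ms exceeds {budget.max_p95_latency_ms:.0f} ms"
        )
    return violations
```

A CI step would call `check_budget` on the latest eval run and exit non-zero when the returned list is not empty, so a change that blows the budget fails the build instead of reaching production.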

Who it's for

Signals you'll recognise.

You've built a demo. It impresses. It doesn't work in production.
You're paying for inference and you can't tell which queries cost what.
Your data team and your AI team are the same two people, and they are tired.
You've been told 'just fine-tune it' and want a second opinion.

How it works

Six weeks to first production ship.

Week 1–2

Discovery

Use cases, data audit, eval design, baseline metrics. We name what good looks like before we build.

Week 3–5

Build

First working pipeline. Baseline evals running. Demo with five real users by Friday of week 5.

Week 6–10

Harden

Production deploy, observability, cost guardrails, fallback paths, on-call runbook.
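
A fallback path, sketched under assumptions: each provider is wrapped as a plain callable that takes a prompt and returns text, and the first one that succeeds wins. The function and exception names are hypothetical, and no real vendor SDK is called here; timeouts and retries would live inside each callable.

```python
class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has errored."""


def answer_with_fallback(prompt, providers):
    """Try each (name, call) pair in order.

    Returns (name, text) from the first provider that succeeds.
    Raises AllProvidersFailed with the collected errors if none do.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc!r}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the text lets the observability layer record how often the fallback fires, which is usually the first thing an on-call engineer wants to know.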

Week 10+

Embed

Your team takes the runbook. We stay on retainer if you want, or we leave.

Stack

The kit we reach for.

Models

Anthropic
OpenAI
Llama
Mistral
Bedrock

Retrieval

Pinecone
Weaviate
pgvector
LlamaIndex
LangGraph

Data

Postgres
Snowflake
dbt
Airflow
Kafka

Infra

AWS
Modal
Vercel
Fly.io
Cloudflare

Observability

Langfuse
Helicone
OpenTelemetry
Grafana
Sentry
