Tech consulting
AI that ships.
RAG, agents, evals, and the unglamorous data plumbing underneath. In production, not in a deck.
Typical engagement: 6–14 weeks
What this covers
- LLM applications
- RAG pipelines
- Agents
- ML models
- Data platforms
- MLOps
- Evals
What you get
Concrete deliverables.
- 01 A working pilot in production by week 6. Not a notebook, not a prototype.
- 02 Evals you trust: task-level, regression-tracked, with a human-graded golden set.
- 03 A retrieval pipeline with monitoring, rollback, and a written runbook.
- 04 Cost and latency budgets agreed up front and enforced in CI.
Who it's for
Signals you'll recognise.
- You've built a demo. It impresses. It doesn't work in production.
- You're paying for inference and you can't tell which queries cost what.
- Your data team and your AI team are the same two people, and they are tired.
- You've been told 'just fine-tune it' and want a second opinion.
How it works
Six weeks to first production ship.
Week 1–2
Discovery
Use cases, data audit, eval design, baseline metrics. We name what good looks like before we build.
Week 3–5
Build
First working pipeline. Baseline evals running. Demo with five real users by Friday of week 5.
Week 6–10
Harden
Production deploy, observability, cost guardrails, fallback paths, on-call runbook.
Week 11+
Embed
Your team takes over the runbook. We stay on retainer if you want, or we leave.
Stack
The kit we reach for.
Models
- Anthropic
- OpenAI
- Llama
- Mistral
- Bedrock
Retrieval & orchestration
- Pinecone
- Weaviate
- pgvector
- LlamaIndex
- LangGraph
Data
- Postgres
- Snowflake
- dbt
- Airflow
- Kafka
Infra
- AWS
- Modal
- Vercel
- Fly.io
- Cloudflare
Observability
- Langfuse
- Helicone
- OpenTelemetry
- Grafana
- Sentry
Start a conversation
Tell us what you're solving for.
One short conversation.
We'll tell you whether we're the right team — and if not, who is.