Sprint 1
Hello, Agent
Personal AI agent deployed at {handle}.devforgehq.com/my-first-agent.
AI Engineering Full-Time
Week one through completion, laid out as a locked sprint path. Finish the active sprint and its check before the next sprint opens.
12 weeks, 60 sprints, one completion path.
Week 1
Ship a deployed AI agent on day one. Then make it useful.
Sprint 1
Personal AI agent deployed at {handle}.devforgehq.com/my-first-agent.
Sprint 2
Agent remembers the last 10 messages of the conversation.
Sprint 3
Tokens appear as they're generated, not in one wall.
Sprint 4
Agent fetches live weather from a real API and grounds its answers in that data.
Sprint 5
Conversations save to Neon Postgres and survive a reload.
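
Sprint 2's "last 10 messages" memory can be pictured as a fixed-size sliding window. The sketch below is one possible shape, not the course's implementation; the class name `ConversationMemory` and the message dictionary layout are assumptions made for illustration.

```python
from collections import deque

MAX_MESSAGES = 10  # window size taken from the Sprint 2 description


class ConversationMemory:
    """Hypothetical helper: keeps only the most recent messages."""

    def __init__(self, max_messages=MAX_MESSAGES):
        # A deque with maxlen silently drops the oldest item when full.
        self._messages = deque(maxlen=max_messages)

    def add(self, role, content):
        self._messages.append({"role": role, "content": content})

    def as_prompt(self):
        # Return a plain list, oldest first, ready to send to a chat model.
        return list(self._messages)


memory = ConversationMemory()
for i in range(12):
    memory.add("user", f"message {i}")
window = memory.as_prompt()
print(len(window), window[0]["content"], window[-1]["content"])
```

After 12 messages the window holds only the latest 10, so the two oldest have been dropped.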
Week 2
Multi-step reasoning: pick the right tool, then chain them.
Sprint 6
A typed registry with 5 tools the agent can introspect.
Sprint 7
Agent reliably picks fetch vs calc vs search based on intent.
Sprint 8
Agent decomposes 'plan a 3-day trip' into 6 sequential tool calls.
Sprint 9
Per-message $ cost shown in UI and logged to DB, with a soft cap at $0.50/day.
Sprint 10
System prompt cached, saving 90% of input cost on repeated calls.
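
The per-message cost tracking in Sprint 9 comes down to simple arithmetic on token counts. Below is a minimal sketch under stated assumptions: the per-token prices are placeholders, not any provider's real rates, and `DailyLedger` is a hypothetical in-memory stand-in for the database log. Only the $0.50/day soft cap comes from the sprint description.

```python
from dataclasses import dataclass, field

# Placeholder prices in USD per million tokens (assumptions, not real rates).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00
DAILY_SOFT_CAP_USD = 0.50  # from the Sprint 9 description


def message_cost(input_tokens, output_tokens):
    """Cost of one message in USD given its token counts."""
    total = input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    return total / 1_000_000


@dataclass
class DailyLedger:
    """Hypothetical in-memory log; a real app would write rows to the DB."""
    spent_usd: float = 0.0
    log: list = field(default_factory=list)

    def record(self, input_tokens, output_tokens):
        cost = message_cost(input_tokens, output_tokens)
        self.spent_usd += cost
        entry = {
            "cost_usd": cost,
            "total_usd": self.spent_usd,
            # Soft cap: flag the overage, but do not block the request.
            "over_soft_cap": self.spent_usd > DAILY_SOFT_CAP_USD,
        }
        self.log.append(entry)
        return entry


ledger = DailyLedger()
first = ledger.record(1_000, 500)
second = ledger.record(100_000, 20_000)
print(first["over_soft_cap"], second["over_soft_cap"])
```

A soft cap only raises a flag for the UI to surface; a hard stop, as in Sprint 29, would refuse the request instead.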
Week 3
Make the agent retrieve relevant context instead of guessing.
Sprint 11
Compute embeddings for 1000 doc chunks, store as JSON.
Sprint 12
Same docs in Postgres pgvector with an HNSW index; queries in <50ms.
Sprint 13
Q&A bot that retrieves 5 chunks and answers from them.
Sprint 14
A/B test 3 chunking strategies; pick the winner with evidence.
Sprint 15
Q&A over your project's GitHub repo, with answers that carry file:line citations.
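
Retrieval over JSON-stored embeddings, as in Sprints 11 and 13, can be done with a brute-force cosine-similarity scan before any vector index is involved. The sketch below assumes each chunk is a dictionary with hypothetical `id` and `embedding` keys and uses toy 2-dimensional vectors; real embeddings would come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=5):
    """Return the ids of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, c["embedding"]), c["id"]) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Toy data standing in for chunks loaded from a JSON file.
chunks = [
    {"id": "a", "embedding": [1.0, 0.0]},
    {"id": "b", "embedding": [0.0, 1.0]},
    {"id": "c", "embedding": [1.0, 1.0]},
]
nearest = top_k([1.0, 0.1], chunks, k=2)
print(nearest)
```

A linear scan like this is fine for around a thousand chunks; an approximate index such as HNSW becomes worthwhile as the collection grows.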
Week 4
Naive RAG breaks at scale. Fix retrieval, citations, and failure modes.
Sprint 16
Cohere/Voyage reranker boosts top-3 accuracy from 64% to 86%.
Sprint 17
Combine BM25 + vector search; the hybrid beats either alone on your eval set.
Sprint 18
Every claim in the answer links back to a specific chunk.
Sprint 19
When retrieval returns nothing, the bot says so and does NOT confabulate.
Sprint 20
An eval suite of 50 questions with groundedness + accuracy scores.
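
One common way to combine a BM25 ranking with a vector ranking, as Sprint 17 asks, is reciprocal rank fusion (RRF). The outline does not say which fusion method the course uses, so treat this as an illustrative choice. The constant `k=60` is a conventional default and the document ids are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1 for the top result.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


bm25_ranking = ["d1", "d2", "d3"]    # hypothetical keyword-search results
vector_ranking = ["d3", "d1", "d4"]  # hypothetical embedding-search results
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

RRF uses only rank positions, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales.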
Week 5
Beyond single calls — reasoning loops, planning, self-correction.
Sprint 21
Agent that interleaves thought / action / observation until done.
Sprint 22
Planner decomposes; worker executes; an orchestrator coordinates the loop.
Sprint 23
Agent reviews its own output and catches its own mistakes 70% of the time.
Sprint 24
Agent handles a task requiring 20+ steps without losing the thread.
Sprint 25
Agent flow modeled as an XState state machine, with every state, transition, and guard visible.
Week 6
Make it real: real users, real auth, real observability.
Sprint 26
App live on a custom subdomain with TLS and CDN caching.
Sprint 27
Email + Google login. Sessions across pages. Per-user state.
Sprint 28
Two users can't see each other's data; isolation is enforced at the DB layer.
Sprint 29
Per-user daily + monthly $ caps. Soft warning, hard stop.
Sprint 30
Every request traceable end-to-end via Langfuse + OTLP exporter.
Week 7
If you can't measure it, you can't ship it.
Sprint 31
Run 100 test cases on every prompt change and track the pass-rate trend.
Sprint 32
Auto-grade open-ended answers with a stronger model.
Sprint 33
Prompt A vs prompt B: diff regressions before merging to main.
Sprint 34
Sonnet vs Haiku vs GPT-4o: pick the winner per task with data.
Sprint 35
20 jailbreak prompts; show your safety layer blocks 18+.
Week 8
The model is the easy part. The UX is the product.
Sprint 36
Cursor blink, partial-render handling, abort button.
Sprint 37
Agent returns JSON guaranteed to match a Zod schema.
Sprint 38
Tool-use trace visible in the UI, like Linear's inline thought.
Sprint 39
User can hit ESC mid-stream; partial output saved cleanly.
Sprint 40
Action buttons feel instant; server confirms in background.
Week 9
Beyond text — images, documents, audio, code, browsers.
Sprint 41
Upload an image; agent extracts structured data from it.
Sprint 42
PDF → searchable + queryable. Handles tables, charts, footnotes.
Sprint 43
Voice-driven agent: Whisper in, ElevenLabs out.
Sprint 44
Agent navigates a real web page and fills a form via screenshots.
Sprint 45
Agent writes + tests + ships a small CLI tool end-to-end.
Week 10
What breaks at 10k users? Fix it before they show up.
Sprint 46
90% hit rate on system-prompt cache; cost down 6×.
Sprint 47
Batch API saves 50% on async eval workloads.
Sprint 48
p50 TTFT < 600ms, p95 < 1.4s, with dashboards proving it.
Sprint 49
Anthropic down → OpenAI takes over in <2s. Users don't notice.
Sprint 50
Lightweight classifier runs on Cloudflare Workers AI at 30ms p95.
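
The latency targets in Sprint 48 (p50 time-to-first-token under 600ms, p95 under 1.4s) can be checked from raw samples with a percentile calculation. This sketch uses the nearest-rank definition of a percentile, which is one of several in common use; the function names and sample values are invented for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of the data at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_ttft_budget(ttft_ms, p50_budget_ms=600, p95_budget_ms=1400):
    """True when both percentile targets from Sprint 48 are met."""
    return (
        percentile(ttft_ms, 50) < p50_budget_ms
        and percentile(ttft_ms, 95) < p95_budget_ms
    )


# Made-up time-to-first-token samples in milliseconds.
samples = [200, 300, 400, 500, 550, 580, 590, 700, 900, 1300]
print(percentile(samples, 50), percentile(samples, 95), meets_ttft_budget(samples))
```

With only ten samples the p95 is simply the slowest request, which is why a dashboard needs far more traffic before its tail numbers mean much.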
Week 11
Production AI = security AI. Adversaries are now your users.
Sprint 51
20 documented attacks; your defense blocks 17+.
Sprint 52
Run a published jailbreak corpus; measure & report the defense rate.
Sprint 53
Names/emails/SSNs scrubbed pre-prompt + post-response.
Sprint 54
Bot floods blocked at the edge; real users never see slowdowns.
Sprint 55
Controls matrix + evidence pipeline ready for an auditor.
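
Part of Sprint 53's PII scrubbing can be approximated with regular expressions. The sketch below covers only emails and US-style SSNs; names usually need a named-entity recognizer, which is out of scope here. The patterns are simplified assumptions rather than full validators, and the placeholder tokens are arbitrary.

```python
import re

# Simplified patterns (assumptions): they will miss unusual formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scrub_pii(text):
    """Replace emails and SSN-shaped numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SSN_RE.sub("[SSN]", text)


raw = "Reach me at jane.doe@example.com, SSN 123-45-6789."
clean = scrub_pii(raw)
print(clean)
```

The same function can run on the user's prompt before it reaches the model and again on the model's response before it reaches the user.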
Week 12
One real project. Ship it. Defend it. Add it to your portfolio.
Sprint 56
Written spec, success metric, user research notes, all reviewed by mentor.
Sprint 57
Live URL anyone can use. Works end-to-end on the happy path.
Sprint 58
Eval suite with 50+ cases; dashboard shows current pass rate.
Sprint 59
Cost caps, rate limits, observability, auth: all green.
Sprint 60
5-min recorded demo, decision log, mentor endorsement letter, portfolio entry.
Completion
Unlocks after Sprint 60. Includes your case study, GitHub proof, resume bullets, interview story, and final demo package.