Ruby Jha · project-deep-dives · 3 min read
Building 9 AI Projects (While Working Full-Time)
Why I am building 9 AI systems from scratch while working full-time as an Engineering Manager. The portfolio, the progression, and what I have learned so far.
The reason
I have spent 20 years building through every major shift in enterprise software. Mainframe to client-server, on-prem to cloud, monolith to microservices. Each shift changed what engineering leaders needed to know. The current shift to AI is no different, except it is happening faster.
Every product is adding AI capabilities, every team needs people who understand these systems, and the engineering leaders who cannot build with AI will be managing work they do not understand. I did not want to be that leader. So I started building. Nine AI systems, each with evaluation metrics, architecture decision records, and documented tradeoffs.
The Portfolio Architecture
The 9 projects follow a deliberate progression through the applied AI stack:
- Data Generation (P1): Schema-driven synthetic data with Pydantic validation and LLM-as-Judge quality scoring
- Evaluation (P2): Multi-strategy RAG evaluation comparing 15 vector configurations across chunking, embeddings, and reranking
- Fine-Tuning (P3): Contrastive embedding fine-tuning, comparing standard and LoRA approaches head-to-head with comprehensive metrics
- Applied RAG (P4-P5): AI Resume Coach and a production RAG pipeline with hybrid search and FastAPI endpoints
- Multi-Agent Systems (P6-P9): Digital clone, feedback intelligence, Jira sprint planning, and DevOps RCA, all using CrewAI orchestration
Each project builds on the previous. P1 generates data that could train models evaluated by P2’s framework. P3’s fine-tuned embeddings feed into P5’s production pipeline. P6-P9 all use multi-agent patterns that emerge naturally once you understand the single-agent limitations from P4-P5.
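To make the schema-driven idea behind P1 concrete, here is a minimal sketch of Pydantic validation applied to LLM-generated records. The `SupportTicket` schema and the sample batch are hypothetical illustrations, not the project's actual schemas:

```python
from pydantic import BaseModel, Field, ValidationError

# Hypothetical schema for one synthetic record; the real P1 schemas differ.
class SupportTicket(BaseModel):
    title: str = Field(min_length=5, max_length=120)
    severity: int = Field(ge=1, le=4)
    resolved: bool

def validate_batch(raw_records: list[dict]) -> tuple[list[SupportTicket], list[dict]]:
    """Split raw LLM output into schema-valid records and rejects."""
    valid, rejected = [], []
    for record in raw_records:
        try:
            valid.append(SupportTicket(**record))
        except ValidationError:
            rejected.append(record)
    return valid, rejected

batch = [
    {"title": "Login page returns 500", "severity": 2, "resolved": False},
    {"title": "Bad", "severity": 9, "resolved": "maybe"},  # fails validation
]
valid, rejected = validate_batch(batch)
```

The point of a hard schema gate like this is that anything the LLM hallucinates outside the contract is rejected before it ever reaches downstream training or evaluation; an LLM-as-Judge score then sits on top of records that are already structurally sound.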
What separates these from tutorials
Every tutorial shows you how to build a RAG pipeline in 20 lines. That is not what these projects are. Each one has:
- Evaluation frameworks with baselines
- Architecture decision records, including the paths I did not take
- Error handling for when things break
- Real test coverage (P2 has 557 tests, P3 has 112)
- Deployment considerations documented in ADRs
The easiest way to spot a tutorial project: ask “what happens when it fails?” These projects have an answer.
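One answer to "what happens when it fails?" is a sketch like the following: retry transient model errors with backoff, then degrade gracefully instead of crashing. The `call_llm` stub and the fallback message are hypothetical placeholders, not code from these projects:

```python
import time

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call; here it always simulates a
    # transient upstream failure so the fallback path is exercised.
    raise TimeoutError("upstream model timed out")

def answer_with_fallback(prompt: str, retries: int = 2, backoff_s: float = 0.05) -> str:
    """Retry transient failures, then return a degraded answer rather than crash."""
    for attempt in range(retries + 1):
        try:
            return call_llm(prompt)
        except TimeoutError:
            if attempt < retries:
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    return "The assistant is temporarily unavailable; please retry shortly."

result = answer_with_fallback("Summarize the sprint")
```

A tutorial project stops at the happy path; a production-minded one decides up front what the user sees when the model call does not come back.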
This site is also a project
rubyjha.dev is built with Astro 5.0, deployed on Cloudflare Pages, with content in MDX and custom components for metrics and architecture diagrams. Eventually it will include a RAG chatbot that answers questions about my work using these blog posts and project writeups as its knowledge base.
The series
This is post #1 in a 10-part series. Each project gets its own deep-dive with architecture diagrams, real metrics, and the decisions that shaped the system, including the ones that did not work on the first try.
Next up: how I built an LLM-as-Judge that went from approving everything to catching real failures, and why calibrating the judge turned out to be harder than building the generator.
All code is open source at github.com/rubsj/ai-portfolio.