
Ruby Jha · project-deep-dives · 3 min read

Building 9 AI Projects (While Working Full-Time)

Why I am building 9 AI systems from scratch while working full-time as an Engineering Manager. The portfolio, the progression, and what I have learned so far.

The reason

I have spent 20 years building through every major shift in enterprise software. Mainframe to client-server, on-prem to cloud, monolith to microservices. Each shift changed what engineering leaders needed to know. The current shift to AI is no different, except it is happening faster.

Every product is adding AI capabilities, every team needs people who understand these systems, and the engineering leaders who cannot build with AI will be managing work they do not understand. I did not want to be that leader. So I started building. Nine AI systems, each with evaluation metrics, architecture decision records, and documented tradeoffs.

The Portfolio Architecture

The 9 projects follow a deliberate progression through the applied AI stack:

  1. Data Generation (P1): Schema-driven synthetic data with Pydantic validation and LLM-as-Judge quality scoring
  2. Evaluation (P2): Multi-strategy RAG evaluation comparing 16 vector configurations across chunking, embeddings, and reranking
  3. Fine-Tuning (P3): Contrastive embedding fine-tuning, comparing standard fine-tuning vs LoRA head-to-head with comprehensive metrics
  4. Applied RAG (P4-P5): AI Resume Coach and a production RAG pipeline with hybrid search and FastAPI endpoints
  5. Multi-Agent Systems (P6-P9): Digital clone, feedback intelligence, Jira sprint planning, and DevOps RCA, all using CrewAI orchestration
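To make the hybrid search in P5 concrete: it combines keyword and vector retrieval, and one common way to merge two ranked result lists is reciprocal rank fusion. This is a minimal stdlib sketch of that general idea, with hypothetical document IDs, not code from the actual pipeline:

```python
def rrf_fuse(rankings, k=60):
    """Merge ranked lists of doc IDs via reciprocal rank fusion.

    Each document scores 1 / (k + rank) per list it appears in;
    k=60 is the commonly used damping constant from the RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # e.g. BM25 ordering
vector_hits = ["doc1", "doc3", "doc9"]   # e.g. cosine-similarity ordering
print(rrf_fuse([keyword_hits, vector_hits]))
```

Documents that rank highly in both lists float to the top, which is the whole point of going hybrid: keyword search catches exact terms that embeddings blur, and vectors catch paraphrases that keywords miss.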

Each project builds on the previous. P1 generates data that could train models evaluated by P2’s framework. P3’s fine-tuned embeddings feed into P5’s production pipeline. P6-P9 all use multi-agent patterns that emerge naturally once you understand the single-agent limitations from P4-P5.

What separates these from tutorials

Every tutorial shows you how to build a RAG pipeline in 20 lines. That is not what these projects are. Each one has evaluation frameworks with baselines, architecture decision records including paths I did not take, error handling for when things break, real test coverage (P2 has 557 tests, P3 has 112), and deployment considerations documented in ADRs.

The easiest way to spot a tutorial project: ask “what happens when it fails?” These projects have an answer.
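For illustration, here is the shape one such answer usually takes: retry transient failures with exponential backoff, then degrade to a safe fallback instead of crashing. The `call_llm` callable is hypothetical; this is a generic sketch of the pattern, not code from any of the nine projects:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5):
    """Call fn(), retrying with exponential backoff on failure."""
    last_err = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as err:  # real code would catch narrower error types
            last_err = err
            time.sleep(base_delay * (2 ** i))
    raise last_err

def answer(question, call_llm):
    """Answer a question, degrading gracefully if the LLM call keeps failing."""
    try:
        return with_retries(lambda: call_llm(question))
    except Exception:
        # Fallback path: the system stays up even when the dependency is down
        return "Sorry, I couldn't answer that right now."
```

The point is not the backoff math; it is that the failure path is a deliberate design decision, documented and tested, rather than an unhandled stack trace.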

This site is also a project

rubyjha.dev is built with Astro 5.0, deployed on Cloudflare Pages, with content in MDX and custom components for metrics and architecture diagrams. Eventually it will include a RAG chatbot that answers questions about my work using these blog posts and project writeups as its knowledge base.

The series

This is post #1 in a 10-part series. Each project gets its own deep-dive with architecture diagrams, real metrics, and the decisions that shaped the system, including the ones that did not work on the first try.

Next up: how I built an LLM-as-Judge that went from approving everything to catching real failures, and why calibrating the judge turned out to be harder than building the generator.

All code is open source at github.com/rubsj/ai-portfolio.


Ruby Jha

Engineering Manager who builds. AI systems, enterprise products, and the teams that ship them.


Related Posts

fine-tuning Mar 29, 2026

LoRA Hit 96% of Full Fine-Tuning. The Default Learning Rate Almost Killed It.

I fine-tuned all-MiniLM-L6-v2 on dating profiles, flipped Spearman from -0.22 to +0.85, and found LoRA hit 96.2% of that with 0.32% of parameters.

8 min read

rag Mar 21, 2026

I Tested 16 RAG Configs So You Don't Have To: Embedding Choice Matters More Than Chunk Size

Grid search across 16 RAG configurations reveals embedding model selection drives 26% more retrieval quality than chunk tuning.

9 min read

synthetic-data Feb 28, 2026

How I Calibrated an LLM Judge That Approved Everything

My first LLM judge had a 0% failure rate. That meant it was useless. This is the story of calibrating it to actually catch failures, and building a correction loop that took synthetic data failures from 36 to zero.

10 min read

leadership May 1, 2026

When Standups Feel Like Interrogations

How to diagnose whether tight oversight is a trust problem or a legitimate need, and how to hand back autonomy without losing accountability.

6 min read