AI-First Development Adoption for a Web Agency

AI-First, CI/CD, Claude Code, Playwright, Team Adoption

I helped a web agency with a B2B SaaS platform and over 300 automotive clients transition to AI-first development. From zero CI/CD and scattered tests to a full pipeline with automated testing, coding agents integrated into the workflow, and a team that actually trusts the process.

Client: Web Agency (B2B SaaS, Automotive)

The Challenge

A web agency running a B2B SaaS platform with over 300 clients in the automotive sector had a development process that worked, but was fragile. No CI/CD pipeline, no automated tests running on PRs, no containerized environments. The codebase had grown organically over years with multiple plugins, custom integrations, and a complex architecture.

Some tests existed, but they were scattered across the codebase with no unified structure. Documentation lived in random places. Git submodules were used pervasively, making the repository harder for both developers and AI agents to navigate. These weren't just technical debt. They were active blockers to AI adoption: coding agents struggle with fragmented test suites, undiscoverable documentation, and submodule boundaries that break their ability to understand the full picture.

The team was curious about AI coding tools but hesitant. Without a safety net or a codebase that agents could work with effectively, the risk felt too high.

I was brought in part-time to work on an AI-powered voice assistant for the platform (covered in a separate case study). But as I worked on the project, I realized the biggest impact I could have wasn't just building that feature. It was changing how the entire team builds software.

Solution Approach

Building the safety net first

Before introducing any AI tooling, I needed to make the codebase safe for faster iteration. This meant the foundational work that everything else depends on.

I built a CI/CD pipeline from scratch on GitHub Actions. A single workflow runs static analysis first, then the full test suite. It runs on every push to master and every pull request. If static analysis fails, the tests don't even start. The pipeline uses a prebuilt Docker image (published to GHCR) with the entire toolchain, so there's no runtime setup and execution is fast and reproducible.
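The fail-fast ordering described above can be illustrated with a small sketch. This is not the actual workflow, which is defined in GitHub Actions. The stage names and the `Stage` and `runPipeline` identifiers are hypothetical and only show the gating logic: a later stage never starts once an earlier one fails.

```typescript
// Illustrative only: a stage is a named step that reports success or failure.
interface Stage {
  name: string;
  run: () => boolean;
}

interface PipelineResult {
  passed: boolean;
  executed: string[]; // stages that actually started, in order
  failedAt?: string;  // name of the first failing stage, if any
}

// Runs stages in order and stops at the first failure,
// so the test suite never starts when static analysis fails.
function runPipeline(stages: Stage[]): PipelineResult {
  const executed: string[] = [];
  for (const stage of stages) {
    executed.push(stage.name);
    if (!stage.run()) {
      return { passed: false, executed, failedAt: stage.name };
    }
  }
  return { passed: true, executed };
}
```

In a real setup each `run` would wrap a command executed inside the prebuilt container; here it is left abstract so the ordering is the only thing on display.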

I also containerized the local development environment. The same Docker image that runs in CI can run locally. This eliminated the "works on my machine" problem and gave coding agents a consistent environment to work against.

For code quality, I introduced automated style enforcement, frontend formatting, and commit message conventions with a git hook that validates the format. These sound trivial, but they matter: consistent formatting means AI-generated diffs are clean and reviewable, and conventional commits make the git history navigable for both humans and agents.
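As a rough sketch of what a commit message check can look like, the function below validates a message header. It assumes the Conventional Commits style ("type(scope): subject"). The allowed types, the 72-character limit, and the name `validateCommitMessage` are assumptions for illustration, not the project's actual rules.

```typescript
// Assumed set of commit types; the real project may allow a different list.
const ALLOWED_TYPES = ["feat", "fix", "docs", "style", "refactor", "test", "chore", "ci"];

// Matches headers such as "feat(booking): add slot validation" or "fix: typo".
const HEADER_PATTERN = new RegExp(
  `^(${ALLOWED_TYPES.join("|")})(\\([a-z0-9-]+\\))?!?: .+$`
);

interface ValidationResult {
  ok: boolean;
  reason?: string;
}

// Validates only the first line of the message; the body is left free-form.
function validateCommitMessage(message: string): ValidationResult {
  const header = (message.split("\n")[0] ?? "").trim();
  if (header.length === 0) {
    return { ok: false, reason: "empty commit message" };
  }
  if (header.length > 72) {
    return { ok: false, reason: "header longer than 72 characters" };
  }
  if (!HEADER_PATTERN.test(header)) {
    return { ok: false, reason: "header must look like 'type(scope): subject'" };
  }
  return { ok: true };
}
```

A `commit-msg` git hook would read the message file it receives as its first argument, pass the contents to a function like this, and exit non-zero when `ok` is false.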

Removing AI blockers and fixing the test infrastructure

Before agents could contribute meaningfully, the repository itself needed to become agent-friendly. I consolidated scattered test files into a centralized suite, removed pervasive git submodule usage that fragmented the codebase, and gathered documentation into discoverable locations.

The existing tests had issues of their own. Some were fragile, others were skipped, and a few actively prevented the suite from running. I fixed broken tests, removed unreliable ones, and organized the suite into groups (fast vs slow, with vs without external calls) so developers could run the relevant subset quickly.

I then added Playwright for end-to-end testing. The platform is a WordPress multisite, which makes E2E testing non-trivial: you need a running instance with actual data. So I built a composable seed system. Each seed module (base data, appointments, vehicles, notifications) declares its dependencies, and a registry resolves them topologically. A CLI command creates an ephemeral test instance, seeds it with exactly the data a test scenario needs, runs the test, and tears everything down. This makes E2E tests deterministic and isolated.
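The dependency resolution at the heart of such a seed system can be sketched as a depth-first topological sort. Everything below is illustrative: the `SeedModule` shape, the `SeedRegistry` class, and the specific dependencies between modules are assumptions, not the project's actual code.

```typescript
// Hypothetical shape of a seed module: a name, its dependencies, and the seeding step.
interface SeedModule {
  name: string;
  dependsOn: string[];
  run: () => Promise<void>;
}

class SeedRegistry {
  private modules = new Map<string, SeedModule>();

  register(mod: SeedModule): void {
    this.modules.set(mod.name, mod);
  }

  // Returns the requested modules plus everything they depend on,
  // ordered so that each module comes after its dependencies.
  resolve(requested: string[]): SeedModule[] {
    const ordered: SeedModule[] = [];
    const done = new Set<string>();
    const inProgress = new Set<string>();

    const visit = (name: string): void => {
      if (done.has(name)) return;
      if (inProgress.has(name)) {
        throw new Error(`Circular seed dependency at "${name}"`);
      }
      const mod = this.modules.get(name);
      if (!mod) throw new Error(`Unknown seed module "${name}"`);
      inProgress.add(name);
      for (const dep of mod.dependsOn) visit(dep);
      inProgress.delete(name);
      done.add(name);
      ordered.push(mod);
    };

    for (const name of requested) visit(name);
    return ordered;
  }

  // Runs the resolved modules one after another.
  async seed(requested: string[]): Promise<void> {
    for (const mod of this.resolve(requested)) await mod.run();
  }
}

// Example wiring with placeholder seeding steps (dependencies are assumed).
const noop = async (): Promise<void> => {};
const registry = new SeedRegistry();
registry.register({ name: "base", dependsOn: [], run: noop });
registry.register({ name: "vehicles", dependsOn: ["base"], run: noop });
registry.register({ name: "appointments", dependsOn: ["base", "vehicles"], run: noop });
registry.register({ name: "notifications", dependsOn: ["appointments"], run: noop });

const order = registry.resolve(["notifications"]).map((m) => m.name);
// → ["base", "vehicles", "appointments", "notifications"]
```

A test scenario then only names the data it needs, and the registry pulls in the prerequisites in a valid order. A scenario that requests only `vehicles` gets `base` and `vehicles` and nothing else, which keeps each ephemeral instance small.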

Introducing AI coding agents

With the safety net in place, I introduced Claude Code into the workflow. I wrote 17 agent skills that document the codebase architecture, plugin structure, testing conventions, and CI pipeline. These skills give the agent enough context to make meaningful contributions without needing a human to explain the project from scratch each time.

I configured an automated code review workflow on GitHub: when a PR gets a specific label, Claude Code reviews the diff with full project context. I also set up CodeRabbit for surface-level review on all PRs.
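The label-based trigger can be sketched as a small predicate over the pull request event. The label name `ai-review` and the function `shouldRunAgentReview` are hypothetical, and re-running the review on new commits is an assumption. The payload fields used (`action`, `label`, `pull_request.labels`) follow GitHub's `pull_request` webhook event.

```typescript
// Minimal subset of GitHub's pull_request webhook payload.
interface PullRequestEvent {
  action: string;
  label?: { name: string }; // present when action is "labeled"
  pull_request: { labels: { name: string }[] };
}

// Hypothetical label; the actual label name is not stated in this case study.
const REVIEW_LABEL = "ai-review";

function shouldRunAgentReview(
  event: PullRequestEvent,
  reviewLabel: string = REVIEW_LABEL
): boolean {
  // Trigger when the review label itself was just applied.
  if (event.action === "labeled") {
    return event.label?.name === reviewLabel;
  }
  // Assumption: re-review when new commits land on an already-labeled PR.
  if (event.action === "synchronize") {
    return event.pull_request.labels.some((l) => l.name === reviewLabel);
  }
  return false;
}
```

Gating the deeper review behind a label keeps it opt-in, while the surface-level review still runs on every PR.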

Knowledge capture as infrastructure

Coding agents can read code efficiently, but they can't read minds. The things that cause the most problems are the unwritten rules: business logic that lives in someone's head, compliance constraints, formatting preferences, domain context that explains why the code is shaped a certain way. If you want agents to be precise and avoid hallucinations, you have to write all of that down.

I built a structured knowledge base using agent skills, a format that has become an industry standard. The seventeen skills cover the full codebase: architecture, plugin structure, testing conventions, CI pipeline, booking system logic, and domain-specific rules. Each skill gives agents grounded context before they touch any code.

I also created a "self-learning" skill that prompts the agent to capture new knowledge after every significant interaction, updating the skill files so the next session starts with better context. This creates a flywheel: better documentation makes agents more effective, which produces more code, which uncovers more knowledge worth documenting.

The trust effect was immediate. Developers who were initially skeptical became noticeably more comfortable when they could see that the agent had read a grounding document before generating code. Knowing that the agent understood the project's conventions and constraints made reviewing its output feel less like a gamble.

Changing the mindset, not just the code

The technical changes would have been useless without a cultural shift. Code blockers are easy to identify and fix. Mentality blockers are harder.

I ran team training sessions on AI-first development: how coding agents work, what they're good at, where they fail, and how the new infrastructure protects against mistakes. But the real work happened one-on-one. I sat with each developer individually to understand their specific doubts and resistance.

Each conversation was different, and that's the point. You can't fix a mentality block with a group presentation. You fix it by understanding what's holding each person back and addressing that directly. For the skeptic, I showed how the CI pipeline catches regressions before they reach production. For the enthusiast, I helped them set up their first productive coding agent session. For the worried, I reframed AI tools as something that makes their expertise more valuable, not less: the agent writes the boilerplate, but the developer's judgment on architecture, edge cases, and business logic is what makes the output correct.

Results & Impact

CI pipeline running on every PR: static analysis followed by the full test suite, containerized and reproducible
E2E testing infrastructure with Playwright: composable seed system for deterministic, isolated test scenarios on a complex multisite platform
17 agent skills providing comprehensive context for coding agents across the entire codebase
Automated code review on pull requests with Claude Code and CodeRabbit
Containerized development environments: same Docker image for CI and local development
Code quality gates: automated linting, formatting, and commit conventions enforced via git hooks
Team trained on AI-first development with both group sessions and individual mentoring to address specific concerns and resistance

Technologies Used

GitHub Actions (CI/CD pipeline)
Docker / GHCR (containerized CI image)
PHPStan (static analysis) + PHPUnit (unit & integration tests)
Playwright (end-to-end tests with composable seed system)
Claude Code (AI coding agent + automated PR review)
CodeRabbit (automated code review)
17 agent skills (structured knowledge base)
Team training: group sessions + individual mentoring
