The idea started at the Wiz SKO, in a leaders meeting with Dali Rajic (CRO). He asked who had mentors and floated the concept of a coaching bot with the intelligence of the world's great leaders. I liked the concept and added building something around it to the to-do list. But not just the chat element; I wanted to combine it with another element I'd been thinking about: high-end Claude Code development workflows. I decided to add agentic business research and document creation after the bot MVP, and I'll get to that later. The result is AI Coach.

What it does

AI Coach is a multi-tenant GTM 'playbook' and enablement platform. Each tenant gets an isolated environment with their own Bedrock Knowledge Base, agent configurations, and product catalogue. The core features are:

A 10-agent research pipeline that takes a prospect URL and produces four branded sales documents: an account plan, a value pyramid, a 3Y document, and a "Crazy 4" ideation piece. Six markdown research artefacts are generated along the way. The whole pipeline runs in around 15 to 20 minutes via Step Functions. Contrast that with what I see in the real world and you're saving many hours (although I tell the teams not to outsource their intelligence).
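To give a feel for the shape of such a pipeline, here's a minimal Python sketch. The agent names and the stubbed `run_agent` are invented for illustration; in the real system each step is a Step Functions state that calls Bedrock and writes its artefact out.

```python
# Illustrative sketch only: agent names are invented, and run_agent is a stub
# standing in for a Bedrock call made by a Step Functions task state.
from dataclasses import dataclass, field

@dataclass
class PipelineState:
    prospect_url: str
    artefacts: dict = field(default_factory=dict)  # filename -> markdown text

AGENTS = [
    "company_profile", "market_landscape", "stakeholders",
    "pain_hypotheses", "value_mapping", "competitive_notes",
]

def run_agent(name: str, state: PipelineState) -> str:
    # The real step would prompt a model with prior artefacts as context.
    return f"# {name}\n\nResearch notes for {state.prospect_url}\n"

def run_pipeline(url: str) -> PipelineState:
    state = PipelineState(prospect_url=url)
    for name in AGENTS:  # Step Functions sequences (or fans out) these states
        state.artefacts[f"{name}.md"] = run_agent(name, state)
    return state

state = run_pipeline("https://example.com")
```

The six intermediate artefacts would then feed the document-generation steps that render the four branded outputs.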

A HITL editor where strategy-agent outputs land as proposed sidecars. Tenant admins review each section in a side-by-side diff editor (CodeMirror 6), then accept, reject, or regenerate it; the canonical file flips once the review is complete.
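The sidecar mechanic can be sketched in a few lines of Python. This is an assumption about the general pattern (file names and functions here are invented), not the platform's actual implementation, which stores documents in S3 rather than on local disk:

```python
# Hedged sketch of the sidecar-review idea: a proposed file sits next to the
# canonical one, and accepting the review promotes it. Names are illustrative.
from pathlib import Path

def propose(canonical: Path, new_text: str) -> Path:
    sidecar = canonical.with_suffix(canonical.suffix + ".proposed")
    sidecar.write_text(new_text)
    return sidecar

def review(canonical: Path, sidecar: Path, accept: bool) -> None:
    if accept:
        sidecar.replace(canonical)   # canonical file "flips" on acceptance
    else:
        sidecar.unlink()             # rejection discards the proposal

# Demo on a throwaway directory:
import tempfile
workdir = Path(tempfile.mkdtemp())
canonical = workdir / "account_plan.md"
canonical.write_text("## Draft v1\n")
sidecar = propose(canonical, "## Draft v2\n")
review(canonical, sidecar, accept=True)
```

The diff editor simply renders `canonical` against its `.proposed` sibling section by section.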

Coaching surfaces: one-to-one streaming chat with a chosen persona, multi-persona roundtable debates, and a Telegram bot with full agent capabilities and cross-channel identity.

An admin panel with super-admin CRUD for tenants, agents, prompts, templates, personas, guardrails, and usage telemetry. Agents have a Git-backed PR flow for structural changes and an on-demand flow for prompt-only edits.

The architecture

Everything runs on AWS in eu-west-1. The stack is entirely serverless.

- React + Zustand front-end on Amplify Hosting
- API Gateway with Cognito JWT auth
- Cognito email/password sign-in with a pre-signup gate
- Lambda functions (Python 3.13 + Node.js 22): chat, pipeline worker, doc generator, admin, KB provisioner
- Step Functions: research + provisioning DAGs
- Bedrock: Claude Sonnet 4.6 + prompt caching
- DynamoDB: 11 tables
- Bedrock Knowledge Base: per-tenant RAG
- S3: vectors (S3 Vectors), prompts, and documents
- Terraform IaC + GitHub Actions CI/CD
- Telegram bot with cross-channel identity

The front-end is React with Zustand for state management, deployed via Amplify Hosting. Authentication uses Cognito with email/password and a pre-signup Lambda gate to control access. API Gateway handles all HTTP routes with Cognito JWT validation. Lambda functions are split by concern: chat (Node.js, streaming via Function URLs), pipeline worker, doc generator, admin CRUD, tenant bootstrap, and KB provisioner. Step Functions orchestrate the research pipeline (Python) and the tenant provisioning DAGs. Bedrock runs Claude Sonnet 4.6 via EU inference profiles, with prompt caching through the Converse API. DynamoDB backs eleven tables. Terraform manages all infrastructure, with state in S3 and a DynamoDB lock table. GitHub Actions runs terraform plan on PRs and apply on merge to main.
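For the prompt-caching piece, the Converse API lets you mark a cache checkpoint so the (large, stable) system prompt is cached across calls. A minimal sketch of the request shape, where the model ID is an illustrative EU inference profile rather than the app's actual configuration:

```python
# Sketch of a Bedrock Converse request with a prompt-cache checkpoint.
# The modelId below is illustrative; check your account's inference profiles.
def build_request(system_prompt: str, user_text: str) -> dict:
    return {
        "modelId": "eu.anthropic.claude-sonnet-4-6",  # assumption, not verified
        "system": [
            {"text": system_prompt},
            {"cachePoint": {"type": "default"}},  # everything above is cacheable
        ],
        "messages": [
            {"role": "user", "content": [{"text": user_text}]},
        ],
    }

# With AWS credentials configured, this would be sent as:
# import boto3
# client = boto3.client("bedrock-runtime", region_name="eu-west-1")
# response = client.converse(**build_request(long_system_prompt, "Hello"))
```

On subsequent calls with the same prefix, Bedrock reads the cached system prompt instead of reprocessing it, which cuts both latency and token cost.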

How it evolved

The project went through five major versions in around four weeks of focused effort. Understanding the evolution matters because each version taught a different lesson.

v1 was the MVP: persona chat with streaming responses, Cognito auth, basic conversation history. It worked, but a standalone chat persona had limited value. The reflection after v1 was that the product needed a purpose beyond "talk to fake McMahon". During this phase I also learnt that Lambda only supports streaming text responses from Node.js runtimes.

v2 added the GTM layer: the 10-agent research pipeline, project management, document generation. This gave the personas something to do; they could advise on real prospect research, not just have open-ended conversations. The UX was overhauled in v2.5 based on usability testing. This reused a Python-based research flow I'd previously built (streaming not required here).

v3 introduced roundtable debates (multi-persona discussions on a topic) and the Telegram bot for mobile access. Architecture Decision Records started appearing here as the system got complex enough to need them.

v4 was the big one: full multi-tenancy. Each tenant gets isolated Bedrock Knowledge Bases, their own agent configurations, product catalogue, and pillar vocabulary. Tenant-admin onboarding at /setup drafts a TenantProfile and ProductCatalogue from the company's public website. The HITL editor landed here too. In hindsight, I should have committed to multi-tenancy earlier rather than treating it as a later addition: retrofitting isolation across eleven DynamoDB tables and the pipeline was harder than building it in from the start would have been. I'd debated this with Claude post-MVP, but multi-tenancy wasn't a firm direction at that stage.
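One common way to enforce that kind of isolation in DynamoDB is to scope every partition key to a tenant. This is a generic sketch of the pattern (the entity names are invented, and the real app spreads data across eleven tables rather than this exact layout):

```python
# Illustrative tenant-scoped key scheme; names are assumptions, not the
# app's actual schema. The point: every read and write goes through a
# tenant-prefixed partition key, so a query can never cross tenants.
def item_key(tenant_id: str, entity: str, entity_id: str) -> dict:
    return {"PK": f"TENANT#{tenant_id}", "SK": f"{entity.upper()}#{entity_id}"}

def tenant_query(tenant_id: str, entity: str) -> dict:
    # Shape of a DynamoDB Query key condition scoped to one tenant.
    return {
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :sk)",
        "ExpressionAttributeValues": {
            ":pk": f"TENANT#{tenant_id}",
            ":sk": f"{entity.upper()}#",
        },
    }
```

Retrofitting this after the fact means touching every access path, which is exactly why doing it late hurt.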

The most complex element was the tenant prompt model: the primary tenant uses hand-tuned prompts stored in S3, while platform tenants use baseline prompts rendered against their TenantProfile via Nunjucks. Essentially, this takes what I'd bespoke-designed for Wiz and makes it work for any 'playbook' GTM motion. It was validated and accepted as long-term architecture in ADR-0005.
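The real rendering happens in Nunjucks on the Node side; the idea translates to a few lines of plain Python. The template text and TenantProfile fields below are invented for illustration:

```python
# Plain-Python stand-in for Nunjucks-style {{ placeholder }} rendering.
# Template wording and profile fields are invented examples.
import re

def render(template: str, profile: dict) -> str:
    # Substitute {{ name }} with the matching profile value; leave
    # unknown placeholders untouched so gaps are visible in review.
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(profile.get(m.group(1), m.group(0))),
        template,
    )

baseline = "You advise sellers at {{ company_name }}, whose pillars are {{ pillars }}."
profile = {"company_name": "Acme", "pillars": "speed, trust"}
print(render(baseline, profile))
```

One baseline prompt plus one TenantProfile per tenant is what turns a single hand-tuned deployment into a reusable platform.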

v5 focused on observability and additional admin capabilities such as tenant cost control. At this stage the platform is pretty much complete as a true MVP SaaS, with the exception of billing.

v6+: there are still some small enhancements and fixes I'll apply over the coming months, but for now the project is largely complete.

Design before code

The most important decision I made was spending time on architecture before writing a line of code. Before the MVP I wrote a full PRD, debated the architecture with Claude, and explicitly tried to avoid tech debt from the start by thinking about the end state; it actually took until v1.7 of the PRD for me to be happy. That didn't prevent all rework (the multi-tenancy retrofit proved that), but it prevented a lot of it.

One element I'm especially pleased with is that Wiz found no security vulnerabilities or code issues when scanning the infra and code; I think that's largely down to the upfront planning and guardrails.

The PRD evolved through another seven versions as the product grew. Build guides were written for each major version and frozen on ship. Status snapshots captured the state of the system at each milestone. Tech debt was tracked in a dedicated audit log (104KB by the end). Every architectural decision of consequence got an ADR.

This documentation wasn't overhead; it was the mechanism that kept Claude Code on track across sessions. The 24KB CLAUDE.md file is the single most important document in the repo. It contains the architecture, conventions, API contracts, and naming patterns that Claude needs to maintain consistency. Without it, every new Claude Code session would start from zero.

Working with Claude Code at scale

This project was built almost entirely with Claude Code using agentic workflows. A few things I learned about making that work at this scale:

Process keeps Claude on track. Agentic usage is powerful but fails without process. The PRD, build guides, and CLAUDE.md together formed the guardrails that prevented drift; there were clear commit/PR gates where Claude had to call tools or sub-agents before moving forward, e.g. linters, code reviewers, a doc writer, and test generators, to name a subset. After the MVP, most work used autonomous workflows: I'd describe the goal, Claude would plan and execute, I'd review.

Parallelise where possible. Getting Claude to identify what could be built in parallel made a noticeable difference to velocity. Infrastructure and backend work often ran simultaneously.

Version everything early. Pinning Node.js 22, Python 3.13, and dependency versions from the start avoided the class of bugs that come from outdated or mismatched software. This seems obvious but it's easy to defer.

Push back with your own instincts. I was leaning toward multi-tenant architecture post-MVP but took the easier single-tenant route first. Claude drove this at the time (we did debate). With hindsight, the multi-tenant retrofit was harder than building it in from the start would have been. Sometimes the harder path early is the easier path overall.

Use /insights. Claude Code's insights helped identify process improvements I wouldn't have spotted myself, although I only came across this near the end of the project. Some recommendations are off, but there are a few handy ones.

The numbers and reflection

Four weeks of focused development. 14 PRD versions. 11 build guides. Five ADRs. 11 DynamoDB tables. 44 build phases. 490 commits. A 104KB tech debt log. A 24KB CLAUDE.md. Seven personas (Musk, Bezos, Huang, Jobs, McMahon, Amodei, plus the GTM agent). Ten pipeline agents producing four documents per prospect.

It's the largest application I've built and a great learning/demonstration of what's possible when Claude Code is given proper process and documentation to work with.

There are definitely quicker and simpler ways to achieve a similar outcome: one example is combining Gemini (with persona instructions) and NotebookLM; another is simply using Claude Projects and Skills. But that was never the point of this project, and I feel I accomplished the learnings I set out to achieve.

I've left the repo private on this one. Not because there's anything proprietary to Wiz in there, but because it does model how GTM runs across playbook companies, and it feels better to keep that internal. If you read this and want the insights, or to see the repo, message me and we can discuss.

AI Coach is live at playbook.jamescarty.co.uk.