I wanted to understand agent frameworks properly, not just pick one and build with it. So I built the same agent three times: once with LangGraph calling Claude directly via the Anthropic API, once with Strands Agents on AWS Bedrock, and once with Google's ADK on Vertex AI Agent Engine. Same domain, same pipeline, same output; three completely different architectures.
I picked oil commodities as the domain because it's a side interest of mine and the public data is abundant. The goal was a daily briefing agent that researches current oil market conditions, pulls live price data, synthesises a report, audits its own quality, and produces a final brief. The real learning was in what transferred between platforms and what didn't.
What the agent does
The conceptual pipeline is the same across all three implementations: specialist agents handle research (geopolitics, supply/demand, macro trends, technical analysis), a price tool fetches live data via yfinance, a synthesiser combines everything into a draft brief, auditors run quality checks with bounded retry loops, and a final brief is produced.
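The price tool is the simplest piece to picture. A minimal sketch, assuming the WTI and Brent front-month futures tickers on Yahoo Finance (CL=F and BZ=F); the actual tool's tickers and fields may differ:

```python
import yfinance as yf

def fetch_oil_prices() -> dict[str, float]:
    """Fetch the latest close for WTI and Brent futures via yfinance."""
    prices = {}
    for name, ticker in {"WTI": "CL=F", "Brent": "BZ=F"}.items():
        history = yf.Ticker(ticker).history(period="5d")
        prices[name] = float(history["Close"].iloc[-1])  # most recent close
    return prices
```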
The key design constraints were consistent across all three: bounded audit loops (no infinite retries), pass-bias prompts for auditors (to prevent over-flagging), and a two-stage analyse-then-render pattern that separates analytical synthesis from final formatting.
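In pseudocode terms, the shape of those constraints looks roughly like the sketch below. The helper names (synthesise, audit, revise, render) are illustrative stand-ins for the real LLM-backed steps, not the project's actual functions:

```python
from dataclasses import dataclass

MAX_AUDIT_RETRIES = 2  # bounded: never more than a fixed number of revision cycles

@dataclass
class Verdict:
    passed: bool
    feedback: str

# Placeholder steps; in the real pipeline each wraps an LLM call.
def synthesise(notes: str) -> str: ...
def audit(draft: str) -> Verdict: ...
def revise(draft: str, feedback: str) -> str: ...
def render(draft: str) -> str: ...

def run_pipeline(notes: str) -> str:
    draft = synthesise(notes)                    # stage 1: analytical synthesis
    for attempt in range(MAX_AUDIT_RETRIES + 1):
        verdict = audit(draft)                   # auditor prompt is pass-biased
        if verdict.passed or attempt == MAX_AUDIT_RETRIES:
            break
        draft = revise(draft, verdict.feedback)  # targeted revision, not a full rewrite
    return render(draft)                         # stage 2: final formatting
```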
Phase 1: LangGraph + Anthropic API
The LangGraph implementation uses a typed state graph with 12 nodes. State-first design was the guiding principle: define the state schema before writing any nodes. Each node stays small because state absorbs inter-node communication. Eleven of twelve nodes use simple LLM calls with with_structured_output against TypedDict schemas; only one node needed a full agent loop.
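A minimal sketch of that pattern, with an illustrative single-node state schema and a placeholder model name rather than the project's actual twelve-node graph:

```python
from typing import TypedDict
from langchain_anthropic import ChatAnthropic
from langgraph.graph import StateGraph, START, END

class BriefState(TypedDict):
    research_notes: str
    draft_brief: str

class DraftBrief(TypedDict):
    """Output schema the provider is asked to enforce."""
    headline: str
    body: str

llm = ChatAnthropic(model="claude-3-5-haiku-latest")  # illustrative model choice

def synthesise(state: BriefState) -> dict:
    # with_structured_output returns a dict matching DraftBrief; no JSON parsing code.
    draft = llm.with_structured_output(DraftBrief).invoke(
        f"Write a draft oil-market brief from these notes:\n{state['research_notes']}"
    )
    return {"draft_brief": draft["body"]}

graph = StateGraph(BriefState)
graph.add_node("synthesise", synthesise)
graph.add_edge(START, "synthesise")
graph.add_edge("synthesise", END)
app = graph.compile()
```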
The main takeaway from Phase 1 was that schema-as-contract eliminates an entire class of parsing bugs. When the provider enforces the output schema, you don't write JSON parsing code. The graph structure also makes the pipeline explicit and testable; 35 tests run against real APIs by design, no mocking.
Phase 2: Strands + Bedrock
The Strands implementation uses an agents-as-tools pattern: one orchestrator with eight specialist agents wired as tools. The critical difference from LangGraph is that workflow logic lives in the orchestrator's prompt rather than in code. The orchestrator receives a declarative goal and constraints, and the model decides which specialists to call and in what order.
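A minimal sketch of the agents-as-tools pattern in Strands, with one illustrative specialist and placeholder prompts rather than the project's own:

```python
from strands import Agent, tool

# A specialist agent, exposed to the orchestrator as a tool.
geopolitics_agent = Agent(
    system_prompt="Analyse geopolitical drivers of oil prices. Return prose with section headers."
)

@tool
def geopolitics_specialist(query: str) -> str:
    """Research geopolitical factors affecting oil markets."""
    return str(geopolitics_agent(query))

# Workflow logic lives in this prompt, not in code: the model picks which
# specialists to call and in what order.
orchestrator = Agent(
    system_prompt=(
        "Produce a daily oil-market brief. Call the specialist tools you need, "
        "then synthesise their prose into one report."
    ),
    tools=[geopolitics_specialist],
)

result = orchestrator("Prepare today's oil market brief.")
```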
This was the phase where text-native data flow proved itself. Specialists return prose with section-header conventions rather than structured JSON. The orchestrator parses headers and verdict lines from text, which turned out to be simpler and more robust than enforcing schemas at every boundary.
Retry caps were enforced by prose alone: the model counts its own tool-call cycles from conversation history and stops at the named limit, with explicit narration. No programmatic safety net was needed, which was surprising.
Phase 3: ADK + Vertex AI Agent Engine
The ADK implementation uses workflow agents: ParallelAgent for concurrent research, two LoopAgents for audit retry loops, and a custom BaseAgent orchestrator to coordinate them in sequence. This was the only phase that resulted in a deployed agent; it runs on Vertex AI Agent Engine with Gemini 2.5 Flash.
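A minimal sketch of that topology, with placeholder agents and a SequentialAgent standing in for the custom BaseAgent orchestrator the project actually used:

```python
from google.adk.agents import LlmAgent, LoopAgent, ParallelAgent, SequentialAgent

geopolitics = LlmAgent(name="geopolitics", model="gemini-2.5-flash",
                       instruction="Research geopolitical drivers of oil prices.")
supply_demand = LlmAgent(name="supply_demand", model="gemini-2.5-flash",
                         instruction="Research supply and demand fundamentals.")

# ParallelAgent runs the research specialists concurrently.
research = ParallelAgent(name="research", sub_agents=[geopolitics, supply_demand])

synthesiser = LlmAgent(name="synthesiser", model="gemini-2.5-flash",
                       instruction="Combine the research into a draft brief.")
auditor = LlmAgent(name="auditor", model="gemini-2.5-flash",
                   instruction="Check the draft brief for quality issues.")

# LoopAgent bounds the audit retry cycle via max_iterations.
audit_loop = LoopAgent(name="audit_loop", max_iterations=2,
                       sub_agents=[auditor, synthesiser])

pipeline = SequentialAgent(name="pipeline",
                           sub_agents=[research, synthesiser, audit_loop])
```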
The deployment had its quirks. The src/ layout required cloudpickle.register_pickle_by_value to get the agent package into the remote container. Run-to-run latency varied wildly (129s to 365s for the same code locally), making single-run benchmarks unreliable. And auditors were stricter on real pipeline input than on isolated smoke tests, even with the same pass-bias prompts.
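The pickling workaround is small. Roughly, with a hypothetical package name standing in for the real one:

```python
import cloudpickle
import my_agent_package  # hypothetical name for the src/-layout agent package

# Serialise the package by value so the remote Agent Engine container
# receives the code without needing it installed as a dependency.
cloudpickle.register_pickle_by_value(my_agent_package)
```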
What's portable, what's not
The biggest finding across all three phases: prompt-level discipline is the portable layer; framework scaffolding is not.
Six of seven specialist prompts ported verbatim from Phase 2 (Claude Haiku on Bedrock) to Phase 3 (Gemini Flash on Vertex). The auditor calibration, anti-weasel framing, embedded-metric prose, and targeted revision patterns all transferred directly. These are about how the model reasons, not about which framework orchestrates the calls.
What didn't port: state management, error handling, deployment patterns, and the mechanics of routing between agents. LangGraph uses typed state graphs. Strands uses model-driven tool selection from the orchestrator's prompt. ADK uses workflow agents with explicit agent types. Each approach has trade-offs, but the prompts that drive the actual intelligence are the same across all three.
This has practical implications for anyone choosing an agent framework: invest in prompt engineering and agent design before picking a framework. The framework is the plumbing. The prompts are the product.
How it was built
This project was built differently to AI Coach. Rather than using the Claude Code CLI, this was a collaboration with Claude Desktop: I directed the architecture and design decisions, we pair-programmed the Python (although Claude did more), and I reviewed and tested. The approach was design-first, code-second: settle the agent topology and prompt strategy before writing any implementation code.
Each phase produced its own tutorial notes (step-by-step build documentation), a retrospective, and observations. The documentation habit from AI Coach carried forward and paid off again: three phases of learning are hard to retain without written retrospectives.
The source code for all three phases is on GitHub.