Building on Claude: Lessons from 1M+ Users Across Seven Apps

Claude is the reasoning engine behind most of our AI features — from generating product copy to analyzing trading charts to surfacing red flags in private conversations. Production deployment across seven products taught us things that no API documentation captures.
1. Schema-first prompting beats freeform every time
When we started, we wrote freeform prompts that asked Claude for a useful answer and then parsed whatever came back. Today every production prompt declares a JSON schema in the system message and asks Claude to populate it. Malformed-output rates dropped from roughly 4 percent to under 0.3 percent.
For high-stakes flows (subscription billing logic, content moderation, anything customer-visible) we layer Pydantic-style runtime validation on top. Claude is excellent at structured output but should never be the only validator on the path.
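The pattern can be sketched in a few lines. The schema and field names below are illustrative, not our production schema, and the validator stands in for the Pydantic-style layer using only the standard library:

```python
import json

# Hypothetical output schema for a product-copy feature.
COPY_SCHEMA = {
    "headline": str,
    "body": str,
    "tone_ok": bool,
}

# The schema is declared up front in the system message, not inferred.
SYSTEM_PROMPT = (
    "You write product copy. Respond ONLY with a JSON object with these "
    "fields and types: "
    + json.dumps({k: t.__name__ for k, t in COPY_SCHEMA.items()})
)

def validate(raw: str) -> dict:
    """Runtime validation layer: the model is never the only validator."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    for field, expected in COPY_SCHEMA.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"bad or missing field: {field}")
    return data
```

Anything that fails `validate` is retried or escalated rather than shown to a customer.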
2. Prompt caching is a unit-economics weapon
Our typical product prompt has a long system message — brand voice, output schema, safety rules, few-shot examples — and a short user-specific query. With Claude's prompt caching, the system message is computed once and the per-call cost on cached input drops by 90 percent.
Across Reshot AI alone, this cut per-user inference cost enough that the annual savings covered an infrastructure engineer's salary. If you are running anything more than a handful of calls per day on Claude, you should be using prompt caching aggressively.
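Structurally this is one flag: the long, stable prefix is marked with a `cache_control` block, following Anthropic's prompt-caching API. A minimal sketch, with the prompt text and model alias as placeholders rather than our real values:

```python
# Long, stable prefix: brand voice, schema, safety rules, few-shot examples.
LONG_SYSTEM_PREFIX = "Brand voice rules... output schema... few-shot examples..."

def build_request(user_query: str) -> dict:
    """Build a Messages API request whose system prefix is cacheable.

    Everything up to and including the block carrying cache_control is
    computed once and billed at the (much cheaper) cached-input rate on
    subsequent calls; only the short user query is fresh each time.
    """
    return {
        "model": "claude-3-5-haiku-latest",  # placeholder alias
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PREFIX,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_query}],
    }
```

The dict is passed straight through, e.g. `client.messages.create(**build_request(query))` with the official SDK.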
3. Use the cheap model first, escalate when needed
We default every request to Claude Haiku (cheap, fast, often sufficient) and escalate to Sonnet or Opus only when Haiku's output fails validation or scores poorly on a small evaluator. This pattern handles ~85% of our traffic at Haiku prices.
The cascading approach also forces us to write better evaluators — without them, we would have no way to detect when escalation is needed.
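The cascade itself is a short loop. In this sketch `call_model` and `passes_validation` are injected stubs; in production the first would wrap the Messages API and the second would be the schema validator from lesson 1:

```python
def cascade(query, call_model, passes_validation,
            tiers=("haiku", "sonnet", "opus")):
    """Try the cheapest tier first; escalate only when validation fails.

    Returns (tier_used, output). If every tier fails, the last attempt
    is surfaced so the caller can decide how to degrade.
    """
    last = None
    for model in tiers:
        last = call_model(model, query)
        if passes_validation(last):
            return model, last
    return tiers[-1], last
```

Because escalation is driven entirely by `passes_validation`, the evaluator is the real product surface here: a weak evaluator silently pins traffic to the cheap tier.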
4. Treat tool use as the architectural primitive
Anthropic's tool use API is the right abstraction for almost any non-trivial AI feature. Instead of trying to engineer a single mega-prompt that does everything, we expose product capabilities as tools (lookup_user, generate_image, save_to_library, send_notification) and let Claude orchestrate them.
This gives us three wins: better reliability, observability into what the model is actually doing, and the ability to swap an underlying capability without touching the model side.
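A tool is just a name, a description, and a JSON Schema for its input, in the shape Anthropic's tool use API expects. The parameters below are illustrative; the dispatch helper is our own convention, not part of the SDK:

```python
# Two of the product capabilities exposed to Claude as tools.
TOOLS = [
    {
        "name": "lookup_user",
        "description": "Fetch a user profile by id.",
        "input_schema": {
            "type": "object",
            "properties": {"user_id": {"type": "string"}},
            "required": ["user_id"],
        },
    },
    {
        "name": "save_to_library",
        "description": "Persist a generated asset to the user's library.",
        "input_schema": {
            "type": "object",
            "properties": {
                "asset_url": {"type": "string"},
                "title": {"type": "string"},
            },
            "required": ["asset_url"],
        },
    },
]

def dispatch(tool_name: str, tool_input: dict, handlers: dict):
    """Route a tool_use block from a response to our own code.

    Swapping a handler implementation never touches the model-side
    definitions, which is where the third win comes from.
    """
    return handlers[tool_name](**tool_input)
```

Every `tool_use` block Claude emits is also logged before dispatch, which is where the observability win comes from.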
5. Long context windows change what you build
With Claude's 1M-token context window now available, entire product categories that did not make sense in 2023 are suddenly tractable. We are exploring features that load a user's entire conversation history, listing portfolio, or design library into a single call — workflows that would have required complex retrieval pipelines a year ago.
This is not just an efficiency improvement. It changes the design space.
What we got wrong (and corrected)
We initially built our own evaluation pipeline before realizing Anthropic's evals API and tooling were significantly better than what we could build with a small team. Lesson: where there is platform-provided tooling, use it. Your competitive edge is the product, not the eval harness.
What is next
Claude continues to be the default model for new features. Our investment for the next quarter is two-fold: tighter integration with the Anthropic Memory tool for cross-session product personalization, and migrating our agent orchestration logic to the Anthropic Agent SDK for the parts where we previously rolled our own.


