AI Agent Architecture for Consumer Apps

Virtual Minds Engineering

Enterprise AI agents have time. They run overnight, process documents in batches, retry when they hit rate limits, and report back the next morning. Consumer AI agents have 8 seconds before the user closes the app — and a subscription renewal riding on whether the user feels the result was worth it.

The consumer constraint

Every Virtual Minds product is a consumer app. That shapes every agent decision:

  • Single-user state. No org hierarchies, no role-based access — just one user's data and preferences.
  • Latency budget under 8 seconds. Anything longer needs an explicit "we are working on it" UI affordance, and ideally a push notification when complete.
  • Subscription-aware. Most users see free tiers with limits. The agent has to know what tier the user is on and degrade gracefully when limits are hit.
  • Mobile-first. The agent may execute on the server, but the trigger and result both happen on a phone. Network reliability and battery are real constraints.
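The subscription-aware constraint above can be sketched as a small gate the agent consults before any expensive call. The tier names, limits, and response fields here are illustrative, not our production values:

```python
# Sketch of subscription-aware degradation. Tiers, limits, and the
# response shape are hypothetical examples.
TIER_LIMITS = {"free": 3, "pro": 50}  # generations per day (illustrative)

def plan_generation(tier: str, used_today: int) -> dict:
    """Decide whether the agent may run an expensive generation."""
    limit = TIER_LIMITS.get(tier, 0)
    if used_today < limit:
        return {"allowed": True, "remaining": limit - used_today}
    # Degrade gracefully: refuse the expensive path but say why, so the
    # UI can show an upgrade prompt instead of a silent failure.
    return {"allowed": False, "remaining": 0, "reason": "daily_limit_reached"}
```

The point is that the refusal is structured data the UI can act on, not an error the agent has to explain after the fact.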

The Virtual Minds agent shape

For features under the 8-second budget, we use a single Claude tool-use loop with a curated set of internal tools (database lookups, image generation, content saving). The model orchestrates, the tools execute, the result returns synchronously.
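A minimal sketch of that synchronous loop, with the Claude call stubbed out — `call_model`, the tool names, and the message shapes are illustrative stand-ins, not the real API:

```python
import json

# Illustrative tool registry: the model orchestrates, these execute.
TOOLS = {
    "get_user_subscription": lambda args: {"tier": "pro"},
}

def call_model(messages):
    # Stub standing in for the Claude Messages API: first turn requests a
    # tool, second turn produces the final text.
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_use", "name": "get_user_subscription", "input": {}}
    return {"type": "text", "text": "You are on the pro tier."}

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = call_model(messages)
        if reply["type"] == "text":
            return reply["text"]  # final answer, returned synchronously
        result = TOOLS[reply["name"]](reply["input"])  # execute the tool
        messages.append({"role": "tool", "content": json.dumps(result)})
```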

For features that exceed the budget — like Reshot AI's headshot generation, which takes 20-40 minutes — the agent moves to an async pattern: kick off a background job, return immediately, deliver via push notification when ready. The same Claude orchestration logic runs, but on a worker queue.
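The async shape can be sketched with an in-process queue — here `send_push` is a stand-in for a real push service (APNs/FCM), and the job payload is illustrative:

```python
import queue
import threading

jobs = queue.Queue()
notifications = []

def send_push(user_id: str, message: str) -> None:
    notifications.append((user_id, message))  # stand-in for APNs/FCM

def worker() -> None:
    while True:
        job = jobs.get()
        if job is None:
            break
        # The same orchestration logic as the sync loop would run here,
        # just with no user waiting on the socket.
        send_push(job["user_id"], "Your headshots are ready")
        jobs.task_done()

def start_generation(user_id: str) -> dict:
    jobs.put({"user_id": user_id})
    return {"status": "queued"}  # returned to the client within the budget
```

The client gets `{"status": "queued"}` inside the latency budget; the worker delivers the result whenever it finishes.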

Memory and personalization

Cross-session memory is where consumer agents earn their subscription. A user who paid for Room AI last week should not have to re-explain their style preferences this week. We use Anthropic's Memory tool selectively — primarily for stylistic preferences and project context — backed by Firestore for the structured data the agent will reliably need (subscription tier, project library, prior generations).

The distinction matters: Memory tool entries are for things the model should reason over. Firestore is for things the model should be told. Get this wrong and your agent either hallucinates state or wastes tokens loading it every call.
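The "told, not reasoned" half of that split can be sketched as prompt assembly: structured facts from the store go straight into the system prompt so the model never has to guess them. The field names and record shape here are hypothetical, not our Firestore schema:

```python
# Sketch of injecting structured state the model should be *told*.
# Field names are illustrative, not a real Firestore schema.
def build_system_prompt(user_record: dict) -> str:
    """Put facts like subscription tier directly in the prompt, so the
    model reasons *with* them rather than trying to recall them."""
    return (
        f"User tier: {user_record['tier']}. "
        f"Generations remaining today: {user_record['remaining']}."
    )
```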

Tool design rules we follow

  • Tools return data, not instructions. A tool called get_user_subscription returns the subscription record; it does not return advice on what to do with it.
  • Tool outputs are JSON, always — even when there is no natural reason for structure. Consistent shapes keep parsing predictable.
  • Idempotent or refused. Any tool that mutates state must either be safe to retry, or refuse to execute when called twice with the same parameters.
  • Cheap before expensive. If the agent could call a $0.0001 lookup or a $0.05 generation, it must always check the cheap option first.
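The "idempotent or refused" rule above can be sketched by keying a mutating tool on a hash of its parameters: a duplicate call replays the original result instead of writing twice. The `save_project` tool and its storage are illustrative:

```python
import hashlib
import json

_seen: dict = {}  # illustrative in-memory store of completed writes

def save_project(params: dict) -> dict:
    """Hypothetical mutating tool: safe to retry with identical params."""
    key = hashlib.sha256(
        json.dumps(params, sort_keys=True).encode()
    ).hexdigest()
    if key in _seen:
        # Replay the original result rather than mutating state again.
        return {**_seen[key], "replayed": True}
    result = {"project_id": f"proj-{len(_seen) + 1}", "saved": True}
    _seen[key] = result
    return result
```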

The mistake we corrected

Our first agent attempt tried to be helpful by chaining many small tool calls. The latency was unacceptable on consumer mobile networks. We rewrote our tool surfaces to return larger composite objects — "give me the user, their subscription, their last 5 generations, and their style preferences in one call" — and the agent latency dropped by 60% with no quality loss.

If you are designing consumer agent tools, do not optimize for atomicity. Optimize for round-trip count.
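A composite tool of the kind described above might look like this — one round trip replacing four. The store and field names are illustrative:

```python
# Hypothetical in-memory stand-in for the user store.
FAKE_DB = {
    "u1": {
        "user": {"name": "Sam"},
        "subscription": {"tier": "pro"},
        "generations": [f"gen-{i}" for i in range(8)],
        "style": {"palette": "warm minimal"},
    }
}

def get_user_context(user_id: str) -> dict:
    """One call returning the user, their subscription, their last 5
    generations, and their style preferences together."""
    rec = FAKE_DB[user_id]
    return {
        "user": rec["user"],
        "subscription": rec["subscription"],
        "recent_generations": rec["generations"][-5:],
        "style_preferences": rec["style"],
    }
```

Each field was previously its own tool; collapsing them trades a slightly larger payload for fewer mobile-network round trips.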

What we are exploring next

The next frontier for our consumer agents is local execution — running a portion of agent reasoning on-device, with cloud fallback for harder cases. Apple's on-device models and a Core ML-shaped path forward make this tractable for the iOS side of our products. Watch this space.

AI agents · consumer AI · agent architecture
