Mission · Cuppa Harness

Cuppa Harness: the pattern library for the model-churn era

2 May 2026

The model layer churns. A new state-of-the-art lands every few weeks, context windows triple, prices halve, providers come and go. If your product is built around a specific model — a specific tokenizer, a specific tool-call format, a specific cloud endpoint — you spend your engineering budget chasing the bottom of the curve instead of building the product.

The bits that don’t churn — the way you compose a request, fan out to multiple providers, repair malformed JSON, judge responses, retry with correction — those are durable. We’ve been building them for a year inside a shipping iOS app. Cuppa Harness is the extraction.

Why “harness”?

Strap a horse to a cart and the horse does the work. The harness is just leather and buckles, but without it nothing pulls. Our patterns are the harness around the model: utterly unglamorous, mostly invisible, and the reason the cart moves.

Models change. Patterns endure.

What’s in it

Six surfaces, each one a thing we got wrong before we got it right:

Protocols — provider-agnostic shapes (ReplyService, ModeratorService, MediaService) so swapping or fanning out across models is a one-line change.
Memory patterns — context engineering primitives, conversation shapes, retrieval scaffolding.
Orchestration — fan-out coordinators, moderator-as-judge, smart routing classifiers.
Output repair — fence stripping, brace block extraction, JSON candidate ladders, reasoning-tag handling.
Tool design — conventions for exposing app capabilities to models without the API churn.
Sandboxes — minimal reference implementations for each pattern, runnable as starting points.

The protocol shape

The most important decision in the 1.0 surface is the shape of ReplyService. Get this right and every provider — Apple Foundation Models, Anthropic, Gemini, on-device Gemma, future things we haven’t heard of — slots in behind the same call site:

public protocol ReplyService {
    func reply(to request: ReplyRequest) async throws -> ReplyResponse
}

struct AppleFoundationReply: ReplyService { /* ... */ }
struct ClaudeReply: ReplyService { /* ... */ }
struct GemmaReply: ReplyService { /* ... */ }

Once every provider speaks the same protocol, fan-out is trivial:

let replies: [ReplyResponse] = try await withThrowingTaskGroup(
    of: ReplyResponse.self
) { group in
    for service in council {
        group.addTask { try await service.reply(to: request) }
    }
    var collected: [ReplyResponse] = []
    for try await reply in group { collected.append(reply) }
    return collected
}

let summary = try await moderator.summarize(replies, for: request)

That’s the whole shape of multi-model. No special-casing per provider. No “if Claude, do this; if Gemma, do that.” One protocol, N implementations, a moderator that consumes whatever comes back.

Output repair, the unglamorous half

Here’s a thing nobody warns you about: even a well-prompted model returns malformed JSON ~10% of the time. Truncated braces, stray markdown fences, “Sure, here you go!” preambles, reasoning tags leaking out of <thinking> blocks. Most “prompt engineering” advice is really parsing engineering in disguise.

Cuppa Harness ships an OutputRepair ladder that tries progressively more aggressive fixes before giving up:

let candidates = OutputRepair.candidates(from: rawText)
//   1. raw text, parsed as-is
//   2. fence-stripped (```json ... ``` removed)
//   3. brace-block-extracted (first { ... } found)
//   4. reasoning-tag stripped (<thinking>...</thinking> removed)
//   5. truncation-repaired (closing braces inferred)

for candidate in candidates {
    if let parsed = try? JSONDecoder().decode(T.self, from: candidate) {
        return parsed
    }
}

This single change took us from ~60% schema-valid output on Apple Foundation Models to 100%. Most of the wins were on the parsing side, not the prompt side.

Why non-breaking growth matters

The 1.0 surface ships streaming-shaped (ReplyChunk exists, even if no provider streams yet) and content-part-shaped (ContentPart union exists, even though everything is text today). When streaming or multimodal lands as a 1.x extension, the protocol shape doesn’t move — only the implementation does.

This is deliberate. The whole point of the harness layer is that it doesn’t churn. If we have to ship 2.0 because we forgot to leave a seam for streaming, we’ve failed the brief.

What we believe

Building on the model layer is like building on sand: the tide comes in, the tide goes out, the shape of the beach changes overnight. The harness layer is the seawall. We’d rather spend a year getting the seawall right than build a beautiful sandcastle every few weeks.

Cross-platform across iOS, Android, and React Native. Open. The fundamentals that don’t break when the model below them changes.

Read the patterns →