Back to Logbook

Mission · Cuppa Harness

Cuppa Harness: the pattern library for the model-churn era


The model layer churns. A new state-of-the-art lands every few weeks, context windows triple, prices halve, providers come and go. If your product is built around a specific model — a specific tokenizer, a specific tool-call format, a specific cloud endpoint — you spend your engineering budget chasing the bottom of the curve instead of building the product.

The bits that don’t churn — the way you compose a request, fan out to multiple providers, repair malformed JSON, judge responses, retry with correction — those are durable. We’ve been building them for a year inside a shipping iOS app. Cuppa Harness is the extraction.

Why “harness”?

Strap a horse to a cart and the horse does the work. The harness is just leather and buckles, but without it nothing pulls. Our patterns are the harness around the model: utterly unglamorous, mostly invisible, and the reason the cart moves.

Models change. Patterns endure.

What’s in it

Six surfaces, each one a thing we got wrong before we got it right:

The protocol shape

The most important decision in the 1.0 surface is the shape of ReplyService. Get this right and every provider — Apple Foundation Models, Anthropic, Gemini, on-device Gemma, future things we haven’t heard of — slots in behind the same call site:

public protocol ReplyService {
    func reply(to request: ReplyRequest) async throws -> ReplyResponse
}

struct AppleFoundationReply: ReplyService { /* ... */ }
struct ClaudeReply: ReplyService { /* ... */ }
struct GemmaReply: ReplyService { /* ... */ }

Once every provider speaks the same protocol, fan-out is trivial:

let replies: [ReplyResponse] = try await withThrowingTaskGroup(
    of: ReplyResponse.self
) { group in
    for service in council {
        group.addTask { try await service.reply(to: request) }
    }
    var collected: [ReplyResponse] = []
    for try await reply in group { collected.append(reply) }
    return collected
}

let summary = try await moderator.summarize(replies, for: request)

That’s the whole shape of multi-model. No special-casing per provider. No “if Claude, do this; if Gemma, do that.” One protocol, N implementations, a moderator that consumes whatever comes back.

Output repair, the unglamorous half

Here’s a thing nobody warns you about: even a well-prompted model returns malformed JSON ~10% of the time. Truncated braces, stray markdown fences, “Sure, here you go!” preambles, reasoning tags leaking out of <thinking> blocks. Most “prompt engineering” advice is really parsing engineering in disguise.

Cuppa Harness ships an OutputRepair ladder that tries progressively more aggressive fixes before giving up:

let candidates = OutputRepair.candidates(from: rawText)
//   1. raw text, parsed as-is
//   2. fence-stripped (```json ... ``` removed)
//   3. brace-block-extracted (first { ... } found)
//   4. reasoning-tag stripped (<thinking>...</thinking> removed)
//   5. truncation-repaired (closing braces inferred)

for candidate in candidates {
    if let parsed = try? JSONDecoder().decode(T.self, from: candidate) {
        return parsed
    }
}

This single change took us from ~60% schema-valid output on Apple Foundation Models to 100%. Most of the wins were on the parsing side, not the prompt side.

Why non-breaking growth matters

The 1.0 surface ships streaming-shaped (ReplyChunk exists, even if no provider streams yet) and content-part-shaped (ContentPart union exists, even though everything is text today). When streaming or multimodal lands as a 1.x extension, the protocol shape doesn’t move — only the implementation does.

This is deliberate. The whole point of the harness layer is that it doesn’t churn. If we have to ship 2.0 because we forgot to leave a seam for streaming, we’ve failed the brief.

What we believe

Building on the model layer is like building on sand: the tide comes in, the tide goes out, the shape of the beach changes overnight. The harness layer is the seawall. We’d rather spend a year getting the seawall right than build a beautiful sandcastle every few weeks.

Cross-platform across iOS, Android, and React Native. Open. The fundamentals that don’t break when the model below them changes.

Read the patterns →