Mission · Cuppa Harness
Cuppa Harness: the pattern library for the model-churn era
The model layer churns. A new state-of-the-art lands every few weeks, context windows triple, prices halve, providers come and go. If your product is built around a specific model — a specific tokenizer, a specific tool-call format, a specific cloud endpoint — you spend your engineering budget chasing the bottom of the curve instead of building the product.
The bits that don’t churn — the way you compose a request, fan out to multiple providers, repair malformed JSON, judge responses, retry with correction — those are durable. We’ve been building them for a year inside a shipping iOS app. Cuppa Harness is the extraction.
Why “harness”?
Strap a horse to a cart and the horse does the work. The harness is just leather and buckles, but without it nothing pulls. Our patterns are the harness around the model: utterly unglamorous, mostly invisible, and the reason the cart moves.
Models change. Patterns endure.
What’s in it
Six surfaces, each one a thing we got wrong before we got it right:
- Protocols — provider-agnostic shapes (
ReplyService,ModeratorService,MediaService) so swapping or fanning out across models is a one-line change. - Memory patterns — context engineering primitives, conversation shapes, retrieval scaffolding.
- Orchestration — fan-out coordinators, moderator-as-judge, smart routing classifiers.
- Output repair — fence stripping, brace block extraction, JSON candidate ladders, reasoning-tag handling.
- Tool design — conventions for exposing app capabilities to models without the API churn.
- Sandboxes — minimal reference implementations for each pattern, runnable as starting points.
The protocol shape
The most important decision in the 1.0 surface is the shape of
ReplyService. Get this right and every provider — Apple Foundation
Models, Anthropic, Gemini, on-device Gemma, future things we
haven’t heard of — slots in behind the same call site:
public protocol ReplyService {
func reply(to request: ReplyRequest) async throws -> ReplyResponse
}
struct AppleFoundationReply: ReplyService { /* ... */ }
struct ClaudeReply: ReplyService { /* ... */ }
struct GemmaReply: ReplyService { /* ... */ }
Once every provider speaks the same protocol, fan-out is trivial:
let replies: [ReplyResponse] = try await withThrowingTaskGroup(
of: ReplyResponse.self
) { group in
for service in council {
group.addTask { try await service.reply(to: request) }
}
var collected: [ReplyResponse] = []
for try await reply in group { collected.append(reply) }
return collected
}
let summary = try await moderator.summarize(replies, for: request)
That’s the whole shape of multi-model. No special-casing per provider. No “if Claude, do this; if Gemma, do that.” One protocol, N implementations, a moderator that consumes whatever comes back.
Output repair, the unglamorous half
Here’s a thing nobody warns you about: even a well-prompted model
returns malformed JSON ~10% of the time. Truncated braces, stray
markdown fences, “Sure, here you go!” preambles, reasoning tags
leaking out of <thinking> blocks. Most “prompt engineering”
advice is really parsing engineering in disguise.
Cuppa Harness ships an OutputRepair ladder that tries progressively
more aggressive fixes before giving up:
let candidates = OutputRepair.candidates(from: rawText)
// 1. raw text, parsed as-is
// 2. fence-stripped (```json ... ``` removed)
// 3. brace-block-extracted (first { ... } found)
// 4. reasoning-tag stripped (<thinking>...</thinking> removed)
// 5. truncation-repaired (closing braces inferred)
for candidate in candidates {
if let parsed = try? JSONDecoder().decode(T.self, from: candidate) {
return parsed
}
}
This single change took us from ~60% schema-valid output on Apple Foundation Models to 100%. Most of the wins were on the parsing side, not the prompt side.
Why non-breaking growth matters
The 1.0 surface ships streaming-shaped (ReplyChunk exists, even
if no provider streams yet) and content-part-shaped (ContentPart
union exists, even though everything is text today). When streaming
or multimodal lands as a 1.x extension, the protocol shape doesn’t
move — only the implementation does.
This is deliberate. The whole point of the harness layer is that it doesn’t churn. If we have to ship 2.0 because we forgot to leave a seam for streaming, we’ve failed the brief.
What we believe
Building on the model layer is like building on sand: the tide comes in, the tide goes out, the shape of the beach changes overnight. The harness layer is the seawall. We’d rather spend a year getting the seawall right than build a beautiful sandcastle every few weeks.
Cross-platform across iOS, Android, and React Native. Open. The fundamentals that don’t break when the model below them changes.