Inference
On-device inference
Running models on the phone is now real. Battery, thermal, and memory budgets are the new latency.
-
Apple Foundation Models
✓ shipped
On-device LLM shipping in iOS 26+. ~7s/post on iPhone 15 Pro, 100% schema-valid with permissive decoding.
Powers the Cuppa app today.
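
A minimal sketch of what shipping against this looks like, assuming the FoundationModels guided-generation API from iOS 26 (LanguageModelSession, @Generable, @Guide); the PostTags schema and prompt are hypothetical illustrations, not Cuppa's actual pipeline:

```swift
import FoundationModels

// Hypothetical tagging schema for illustration; @Generable and @Guide are
// the framework's guided-generation macros, but these fields are made up.
@Generable
struct PostTags {
    @Guide(description: "A one-line summary of the post")
    var summary: String

    @Guide(description: "Up to five topic tags")
    var tags: [String]
}

func tagPost(_ body: String) async throws -> PostTags {
    let session = LanguageModelSession(
        instructions: "You tag short posts for a reading app."
    )
    // Guided generation constrains decoding to the schema, which is why
    // output is schema-valid by construction rather than by re-parsing.
    let response = try await session.respond(
        to: "Tag this post:\n\(body)",
        generating: PostTags.self
    )
    return response.content
}
```
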
-
MLX-Swift
⚠ evaluating
Apple Silicon ML runtime with a Swift surface. Path to running Gemma 2/3 and similar locally without Python.
Concurrent-model memory budget on iPhone 15/16 is the open question.
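
The budget question is at least measurable. A sketch, assuming mlx-swift's GPU memory controls (GPU.set(cacheLimit:), GPU.set(memoryLimit:relaxed:), GPU.snapshot()); the byte limits here are made up, and the idea is to log active/peak memory while two models are resident:

```swift
import MLX

func configureBudget() {
    // Keep the Metal buffer cache small so an idle model's scratch
    // buffers are released instead of staying pinned.
    MLX.GPU.set(cacheLimit: 64 * 1024 * 1024)

    // Soft ceiling on total Metal allocations; relaxed means MLX spills
    // past it under pressure rather than failing an allocation outright.
    MLX.GPU.set(memoryLimit: 3 * 1024 * 1024 * 1024, relaxed: true)
}

func logMemory(_ label: String) {
    // Snapshot active, cached, and peak Metal memory in MB.
    let snap = MLX.GPU.snapshot()
    print("\(label): active=\(snap.activeMemory >> 20)MB " +
          "cache=\(snap.cacheMemory >> 20)MB peak=\(snap.peakMemory >> 20)MB")
}
```
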
-
WhisperKit
◐ wishlist
Whisper inference on Apple Silicon via Core ML. Voice-in for Posts and Studio narration.
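
Roughly README-style usage; the model name, path handling, and return shape are assumptions against current WhisperKit (older versions return a single optional result rather than an array):

```swift
import WhisperKit

func transcribeVoiceNote(at path: String) async throws -> String {
    // Load a small English model; "base.en" is an illustrative choice.
    let whisper = try await WhisperKit(WhisperKitConfig(model: "base.en"))
    let results = try await whisper.transcribe(audioPath: path)
    // Join per-segment results into one transcript for the Post draft.
    return results.map(\.text).joined(separator: " ")
}
```
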
-
llama.cpp Swift bindings
◐ wishlist
Quantised CPU/GPU inference. Useful as a fallback path when MLX or Foundation Models aren't available.
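
A hypothetical router for that fallback chain. Only SystemLanguageModel.default.availability is real API here; the MLX headroom check is a stand-in for whatever probe the app ends up using (see the memory sketch under MLX-Swift):

```swift
import FoundationModels

enum InferenceEngine { case foundationModels, mlx, llamaCpp }

func pickEngine(mlxBudgetOK: Bool) -> InferenceEngine {
    // Apple's model is gated by OS version, device, region, and the
    // Apple Intelligence setting, so availability is a runtime check.
    if case .available = SystemLanguageModel.default.availability {
        return .foundationModels
    }
    // No system model: prefer MLX when the device has headroom for the
    // weights, otherwise take the quantised llama.cpp path.
    return mlxBudgetOK ? .mlx : .llamaCpp
}
```
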