On-device LLMs have matured by 2026. They are no longer experimental, but the models are not interchangeable either. Here is the current landscape.
The top tier for mobile
The leading options are Gemma 2 (2B and 9B), Phi-4 (14B, for larger phones), Llama 3.2 (1B and 3B), and Qwen 2.5 (1.5B to 7B). Each has a niche; Gemma is particularly strong at instruction-following for assistant workloads.
What they handle well
Summarization. Classification. Entity extraction. Sentiment and mood tagging. Short-form rewriting. Conversation with explicit system prompts.
What they still struggle with
Long-context reasoning (anything beyond roughly 8k tokens is painful). Multi-step agentic workflows. Code generation beyond boilerplate. Math beyond elementary-school level. Frontier knowledge (anything from the last 12 months).
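One practical consequence of the context limit is that an app should check input size before handing text to a local model, and split oversized input into smaller pieces. The sketch below is one way that might look. It assumes a rough 4-characters-per-token heuristic for English text rather than a real tokenizer, and the names `estimate_tokens` and `split_into_chunks` are hypothetical, not part of any model's SDK.

```python
CHARS_PER_TOKEN = 4             # rough heuristic for English; an assumption, not a tokenizer
ON_DEVICE_TOKEN_BUDGET = 8_000  # the threshold beyond which long context gets painful


def estimate_tokens(text: str) -> int:
    """Approximate the token count without loading a tokenizer."""
    if not text:
        return 0
    return max(1, len(text) // CHARS_PER_TOKEN)


def split_into_chunks(text: str, max_tokens: int = 2_000) -> list[str]:
    """Greedily pack paragraphs into chunks that stay under max_tokens.

    A single paragraph larger than max_tokens is kept whole, so callers
    that need a hard limit should split such paragraphs further.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and estimate_tokens(candidate) > max_tokens:
            chunks.append(current)   # close the full chunk
            current = paragraph      # start a new one with this paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be summarized on its own and the partial summaries combined, which keeps every individual call well inside the budget.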
The right use cases
Assistant features, journal structuring, private chat, search re-ranking. The pattern is: specific, bounded, grounded in user data.
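To make "specific, bounded, grounded" concrete, here is a minimal sketch of a mood-tagging prompt for a journal entry. The label set, the prompt wording, and the function names are all illustrative assumptions; the model call itself is left out because it depends on whichever runtime the app uses.

```python
MOOD_LABELS = ("calm", "anxious", "happy", "sad", "angry")  # hypothetical label set


def build_prompt(entry: str, labels=MOOD_LABELS, max_chars: int = 2000) -> str:
    """Build a narrow prompt: one task, a closed set of answers, the user's own text."""
    snippet = entry.strip()[:max_chars]  # bounded: cap how much text is sent
    options = ", ".join(labels)
    return (
        "You are a journal tagging assistant.\n"
        f"Choose exactly one mood from: {options}.\n"
        "Answer with the single word only.\n\n"
        f"Entry:\n{snippet}\n\nMood:"
    )


def parse_label(raw: str, labels=MOOD_LABELS):
    """Accept the reply only if its first word is one of the allowed labels."""
    words = raw.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,!\"'")
    return first if first in labels else None
```

Validating the reply against the closed label set means a small model's occasional rambling answer is rejected instead of being stored as a tag.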
Where the cloud still wins
Anything frontier: the smartest model, the longest context, the most recent data. For those tasks, use cloud AI deliberately, with a privacy-aware provider, and only with data you're willing to send.
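The split between local and cloud work can be written down as a small routing rule. The following is only a sketch under stated assumptions: the task names, the `Request` fields, and the 8,000-token budget are illustrative choices, and a real app would need its own consent flow before anything leaves the device.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    ON_DEVICE = "on_device"
    CLOUD = "cloud"
    BLOCKED = "blocked"  # needs the cloud, but the user has not agreed to send the data


# Hypothetical task names for the workloads small models handle well.
ON_DEVICE_TASKS = {"summarize", "classify", "extract_entities", "tag_mood", "rewrite", "chat"}


@dataclass(frozen=True)
class Request:
    task: str
    estimated_tokens: int
    needs_recent_knowledge: bool = False
    user_allows_cloud: bool = False  # explicit opt-in for this piece of data


def route(req: Request, token_budget: int = 8_000) -> Backend:
    """Prefer the local model; fall back to the cloud only with consent."""
    fits_locally = (
        req.task in ON_DEVICE_TASKS
        and req.estimated_tokens <= token_budget
        and not req.needs_recent_knowledge
    )
    if fits_locally:
        return Backend.ON_DEVICE
    return Backend.CLOUD if req.user_allows_cloud else Backend.BLOCKED
```

For example, `route(Request("summarize", 1_200))` stays on the device, while a 20,000-token request goes to the cloud only when `user_allows_cloud` is set. Defaulting the consent flag to `False` keeps the privacy-preserving path as the fallback.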
About Sovereign: A privacy-first AI personal assistant that runs entirely on your iPhone. On-device LLM, zero-knowledge encryption, and a coach that learns from your own words.