By 2026, on-device LLMs have matured. They are no longer experimental, and the models are far from interchangeable. Here is the current landscape.

The top tier for mobile

Gemma 2 (2B and 9B), Phi-4 (14B, for larger phones), Llama 3.2 (1B and 3B), and Qwen 2.5 (1.5B to 7B). Each has its niche; Gemma stands out for instruction-following in assistant workloads.
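
A rough way to see why a 14B model is reserved for larger phones: the memory for the weights alone is approximately the parameter count times bits per weight, divided by 8. The sketch below uses the nominal sizes listed above and assumes 4-bit quantization; it ignores KV cache, activations, and runtime overhead, so treat the numbers as back-of-the-envelope lower bounds, not measurements.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes).

    KV cache, activations, and runtime overhead come on top of this figure.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Nominal parameter counts from the list above; 4-bit quantization is an assumption.
models = {
    "Llama 3.2 1B": 1,
    "Gemma 2 2B": 2,
    "Llama 3.2 3B": 3,
    "Qwen 2.5 7B": 7,
    "Gemma 2 9B": 9,
    "Phi-4 14B": 14,
}

for name, params in models.items():
    print(f"{name}: ~{weight_memory_gb(params):.1f} GB of weights at 4-bit")
```

At 4-bit, a 14B model needs about 7 GB for its weights before anything else is loaded, while the 1B to 3B models need well under 2 GB, which is why the smaller models are the comfortable default on most phones.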

What they handle well

Summarization. Classification. Entity extraction. Sentiment and mood tagging. Short-form rewriting. Conversation with explicit system prompts.

What they still struggle with

Long-context reasoning (anything past about 8k tokens is painful). Multi-step agentic workflows. Code generation beyond boilerplate. Math past elementary school. Recent knowledge (anything from the last 12 months).
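
One common workaround for the context limit is to split long input into chunks that each fit a token budget, process each chunk on-device, and combine the partial results in a final pass. The sketch below shows only the splitting step. It approximates tokens as whitespace-separated words, which is a stand-in assumption (real tokenizers count differently), and the 8,000-token budget and 1,000-token reserve for prompt and output are illustrative values.

```python
def chunk_by_budget(text: str, max_tokens: int = 8000, reserve: int = 1000) -> list:
    """Split text into chunks whose approximate token count fits the budget.

    Tokens are approximated as whitespace-separated words, a rough stand-in
    for a real tokenizer. `reserve` leaves room for the prompt and the output.
    """
    budget = max_tokens - reserve
    if budget <= 0:
        raise ValueError("reserve must be smaller than max_tokens")
    words = text.split()
    # Slice the word list into consecutive windows of at most `budget` words.
    return [" ".join(words[i:i + budget]) for i in range(0, len(words), budget)]
```

This trades one long-context call, where small models degrade, for several short calls of the kind they handle well, at the cost of losing some cross-chunk reasoning.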

The right use cases

Assistant features, journal structuring, private chat, search re-ranking. The pattern is: specific, bounded, grounded in user data.
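
The "specific, bounded" pattern can be made concrete: give the model a fixed label set and validate its answer instead of trusting free-form output. In this sketch, `generate` is a hypothetical stand-in for whatever on-device inference call an app uses, and the mood labels and prompt wording are illustrative assumptions.

```python
from typing import Callable, Optional

MOOD_LABELS = ("positive", "neutral", "negative")  # illustrative label set


def build_prompt(entry: str) -> str:
    """Build a bounded classification prompt with an explicit label set."""
    labels = ", ".join(MOOD_LABELS)
    return (
        f"Classify the mood of the journal entry as exactly one of: {labels}.\n"
        "Answer with the label only.\n\n"
        f"Entry: {entry}"
    )


def classify_mood(entry: str, generate: Callable[[str], str]) -> Optional[str]:
    """Call the hypothetical `generate` function and validate its answer.

    Returns a label from MOOD_LABELS, or None if the output is not a valid
    label, so the caller can retry or skip instead of storing bad data.
    """
    raw = generate(build_prompt(entry))
    # Normalize common noise such as whitespace, a trailing period, or capitals.
    answer = raw.strip().strip(".").lower()
    return answer if answer in MOOD_LABELS else None
```

Because the output space is closed and checked, a small model's occasional off-script answer becomes a detectable `None` rather than a silent error in the user's data.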

Where the cloud still wins

Anything frontier: the smartest models, the longest context windows, the most recent data. For those tasks, use cloud AI deliberately, with a privacy-respecting provider, and only with data you are willing to send.
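
The rule of thumb above can be written as a small routing function. Everything here is a hypothetical sketch: the `Task` fields, the 8,000-token threshold (borrowed from the long-context limit mentioned earlier), and the policy that cloud calls require explicit user consent are illustrative choices, not a prescribed design.

```python
from dataclasses import dataclass

LOCAL_CONTEXT_LIMIT = 8000  # illustrative; mirrors the ~8k-token pain point above


@dataclass
class Task:
    prompt_tokens: int
    needs_recent_knowledge: bool    # e.g. events from the last 12 months
    needs_frontier_reasoning: bool  # e.g. agentic workflows, non-trivial code or math
    user_allows_cloud: bool         # the user agreed to send this data off-device


def route(task: Task) -> str:
    """Return "local" or "cloud" for a task, defaulting to on-device."""
    wants_cloud = (
        task.prompt_tokens > LOCAL_CONTEXT_LIMIT
        or task.needs_recent_knowledge
        or task.needs_frontier_reasoning
    )
    if wants_cloud and task.user_allows_cloud:
        return "cloud"
    # Either the task fits on-device, or the user has not agreed to send this
    # data, in which case we fall back to the local model and accept weaker results.
    return "local"
```

Treating consent as a hard gate keeps the private path as the default; the cloud becomes an opt-in escalation rather than the norm.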

About Sovereign: a privacy-first AI personal assistant that runs entirely on your iPhone, with an on-device LLM, zero-knowledge encryption, and a coach that learns from your own words.

Local LLMs: State of the Art in 2026
