Offline‑First Flutter Architecture and Sync: Design, Conflicts, and Guarantees

3/14/2025

Written by Zayin Krige

Flutter developer and founder of Apex Technology

Why offline‑first?

  • Modern users expect apps to work on flaky networks, in airplane mode, and in low‑signal areas.
  • Offline‑first isn’t just caching; it’s a deliberate data model, write semantics, and conflict strategy.
  • Getting it right reduces support load, increases retention, and simplifies UX: users never see “can’t save”.

This guide provides a comprehensive blueprint for offline‑first Flutter apps, with patterns for storage, sync, conflicts, background jobs, and testing.

Contents

  • Principles and guarantees
  • Data model and identifiers
  • Local storage and change log
  • Write semantics: queueing, retries, backoff
  • Sync strategies (pull/push/merge)
  • Conflict resolution patterns
  • Background syncing and lifecycle
  • UX patterns for trust and clarity
  • Observability, metrics, and alerting
  • Testing and failure injection
  • Reference implementation outline

Principles and guarantees

  • Availability first: operations succeed locally, even when offline.
  • Convergence: given no further updates, replicas eventually reach the same state.
  • Idempotence: re‑applying the same operation yields the same state; critical for retries.
  • Causal ordering (where needed): maintain relative order within an entity/stream.
  • Bounded staleness: define acceptable windows for how stale UI can be.

Guarantee matrix (what you can promise)

  • Local durability: once the user sees “Saved”, the mutation exists in the local log (fsync’ed).
  • Eventual server persistence: queued writes are retried with exponential backoff until success or manual intervention.
  • Deterministic conflict policy: given concurrent edits, the outcome is predictable, documented, and test‑covered.

Data model and identifiers

  • Use client‑generated, globally unique IDs (UUID v4 or KSUID) for new entities.
  • Prefer an immutable event stream (append‑only change log) over in‑place mutation for local writes; apply reducers to compute the current state.
  • Store server ETags/versions per entity to support optimistic concurrency.

Entity shape (example)

{
  "id": "ksuid_29JYBqz9G3S6po1hy...",
  "type": "Note",
  "title": "Grocery list",
  "body": "Milk, eggs, butter",
  "updatedAt": "2025-03-14T09:12:00Z",
  "version": "W/\"etag-server-v15\"",
  "local": {
    "pending": false,
    "dirtyFields": ["body"],
    "lastSyncAt": "2025-03-14T09:15:00Z"
  }
}
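
In Dart, the same shape can be carried by a small immutable model. This is a minimal sketch; the LocalMeta helper, the Note field names, and the use of the uuid package for client‑generated IDs are illustration assumptions, not part of the contract above.

import 'package:uuid/uuid.dart';

// Local bookkeeping that never leaves the device.
class LocalMeta {
  final bool pending;
  final List<String> dirtyFields;
  final DateTime? lastSyncAt;
  const LocalMeta({this.pending = false, this.dirtyFields = const [], this.lastSyncAt});
}

class Note {
  final String id;        // client-generated, never reassigned by the server
  final String title;
  final String body;
  final DateTime updatedAt;
  final String? version;  // server ETag; null until the first successful push
  final LocalMeta local;

  const Note({
    required this.id,
    required this.title,
    required this.body,
    required this.updatedAt,
    this.version,
    this.local = const LocalMeta(),
  });

  // New entities get a client-generated UUID v4 so they can be created offline.
  factory Note.create({required String title, required String body}) => Note(
        id: const Uuid().v4(),
        title: title,
        body: body,
        updatedAt: DateTime.now().toUtc(),
        local: const LocalMeta(pending: true, dirtyFields: ['title', 'body']),
      );
}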

Local storage and change log

  • Chosen stores:
    • drift (SQLite) for relational/stateful queries and constraints
    • isar/hive for pure KV or document workloads
  • Tables (sketched in drift after this list):
    • entities (current materialized state)
    • changes (append‑only operations: create/update/delete, payload, timestamp, dependency)
    • sync_state (cursors for pull, last_success, backoff state)
  • Keep indexes on entity id, updatedAt, and dirty flags for efficient queries.
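
A drift sketch of these tables might look like the following. Column choices (a JSON‑encoded payload, a per‑collection sync_state row) are assumptions, the @DriftDatabase class plus generated code are omitted, and indexes on id, updatedAt, and the dirty flag would be declared alongside the database definition.

import 'package:drift/drift.dart';

// Current materialized state of each entity.
class Entities extends Table {
  TextColumn get id => text()();
  TextColumn get payload => text()(); // JSON-encoded current state
  DateTimeColumn get updatedAt => dateTime()();
  TextColumn get serverVersion => text().nullable()(); // ETag
  BoolColumn get pending => boolean().withDefault(const Constant(false))();

  @override
  Set<Column> get primaryKey => {id};
}

// Append-only operation log; rows are only marked applied, never rewritten.
class Changes extends Table {
  TextColumn get id => text()(); // clientChangeId
  TextColumn get entityId => text()();
  TextColumn get op => text()(); // create / update / delete
  TextColumn get payload => text()();
  DateTimeColumn get at => dateTime()();
  BoolColumn get applied => boolean().withDefault(const Constant(false))();

  @override
  Set<Column> get primaryKey => {id};
}

// One row per collection: pull cursor plus backoff bookkeeping.
class SyncState extends Table {
  TextColumn get collection => text()();
  TextColumn get cursor => text().nullable()();
  DateTimeColumn get lastSuccess => dateTime().nullable()();
  IntColumn get failedAttempts => integer().withDefault(const Constant(0))();

  @override
  Set<Column> get primaryKey => {collection};
}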

Write semantics: queueing, retries, backoff

  • Local flow:
    1. Validate input in domain layer.
    2. Begin transaction:
      • Append change to changes
      • Apply change to entities materialized view
    3. Mark entity local.pending=true and capture dirtyFields.
    4. Enqueue a background job to push.
  • Retries:
    • Exponential backoff with jitter (e.g., 1s, 2s, 4s, 8s, capped at 5 min) on transient errors, as sketched after this list.
    • Immediate retry on network regain.
  • Idempotence:
    • Include clientChangeId in payload; server stores last applied id per entity to de‑duplicate.
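
One way to implement the retry schedule is “full jitter” over a doubling delay. This is a minimal sketch; the TransientError type and the maxAttempts cut‑off (after which the change stays queued and is surfaced for manual intervention) are assumptions.

import 'dart:math';

class TransientError implements Exception {}

// Full jitter over a doubling base: delay = random(0 .. min(cap, 1s * 2^attempt)).
Duration backoffDelay(int attempt, {Random? rng}) {
  const cap = Duration(minutes: 5);
  final exponential = Duration(seconds: 1 << min(attempt, 20));
  final bounded = exponential > cap ? cap : exponential;
  return Duration(milliseconds: (rng ?? Random()).nextInt(bounded.inMilliseconds + 1));
}

// Retries the push callback on transient errors; after maxAttempts the change
// stays queued so it can be surfaced for manual intervention.
Future<void> pushWithRetry(Future<void> Function() push, {int maxAttempts = 8}) async {
  for (var attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await push();
      return;
    } on TransientError {
      await Future<void>.delayed(backoffDelay(attempt));
    }
  }
}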

Sync strategies

  • Pull (read), with the incremental loop sketched after this list:
    • Incremental with server cursor (updatedSince watermark or opaque cursor).
    • On first run: full sync per collection, then incremental.
    • Merge strategy avoids clobbering local unsynced changes; consider three‑way merges on fields.
  • Push (write):
    • Order by dependency (create before update; parent before child when needed).
    • Batch small operations to reduce overhead but keep batches small enough for rollback handling.
    • On 409 Conflict / 412 Precondition Failed, trigger conflict resolution.
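
A sketch of the incremental pull loop, assuming a hypothetical SyncApi.pullSince endpoint, an opaque cursor, and a Merger hook that applies the field‑level policies from the next section; push is covered by the reference implementation at the end.

class PullPage {
  final List<Map<String, dynamic>> entities;
  final String nextCursor; // opaque, produced by the server
  final bool hasMore;
  PullPage(this.entities, this.nextCursor, this.hasMore);
}

abstract class SyncApi {
  Future<PullPage> pullSince(String collection, String? cursor);
}

abstract class SyncStateStore {
  Future<String?> cursorFor(String collection);
  Future<void> saveCursor(String collection, String cursor);
}

abstract class Merger {
  Future<void> mergeRemote(String collection, Map<String, dynamic> remote);
}

Future<void> pullCollection(
  SyncApi api,
  SyncStateStore state,
  Merger merger,
  String collection,
) async {
  var cursor = await state.cursorFor(collection); // null on first run → full sync
  while (true) {
    final page = await api.pullSince(collection, cursor);
    for (final remote in page.entities) {
      // Merge instead of overwriting so unsynced local edits are not clobbered.
      await merger.mergeRemote(collection, remote);
    }
    cursor = page.nextCursor;
    await state.saveCursor(collection, page.nextCursor); // persist progress per page
    if (!page.hasMore) break;
  }
}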

Conflict resolution patterns

  • Last‑Writer‑Wins (LWW): compare server updatedAt vs. client updatedAt.
    • Pros: simple, deterministic; Cons: may clobber legitimate changes.
  • Field‑level merge: merge only independent fields; conflicting fields use LWW or custom policies.
  • Operational transforms (OT)/CRDT: for collaborative text or counters; higher complexity, powerful guarantees.
  • Manual resolution: queue conflict items for user attention (notifications, “Resolve” screens).
  • Recommendation:
    • Start with field‑level merge for independent fields, an LWW fallback for conflicting ones, and manual resolution surfaced for critical entities.

Example: optimistic concurrency with ETag

  • Client sends If-Match: W/"etag-server-v15".
  • On mismatch (412), fetch server version, compute three‑way diff:
    • Base: last known server version
    • Local: user‑edited version
    • Remote: current server version
  • Merge (a field‑level sketch follows this list):
    • If fields disjoint → merge automatically.
    • Else if resolvable by policy → apply.
    • Else create a “conflict item” for manual resolution.
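
A field‑level three‑way merge over map‑shaped entities might look like this. Treating entities as Map<String, dynamic> and the MergeResult type are simplifying assumptions; real code would merge typed models and record a conflict item whenever conflictingFields is non‑empty.

class MergeResult {
  final Map<String, dynamic>? merged;   // non-null when auto-merge succeeded
  final Set<String> conflictingFields;  // non-empty when manual resolution is needed
  MergeResult.auto(this.merged) : conflictingFields = const <String>{};
  MergeResult.conflict(this.conflictingFields) : merged = null;
}

MergeResult threeWayMerge(
  Map<String, dynamic> base,   // last known server version
  Map<String, dynamic> local,  // user-edited version
  Map<String, dynamic> remote, // current server version
) {
  final fields = {...base.keys, ...local.keys, ...remote.keys};
  final merged = Map<String, dynamic>.from(remote);
  final conflicts = <String>{};
  for (final f in fields) {
    final localChanged = local[f] != base[f];
    final remoteChanged = remote[f] != base[f];
    if (localChanged && remoteChanged && local[f] != remote[f]) {
      conflicts.add(f); // both sides edited the same field differently
    } else if (localChanged) {
      merged[f] = local[f]; // only the client changed it: keep the local edit
    }
    // Only the server changed it (or nobody did): the remote value already stands.
  }
  return conflicts.isEmpty ? MergeResult.auto(merged) : MergeResult.conflict(conflicts);
}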

Background syncing and lifecycle

  • Architecture:
    • Foreground sync manager (Dart) for active app.
    • Background fetch (platform): Android WorkManager; iOS BGAppRefreshTask/BGProcessingTask with limits.
  • Triggers (wired together in the sketch after this list):
    • App start (deferred by a few seconds to avoid blocking UX)
    • Connectivity changes (network regained)
    • Significant elapsed time (e.g., every 15–30 min)
    • Manual pull‑to‑refresh
  • Guardrails:
    • Respect battery saver, unmetered‑network preferences, and OS quotas.
    • Rate‑limit to avoid server overload; server should also enforce quotas.
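
A sketch of the foreground trigger wiring. The onOnline stream (e.g., derived from a connectivity plugin) and the injected sync callback are assumptions; the platform background tasks (WorkManager, BGTaskScheduler) are registered separately and are not shown.

import 'dart:async';

class SyncScheduler {
  final Future<void> Function() sync;   // e.g., push then pull
  final Stream<bool> onOnline;          // emits true when connectivity is regained
  Timer? _periodic;
  StreamSubscription<bool>? _connSub;
  bool _running = false;

  SyncScheduler({required this.sync, required this.onOnline});

  void start() {
    // Deferred first sync so app start-up is not blocked.
    Timer(const Duration(seconds: 5), _trigger);
    // Sync when the network comes back.
    _connSub = onOnline.where((online) => online).listen((_) => _trigger());
    // Periodic sync while the app is in the foreground.
    _periodic = Timer.periodic(const Duration(minutes: 15), (_) => _trigger());
  }

  Future<void> refreshNow() => _trigger(); // manual pull-to-refresh

  Future<void> _trigger() async {
    if (_running) return; // rate-limit: at most one sync cycle at a time
    _running = true;
    try {
      await sync();
    } finally {
      _running = false;
    }
  }

  void dispose() {
    _periodic?.cancel();
    _connSub?.cancel();
  }
}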

UX patterns for trust and clarity

  • Optimistic UI: reflect local changes immediately with a subtle “Syncing…” status (a minimal banner widget is sketched after this list).
  • Sync banners/toasts: “You’re offline. Changes will sync when connection resumes.”
  • Conflict chips: indicate conflicts on list rows; tap → resolution screen.
  • Disable destructive actions that cannot be completed safely offline (e.g., permanent delete before server acknowledgement).
  • Visual status on settings: last sync time, queued ops count, error list (with resend).
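
As a sketch of the “Syncing…” status, a small widget can be driven by the pending‑change count exposed by the change store; the pendingCount stream is an assumption.

import 'package:flutter/material.dart';

// Shows nothing when the queue is empty, otherwise a quiet syncing hint.
class SyncStatusBanner extends StatelessWidget {
  final Stream<int> pendingCount; // e.g., a count over the changes table
  const SyncStatusBanner({super.key, required this.pendingCount});

  @override
  Widget build(BuildContext context) {
    return StreamBuilder<int>(
      stream: pendingCount,
      builder: (context, snapshot) {
        final pending = snapshot.data ?? 0;
        if (pending == 0) return const SizedBox.shrink();
        return Padding(
          padding: const EdgeInsets.all(8),
          child: Text(
            'Syncing $pending change${pending == 1 ? '' : 's'}…',
            style: Theme.of(context).textTheme.bodySmall,
          ),
        );
      },
    );
  }
}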

Observability, metrics, and alerting

  • Client metrics (recorded via a hook like the one sketched after this list):
    • Queue length over time
    • Sync cycle duration and success rate
    • Conflict rate per collection/field
    • Retry histogram and backoff distribution
  • Logging:
    • Redact PII; include change ids, entity ids, and error codes.
    • Sample to avoid volume spikes.
  • Alert thresholds (analytics backend):
    • Conflict rate > X% → alert platform squad
    • Retries > N attempts → circuit‑break or escalate in the UI
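
A sketch of the client‑side metrics hook. The SyncMetrics interface and whatever analytics sink sits behind it are assumptions; the point is simply that every sync cycle is recorded, success or not.

abstract class SyncMetrics {
  void queueLength(int length);
  void syncCycle({required Duration duration, required bool success});
  void conflict({required String collection, required String field});
  void retry({required int attempt});
}

// Wraps a sync cycle so its duration and outcome are always recorded.
Future<void> timedSync(SyncMetrics metrics, Future<void> Function() sync) async {
  final stopwatch = Stopwatch()..start();
  var success = false;
  try {
    await sync();
    success = true;
  } finally {
    metrics.syncCycle(duration: stopwatch.elapsed, success: success);
  }
}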

Testing and failure injection

  • Unit tests:
    • Reducers (applyChange), merge policy, clock‑controlled timestamping.
  • Integration tests (widget + persistence):
    • End‑to‑end mutation flows with forced offline/online transitions.
  • Failure injection:
    • Simulate 409/412/5xx, timeouts, and partial batch failures.
    • Use dependency injection to swap HTTP client and persistence.
  • Property‑based tests:
    • Generate random interleavings of changes/pulls to verify convergence and invariants (see the test sketch after this list).
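
A test sketch, assuming the threeWayMerge function sketched earlier and package:test (flutter_test works the same way). The second test is a lightweight, seeded stand‑in for full property‑based generation.

import 'dart:math';

import 'package:test/test.dart';

// threeWayMerge is the sketch from the conflict section (import omitted).
void main() {
  test('disjoint field edits merge automatically', () {
    final base = {'title': 'Grocery list', 'body': 'Milk'};
    final local = {'title': 'Grocery list', 'body': 'Milk, eggs'}; // client edited body
    final remote = {'title': 'Groceries', 'body': 'Milk'};         // server edited title

    final result = threeWayMerge(base, local, remote);

    expect(result.conflictingFields, isEmpty);
    expect(result.merged, {'title': 'Groceries', 'body': 'Milk, eggs'});
  });

  test('random disjoint edits never lose the local change', () {
    final rng = Random(42); // seeded for reproducibility
    for (var i = 0; i < 100; i++) {
      final base = {'a': rng.nextInt(10), 'b': rng.nextInt(10)};
      final local = {...base, 'a': base['a']! + 1};  // client edits only 'a'
      final remote = {...base, 'b': base['b']! + 1}; // server edits only 'b'

      final result = threeWayMerge(base, local, remote);

      expect(result.merged!['a'], local['a']);
      expect(result.merged!['b'], remote['b']);
    }
  });
}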

Reference implementation outline (pseudo)

// Domain change model
sealed class Change {
  final String id; // clientChangeId
  final String entityId;
  final DateTime at;
  const Change(this.id, this.entityId, this.at);
}
class UpdateFields extends Change {
  final Map<String, dynamic> patch;
  const UpdateFields(super.id, super.entityId, super.at, this.patch);
}

// Persistence interface
abstract class ChangeStore {
  Future<void> append(Change c);
  Stream<List<Change>> pending();
  Future<void> markApplied(String changeId, {required String serverVersion});
}

// Sync manager (Api, EntityStore, Clock, PreconditionFailed, and TransientError
// are app‑specific abstractions, not shown here)
class SyncManager {
  final ChangeStore changes;
  final EntityStore entities;
  final Api api;
  final Clock clock;
  SyncManager(this.changes, this.entities, this.api, this.clock);

  Future<void> push() async {
    // Push pending changes in append order; keep batches small so partial
    // failures are easy to roll back.
    final batch = await changes.pending().first;
    for (final c in batch) {
      try {
        // The payload carries clientChangeId so the server can de-duplicate retries.
        final res = await api.apply(c);
        await changes.markApplied(c.id, serverVersion: res.etag);
        await entities.updateVersion(c.entityId, res.etag);
      } on PreconditionFailed catch (e) {
        // 409/412: versions diverged; run the three-way merge policy.
        await _resolveConflict(c, e.serverEntity);
      } on TransientError {
        // Leave the change in the queue; retry later with backoff.
        return;
      }
    }
  }
}

When not to go offline‑first

  • Workflows that demand real‑time transactional integrity with strong consistency (e.g., stock trades) may not fit offline semantics.
  • If the product does not require offline use and the infrastructure budget is minimal, simpler caching is sufficient.

Conclusion

Offline‑first is an architectural choice that must permeate data modeling, UI, and operational practices. With a change log, idempotent writes, measured retries, and explicit conflict policies, you can deliver a robust experience that users trust, on any network.