Why testing strategy matters
- Speed without safety is debt. Safety without speed kills throughput.
- Enterprise teams need a strategy that prevents regression, scales to many developers, and remains economical to maintain.
- This guide outlines a layered test strategy, CI gates, flake reduction, and patterns that have worked in large Flutter orgs.
Objectives
- Fast feedback for the majority of changes (seconds to minutes)
- High confidence for money paths before shipping
- Tests that are resilient to refactors and UI churn
- Clear ownership and reporting to enable accountability
Testing pyramid (Flutter flavor)
- Unit tests (70–80% of volume): pure Dart, business logic, reducers, formatters, mappers.
- Widget tests (15–25%): component behavior in isolation, edge cases on layout/state transitions.
- Integration/E2E (2–5%): critical flows on real devices (profile/release), minimal and stable.
- Contract tests (selective): repository ↔ API schemas; serialization; feature boundary contracts.
Unit tests: make them carry the weight
- Target pure Dart modules (domain, utils, data mappers).
- Avoid using Flutter Test unless necessary—Dart test runner is faster.
- Design for testability:
- Dependency inversion for time/UUID/random/network.
- Pure functions for reducers; deterministic inputs.
- Property-based tests for core invariants (validation, pricing, merging).
Widget tests: behavior not pixels
- Focus on behavior: interactions, validation messages, accessibility semantics, state transitions.
- Prefer test doubles for providers/blocs/controllers.
- Golden tests: use sparingly; only for critical brand visuals with stable design tokens.
- Anti-pattern: brittle tests that assert deep widget trees instead of visible behavior.
Integration tests: small, stable, production-like
- Run on real devices or emulators in profile/release mode where possible.
- Scope:
- Authentication happy path
- Purchase/checkout or other revenue path
- Sync a batch of edits offline → online
- Keep the suite intentionally small (< 10 flows). The value is signal, not coverage.
Designing for testability
- Separation of concerns:
- UI: stateless as far as possible; ephemeral state only for controllers.
- State: BLoC/Riverpod/Controller in feature boundaries; deterministic I/O through interfaces.
- Domain: pure; no Flutter imports.
- Data: repositories behind interfaces; HTTP/DB plugged via DI.
- Deterministic clocks, IDs, and random:
- Inject Clock, IdGenerator, Random so unit tests can control them.
- Feature boundaries export “UseCases” or Controllers; avoid leaking internals out.
Test doubles and fakes
- Use hand-written fakes for repositories/services with realistic behavior (delays, errors, pagination).
- Prefer fakes to mocks for complex flows (mocks become unreadable quickly).
- Snapshot goldens for network payloads: store minimal canonical fixtures.
Flake reduction playbook
- Stabilize setup/teardown: ensure environment is quiescent before assertions (await animations, idles).
- Explicit time control: FakeAsync or injected clock for debounce and timers.
- Unique test data: random suffixes (seeded) to avoid collisions across parallel runs.
- Retries only as last resort; fix root cause. If needed, quarantine flaky tests and create a ticket with owner and SLA.
Coverage targets (pragmatic)
- Unit: aim for 80% of domain and mappers; coverage is a heuristic, not a goal.
- Widgets: target critical components; measure line + branch on controllers.
- Integration: don’t chase coverage; enforce presence of money-path flows.
CI pipeline and gates Stages (example):
- Lint & format
- dart analyze, flutter analyze, and format checks
- Build fast checks
- Unit tests (Dart), widget tests (Flutter) in parallel shards
- Enforce failing fast
- Integration on device farm
- Run critical flows on Android+iOS; fail gate on regressions
- Artifact and size budgets
- Enforce APK/IPA size thresholds; alert when exceeded
- Release gates (staging)
- Upload to TestFlight/Internal App Sharing; smoke tests and automatic rollback criteria
Parallelization and sharding
- Split unit/widget test suites by package/feature; run in parallel runners.
- Cache pub/Gradle/Xcode derived data.
- Use Melos or custom scripts to target changed packages/features.
Test data management
- Deterministic seeds (e.g., faker with a fixed seed) for reproducible tests.
- Fixtures controlled in a single place; contract-tested against API schemas.
- Avoid fixtures that are too broad; keep focused minimal fixtures.
Accessibility and i18n tests
- Accessibility: assert Semantics for core paths (labels, roles, focus traversal).
- i18n: snapshot a subset of screens in two extra locales (e.g., de, ar) to catch overflow/RTL issues.
Security and privacy checks
- Static analysis to block accidental logging of PII.
- Verify secure storage flows for tokens and proper keychain/keystore usage.
- Ensure TLS pinning policies (if used) are testable and gated.
Sample structure
/packages
/feature_checkout
/lib
/test
unit/
price_calculator_test.dart
discounts_reducer_test.dart
widget/
checkout_form_test.dart
payment_button_state_test.dart
/domain_checkout
/test
unit/
totals_property_test.dart
/app
/integration_test
checkout_flow_test.dart
offline_sync_recovery_test.dart
Example: testing a debounced search controller (Riverpod)
final queryProvider = StateProvider<String>((ref) => '');
final resultsProvider = FutureProvider.autoDispose((ref) async {
final query = ref.watch(queryProvider);
if (query.isEmpty) return [];
await Future<void>.delayed(const Duration(milliseconds: 300));
return searchApi(query);
});
Test strategy:
- Inject a fake search API that records calls and returns fixtures.
- Control time with FakeAsync or injected clock; advance 300ms to assert execution.
- Assert no extra calls on quick successive updates (debounce honored).
Governance and ownership
- Each feature team owns their tests and CI health.
- Weekly quality review: flaky test dashboard, slowest tests, top failures.
- Incident process for quality: if a production issue slips, add a test and document root cause.
Adoption plan (incremental)
- Establish CI gates: lint + unit + widget on PRs.
- Add 2–3 money-path integration tests on device farm.
- Introduce property-based tests for core domain rules.
- Enable size budgets and accessibility semantics checks on core screens.
- Quarterly review: remove brittle tests, add coverage where gaps hurt.
Conclusion A lean, layered strategy focuses tests where they yield the most confidence per minute. Keep unit tests dominant, widget tests targeted to behavior, and only a handful of integration flows. Treat flakes as incidents, automate prevention, and continuously invest in testability by design.