Build Resilient Microapps: Architectures That Survive CDN and Cloud Provider Outages


thecode
2026-01-31 12:00:00
9 min read

Patterns and a reference architecture to keep microapps usable during CDN and cloud outages — multi-CDN, multi-cloud, edge caching, and Raspberry Pi fallbacks.

When a CDN or cloud provider goes down, your microapps shouldn't become zombies

Major outages from Cloudflare, AWS, and other providers in late 2025 and early 2026 showed one thing clearly: even professionally hosted web microapps can fail in ways that break customers' workflows. If you manage microservices, internal tools, or customer-facing microapps, this guide gives you pragmatic, production-ready patterns — multi-CDN, multi-cloud, edge caching, and graceful degradation — plus a compact reference architecture that keeps functionality alive during major internet incidents.

Key takeaways

  • Design microapps for partial failure: prefer predictable degraded UX over full downtime.
  • Multi-CDN and multi-cloud reduce blast radius but add operational complexity — automate it.
  • Edge caching with stale-while-revalidate and ESI keeps UI responsive during origin failures.
  • Local fallback (Raspberry Pi 5 + AI HATs) and Service Worker offline modes enable internal continuity when the public internet is impaired.

Why this matters in 2026

Outages in 2025–2026 highlighted single-vendor dependency risks. At the same time, edge compute and CDN mesh services matured: Workers and edge functions are now first-class for production logic, and QUIC/HTTP3 adoption is mainstream. Organizations increasingly combine cloud providers, CDNs, and on-prem edge devices to guarantee continuity for critical microapps.

  • CDNs offering integrated load balancing and edge compute (Cloudflare, Fastly, Akamai). Expect to use edge logic for caching and graceful responses.
  • Multi-cloud Kubernetes patterns and thin control planes (Kubernetes federation is less common; app-level replication is preferred).
  • Stronger routing security (RPKI) and more frequent BGP anomalies — make DNS and BGP-based fallback policies resilient.
  • Affordable single-board computers (Raspberry Pi 5 + AI HATs) are viable local edge nodes for offline caches and background sync.

Core resilience patterns

1. Multi-CDN (Active-Active / Active-Passive)

Use two or more CDNs so your edge layer doesn't have a single vendor failure mode. Two operational patterns work well:

  • Active-active: Use DNS-based traffic steering or a CDN mesh product to split traffic across CDNs. Requires synchronized cache behavior and consistent headers.
  • Active-passive: Primary CDN handles traffic; passive CDN only receives traffic on failover. Easier to test and cheaper, but failover can be slow when DNS TTLs delay propagation.

Implementation tips:

  • Use HTTP health checks and real-user monitoring to trigger failover — not just synthetic ping.
  • Propagate consistent cache-control headers and origin-shielding rules to all CDNs.
  • Test failover monthly with simulated outages and postmortems.
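To make the failover trigger concrete, here is a minimal sketch of health-based steering logic. The function name and probe format are illustrative, not any vendor's API: each CDN's recent probe results are reduced to a success rate, and DNS weights are rebalanced across the healthy set.

```javascript
// Sketch: health-based CDN steering (hypothetical names, not a vendor API).
// A CDN is considered unhealthy once its success rate over the recent
// probe window drops below the threshold; traffic weight is then split
// evenly across the healthy set.
function steerTraffic(probes, threshold = 0.9) {
  const healthy = Object.entries(probes)
    .filter(([, results]) => {
      const ok = results.filter(Boolean).length;
      return ok / results.length >= threshold;
    })
    .map(([cdn]) => cdn);

  // If no CDN is healthy, weights are empty: serve stale from edges only.
  const weight = healthy.length ? 1 / healthy.length : 0;
  return { healthy, weights: Object.fromEntries(healthy.map(c => [c, weight])) };
}
```

In production the probe window would combine synthetic checks with RUM signals, per the tips above.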

2. Multi-cloud origins

Keep at least two independent origin deployments in different clouds or regions. Patterns:

  • Geo-distributed symmetric origins — each origin can serve full traffic (requires data replication / eventual consistency).
  • Primary origin with warm failover — secondary is ready but receives little traffic until failover.

For data: choose conflict-tolerant replication (CRDTs, anti-entropy syncing) or push noncritical writes to a queue to be reconciled. For microapps, preferring read-mostly flows simplifies replication.
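As a concrete example of conflict-tolerant replication, a last-writer-wins (LWW) register is about the simplest CRDT. Each origin keeps a {value, timestamp, node} triple; merging two replicas picks the newer write, with a deterministic tie-break so both sides converge. A sketch, with illustrative field names:

```javascript
// Sketch: last-writer-wins register merge. Whichever replica holds the
// newer timestamp wins; equal timestamps are broken deterministically by
// node id, so merging in either order yields the same value.
function mergeLWW(a, b) {
  if (a.ts !== b.ts) return a.ts > b.ts ? a : b;
  return a.node > b.node ? a : b; // deterministic tie-break
}
```

LWW loses concurrent writes silently, which is why it suits read-mostly microapp state but not collaborative editing.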

3. Smart edge caching

Edge caching is your first line of defense when origins are unreachable.

  • Use stale-while-revalidate and stale-if-error to keep serving slightly stale content when origin latency spikes or fails.
  • Leverage ESI (Edge Side Includes) to keep critical UI components cacheable while still rendering dynamic bits server-side.
  • Cache API responses for safe durations; make dynamic operations idempotent or queue them.
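For illustration, an ESI-enabled edge can cache the page shell for hours while fetching one dynamic fragment, with a cached fallback if the origin errors. Paths here are hypothetical; `src`, `alt`, and `onerror` are standard ESI 1.0 attributes:

```html
<!-- Page shell is long-cached at the edge; only the fragment below hits
     the origin. If that fetch fails, the alt path serves a cached copy
     instead of breaking the whole page. -->
<html>
  <body>
    <header><!-- static, long-cached chrome --></header>
    <esi:include src="/fragments/account-summary"
                 alt="/fragments/account-summary-cached"
                 onerror="continue"/>
  </body>
</html>
```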

4. Graceful degradation and offline modes

Accept that in some outages you can’t provide full features. Plan for predictable degraded UX:

  • Skeleton screens and read-only views for content-driven microapps.
  • Background sync and queues for forms and writes (Service Worker + IndexedDB).
  • Local-first architecture for internal tools: keep essential logic on-device or on-prem with periodic reconciliation.

5. Local edge fallback (Raspberry Pi and on-prem caches)

Raspberry Pi 5 and the new AI HATs (2025–2026) make robust local caches practical. Use Pi nodes as:

  • LAN cache/origin for critical microapps (NGINX or Caddy reverse proxy with cached bundles).
  • Local auth and sync bridge when public identity providers are unreachable — see Edge Identity Signals for operational patterns.
  • Private CDN for remote sites (edge nodes sync from cloud when available).

Advantages: offline continuity, faster UX on flaky WAN links, and less blast radius during global CDN outages.

Reference architecture: resilient microapps for major outages

Below is a compact, deployable architecture that balances complexity, cost, and resilience. It targets microapps that need to remain usable (read and limited write) during large provider outages.

Components

  1. Client: Progressive web app with Service Worker, IndexedDB, and UI-level graceful degradation.
  2. Edge layer: Multi-CDN (primary + secondary) with identical caching rules and edge functions for validation/fallback.
  3. Origins: Multi-cloud origins (AWS + GCP/Azure or cloud + on-prem) with a lightweight API gateway and shared data replication or queues.
  4. Local edge nodes: Raspberry Pi 5 devices running reverse proxies and local sync services for offline mode.
  5. Control plane: DNS (low TTL or DNS-based failover), health checks, and orchestrated failover via Terraform / CI pipelines.
  6. Monitoring: RUM, synthetic checks across ISPs, distributed tracing, and incident automation (PagerDuty, Slack).

How traffic flows during normal operation

  1. Client requests go to the multi-CDN edge; edge serves cached assets or forwards to the primary origin.
  2. Origins respond and edge revalidates caches (stale-while-revalidate).
  3. Writes go to the API layer; if write fails, Service Worker queues the request in IndexedDB and retries.

How it behaves during CDN/cloud outage

  • If a CDN edge disappears, traffic shifts to the other CDN via DNS or an active-active mesh.
  • If both CDNs lose origin access, edge will serve stale content (stale-if-error) and edge functions will return lightweight read-only or cached responses.
  • If the public cloud origins are unreachable from the internet, local Raspberry Pi nodes serve critical assets to LAN clients and accept queued writes to be synced after restore.

Implementation guide — practical steps

Step 1: Define critical features and SLOs

  • Classify microapp features: must-work-offline, must-be-consistent, best-effort.
  • Set SLOs for availability of core flows (e.g., login and read-only views at 99.95% globally).
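As a sanity check when setting those targets, an availability SLO maps directly to a downtime budget. A small helper (illustrative, not part of any tooling) makes the tradeoff concrete: 99.95% over a 30-day month allows roughly 21.6 minutes of downtime.

```javascript
// Sketch: convert an availability SLO into a downtime budget in minutes.
// e.g. downtimeBudgetMinutes(0.9995) ≈ 21.6 minutes per 30-day month.
function downtimeBudgetMinutes(slo, days = 30) {
  const totalMinutes = days * 24 * 60;
  return totalMinutes * (1 - slo);
}
```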

Step 2: Standardize headers and caching

Ensure all static and API responses include consistent cache policies:

Cache-Control: public, max-age=60, stale-while-revalidate=300, stale-if-error=86400

Use Vary headers carefully and minimize header variance across CDNs.
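To show what that policy means operationally, here is a sketch of how an edge function or service worker might interpret it. The helper and field names are illustrative; the three windows come from the header above:

```javascript
// Sketch: decide how to serve a cached response under the policy
// max-age=60, stale-while-revalidate=300, stale-if-error=86400.
// ageSeconds is the cached response's age; originError says whether
// revalidation is currently failing.
function servingMode(ageSeconds, originError,
                     policy = { maxAge: 60, swr: 300, sie: 86400 }) {
  if (ageSeconds <= policy.maxAge) return 'fresh';
  if (!originError && ageSeconds <= policy.maxAge + policy.swr)
    return 'stale-while-revalidate'; // serve stale, revalidate in background
  if (originError && ageSeconds <= policy.maxAge + policy.sie)
    return 'stale-if-error'; // origin down: keep serving stale
  return originError ? 'error' : 'revalidate';
}
```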

Step 3: Implement Service Worker offline-first strategy

Use a cache-first strategy for static assets and network-first with fallback for APIs. Example Service Worker snippet:

// service-worker.js
self.addEventListener('fetch', event => {
  const url = new URL(event.request.url);

  // Static assets: cache-first
  if (url.pathname.startsWith('/static/') || event.request.destination === 'image') {
    event.respondWith(caches.match(event.request).then(res => res || fetch(event.request)));
    return;
  }

  // API: network-first with IndexedDB fallback
  if (url.pathname.startsWith('/api/')) {
    event.respondWith(
      fetch(event.request)
        .then(resp => { cacheApiResponse(event.request, resp.clone()); return resp; })
        .catch(() => getCachedApiResponse(event.request))
    );
  }
});

Implement cacheApiResponse and getCachedApiResponse using IndexedDB. Keep writes queued with Background Sync — see our micro-app builder reference for practical examples: Build a Micro-App Swipe in a Weekend.
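The queued-write semantics are easy to get wrong, so here is a minimal sketch of the drain logic. The store is in-memory for testability; a real implementation would persist entries in IndexedDB and run drain on the Background Sync 'sync' event. The transport is injected so outages can be simulated.

```javascript
// Sketch: the queue semantics behind Background Sync. `send` is an
// injected transport (e.g. fetch in the browser); drain stops at the
// first failure so write ordering is preserved across retries.
class WriteQueue {
  constructor(send) {
    this.send = send;
    this.pending = [];
  }
  enqueue(request) {
    this.pending.push(request);
  }
  async drain() {
    while (this.pending.length) {
      try {
        await this.send(this.pending[0]);
        this.pending.shift(); // only remove after a confirmed send
      } catch {
        return false; // still offline; remaining writes stay queued
      }
    }
    return true;
  }
}
```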

Step 4: Configure CDNs and failover rules

  • Provision two CDNs and align cache rules and TLS certificates (use ACME + automation).
  • Use a DNS provider that supports fast failover and health checks (Route53, NS1, or Cloudflare DNS). Keep DNS TTLs low for dynamic failover.
  • For active-active, use weighted DNS or a managed multi-CDN service to distribute traffic; include health-based steering.

Step 5: Deploy multi-cloud origins

Deploy identical API stacks in two clouds. Use brokered message queues (e.g., managed Kafka or replicated queues) or push noncritical writes into durable storage to be reconciled. Keep authentication independent of a single provider (multi-idp or local auth fallback).
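One way to keep that reconciliation safe is client-generated idempotency keys: when the same write lands in both clouds' queues, replaying the merged queues applies each logical write exactly once. A sketch, with illustrative field names:

```javascript
// Sketch: reconcile queued writes from multiple origins. Each write
// carries a client-generated idempotency key; duplicates delivered via
// more than one queue are applied only once.
function reconcile(...queues) {
  const seen = new Set();
  const applied = [];
  for (const queue of queues) {
    for (const write of queue) {
      if (seen.has(write.key)) continue; // duplicate delivery, skip
      seen.add(write.key);
      applied.push(write);
    }
  }
  return applied;
}
```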

Step 6: Add Raspberry Pi local edge nodes

On each critical site, place a Pi 5 running a small stack:

sudo apt install nginx git
# nginx acts as reverse proxy and local cache
# sync a subset of assets via rsync or an incremental sync tool
rsync -avz origin:/var/www/microapp/static/ /srv/microapp/static/

Use a lightweight sync service (lsyncd or Syncthing) to pull from cloud origins when available. Create a local DNS entry or use mDNS to direct clients to the local Pi when public routing fails.
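On the client side, the same fallback can be expressed as an ordered origin list tried with a timeout, cloud first, LAN Pi second. The injected fetcher interface is hypothetical, which also makes an outage trivial to simulate:

```javascript
// Sketch: ordered origin fallback. Each candidate origin is tried with a
// timeout; the first to respond wins. Origins are injected objects with a
// fetch method so the cloud-down case can be simulated in tests.
async function fetchWithFallback(path, origins, timeoutMs = 2000) {
  for (const origin of origins) {
    try {
      return await Promise.race([
        origin.fetch(path),
        new Promise((_, reject) =>
          setTimeout(() => reject(new Error('timeout')), timeoutMs)),
      ]);
    } catch {
      // fall through to the next origin (e.g. the LAN Raspberry Pi)
    }
  }
  throw new Error('all origins unreachable');
}
```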

Step 7: Observability and testing

  • Implement RUM and synthetic checks from multiple regions/ISPs, including tests that emulate CDN/Origin failure.
  • Automate chaos tests: disable CDN A, then CDN B, then the origin, and verify user flows — treat these like red-team drills; see red-team supervised pipelines.
  • Track error budgets and set automated rollbacks for risky deploys.

Example NGINX fallback config for Raspberry Pi edge

# The cache zone referenced below must be declared once at the http{} level, e.g.:
# proxy_cache_path /var/cache/nginx/microapp keys_zone=microapp_api_cache:10m max_size=100m;
server {
  listen 80;
  server_name microapp.local;

  location /static/ {
    root /srv/microapp;
    try_files $uri $uri/ =404;
    expires 1d;
    add_header Cache-Control "public";
  }

  location /api/ {
    proxy_pass https://primary-origin.example.com/api/;
    proxy_next_upstream error timeout http_502 http_504;
    proxy_cache microapp_api_cache;
    proxy_cache_valid 200 1m;
    proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
  }
}

Note the use of proxy_cache_use_stale to serve stale API responses if the origin is failing — this keeps the app responsive in outage windows. For proxy management and small-team observability patterns, see: Proxy Management Tools for Small Teams.

Tradeoffs and costs

  • Complexity: Multi-CDN and multi-cloud increase ops overhead. Invest in automation (Terraform, CI/CD) and runbooks.
  • Cost: Secondary CDN/origin and Raspberry Pi fleet have marginal costs but dramatically reduce incident impact for critical apps.
  • Consistency: Strong consistency across clouds is expensive. Favor eventual consistency or queue-based write models for microapps that can tolerate it.

Monitoring, SLOs, and runbooks

Create runbooks for every failure mode and automate failover when safe. Track these metrics:

  • Edge hit ratio, stale-serving events, API error rate per CDN.
  • DNS failover events and health-check flaps.
  • Queued write backlog on devices and sync success rates.

Future-proofing: 2026 and beyond

Expect these developments through 2026 and use them to refine your architecture:

  • CDN meshes and standardized multi-CDN APIs will reduce integration friction. Plan to adopt CDN orchestration platforms.
  • Edge compute will support more app logic; move non-critical compute to edge with fast fallbacks to on-device code.
  • P2P and mesh sync (WebRTC, CRDT libraries) will make local-first apps more robust. Consider local-to-local sync for office deployments.
  • Stronger routing security and observability (RPKI, BGP monitoring) will change how you detect network disasters.

Design for partial failure. In outages, a predictable degraded experience keeps customers working — and your incident postmortems short.

Actionable checklist (start today)

  1. Audit your critical microapps and define must-work-offline flows.
  2. Standardize cache headers and implement stale-while-revalidate across CDNs.
  3. Deploy a second CDN and configure health-based failover (staged rollout).
  4. Build a simple Service Worker with IndexedDB write queue and offline read cache.
  5. Pilot a Raspberry Pi edge node at one site for LAN fallback and sync testing.

Final thoughts

Major provider outages are now a recurring operational reality. The approach above — combining multi-CDN, multi-cloud, robust edge caching, and predictable graceful degradation — reduces blast radius and protects user experience. Start small: prioritize your must-work-offline microapps, add edge caching and a service worker, then layer in multi-CDN and local edges.

Call to action

Ready to harden a microapp? Start with a 2-hour checklist: align cache headers, add stale-while-revalidate, and ship a Service Worker with IndexedDB fallback. If you want a hands-on reference, download our deployable repo (Terraform + NGINX + Service Worker templates) tailored for multi-CDN / Raspberry Pi fallback — or contact our engineering team for a 1-week resilience audit.


Related Topics

#resilience #architecture #deployment

thecode

Contributor

Senior editor and content strategist. Writing about technology, design, and the future of digital media. Follow along for deep dives into the industry's moving parts.
