Mobile Browser Extensions for On-Device AI: Building a Plugin for Puma-style Local Models


2026-03-09

Practical guide to building a plugin ecosystem for local-AI mobile browsers—APIs, permissions, UX, and packaging for Android and iOS in 2026.

Ship a plugin ecosystem for a Puma-style mobile browser with local AI — without breaking user privacy or App Store rules

Pain point: you want to extend a mobile browser that runs local AI models (Puma-style) with third-party plugins, but you’re blocked by platform sandboxing, model APIs, UX expectations, and permissions. This guide gives you a practical blueprint — APIs, UI surfaces, permission models, packaging patterns for Android and iOS, and security practices you can implement in 2026.

Why this matters in 2026

On-device AI matured fast in late 2024–2025: 4-bit and 8-bit quantization, mobile GPUs and NPUs, portable formats (GGUF, which superseded GGML; Core ML; TFLite), and browser WebGPU support made low-latency, private inference on phones practical. Browser-first local-AI projects like Puma suggested that users want private, offline assistants embedded in the browser. That opened a market for plugins that run locally, enhance pages, and integrate with personal models, but building a safe plugin ecosystem is hard.

High-level architecture: plugin host, plugin sandbox, and model API

Design your architecture with clear separation of concerns:

  • Host (browser): loads plugins, enforces permissions, exposes validated APIs, manages plugin lifecycle, and hosts the local model runtime.
  • Plugin sandbox: runs plugin UIs and business logic in a constrained environment (JS sandbox, WASM, or isolated process) with capability-limited APIs.
  • Model runtime / Local AI process: a native process or library that serves model inference APIs (HTTP/gRPC/WASM) and enforces resource budgets.

Why separate the model runtime

Separating the runtime lets you control GPU/NPU access, memory quotas, and model selection centrally. Plugins request inference via capability-limited APIs, not by loading models directly — this prevents model exfiltration and keeps resource usage predictable.

Plugin capability and permission model

Follow the principle of least privilege and make permissions explicit to users. Use capability-based tokens that the host issues to a plugin only for capabilities the user has granted, whether at install time or on first use. Each capability should map to one or more runtime or platform permissions.

Example capability matrix

  • model.infer: access to local model inference API (quota-limited, non-network)
  • clipboard.read/write: clipboard access
  • tab.read: read-only page content access
  • tab.modify: inject overlays or edit DOM
  • microphone: access to real-time audio capture
  • file.storage: read/write to plugin-scoped storage

Grant capabilities only with user consent, and show a clear, contextual prompt the first time a plugin requests each one. Persist consents and provide a single settings screen where users can revoke them.
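To make the token idea concrete, here is a minimal sketch of how a host might issue and check capability tokens. The function names (issueToken, authorize, revokePlugin) and the in-memory grant table are assumptions made for illustration, not part of any existing browser API; a real host would persist grants and tie them to the consent UI.

```javascript
const crypto = require('node:crypto');

// Hypothetical in-memory grant table kept by the host: token -> grant record.
const grants = new Map();

// Issue a short-lived token bound to one plugin and an explicit capability set.
function issueToken(pluginId, capabilities, ttlMs, now = Date.now()) {
  const token = crypto.randomBytes(32).toString('hex');
  grants.set(token, { pluginId, capabilities: new Set(capabilities), expiresAt: now + ttlMs });
  return token;
}

// Check a token before serving a host API call. Returns the plugin ID or throws.
function authorize(token, capability, now = Date.now()) {
  const grant = grants.get(token);
  if (!grant) throw new Error('unknown token');
  if (now >= grant.expiresAt) {
    grants.delete(token);
    throw new Error('token expired');
  }
  if (!grant.capabilities.has(capability)) {
    throw new Error(`capability not granted: ${capability}`);
  }
  return grant.pluginId;
}

// Revoking from the settings screen drops every token held by the plugin.
function revokePlugin(pluginId) {
  for (const [token, grant] of grants) {
    if (grant.pluginId === pluginId) grants.delete(token);
  }
}
```

Because every host API call goes through authorize, revoking a plugin in settings takes effect on its very next request.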

Model API: design for privacy, latency, and streaming

Provide a simple HTTP JSON API that plugin code calls (gRPC is an option if you prefer Protocol Buffers over JSON). Keep the API consistent between Android and iOS by exposing the same language-agnostic endpoints over loopback HTTP and wrapping them in platform-native bridges.

Minimal model API (HTTP JSON)

POST /v1/complete
Authorization: Bearer <capability-token>
Content-Type: application/json

{
  "model": "puma-7b-q4",
  "input": "Summarize this page",
  "max_tokens": 256,
  "stream": true
}

Key design points:

  • Capability token — Map tokens to the plugin ID and capabilities.
  • Streaming — Support SSE or chunked responses for low-latency UX.
  • Quota/limits — Per-plugin memory and compute budgets; deny or queue heavy requests.
  • Model selection — Allow host to expose a curated model list with metadata (size, quantization, privacy level).
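As one possible shape for the quota point, the sketch below denies requests that exceed a per-plugin concurrency limit or a rolling token budget. The class name PluginQuota and its options are hypothetical, and a real runtime would also need to account for memory and accelerator time, which this sketch does not measure.

```javascript
// Hypothetical per-plugin budget: max concurrent requests and max tokens per rolling window.
class PluginQuota {
  constructor({ maxConcurrent, maxTokensPerWindow, windowMs }) {
    this.maxConcurrent = maxConcurrent;
    this.maxTokensPerWindow = maxTokensPerWindow;
    this.windowMs = windowMs;
    this.active = 0;
    this.usage = []; // [{ at, tokens }] entries inside the current window
  }

  // Returns { ok: true } and reserves the budget, or { ok: false, reason } to deny.
  tryAcquire(maxTokens, now = Date.now()) {
    this.usage = this.usage.filter((u) => now - u.at < this.windowMs);
    const used = this.usage.reduce((sum, u) => sum + u.tokens, 0);
    if (this.active >= this.maxConcurrent) {
      return { ok: false, reason: 'too many concurrent requests' };
    }
    if (used + maxTokens > this.maxTokensPerWindow) {
      return { ok: false, reason: 'token budget exceeded' };
    }
    this.active += 1;
    this.usage.push({ at: now, tokens: maxTokens });
    return { ok: true };
  }

  // Call when the inference finishes or is cancelled.
  release() {
    if (this.active > 0) this.active -= 1;
  }
}
```

The host would keep one PluginQuota per plugin ID and call tryAcquire with the request's max_tokens before forwarding it to the runtime. Queueing instead of denying is a straightforward extension.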

Example SSE streaming response

data: { "delta": "This is" }

data: { "delta": " an incremental" }

data: [DONE]
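On the plugin side, a stream like this has to be reassembled, because network chunks do not line up with event boundaries. The following is a simplified sketch of an incremental parser, assuming each event is terminated by a blank line (as the SSE format specifies), uses LF line endings, and carries either a JSON object with a delta field or the [DONE] sentinel. The name createSSEParser is made up for this example.

```javascript
// Incremental parser for "data:" framed events. Events end at a blank line.
function createSSEParser(onDelta, onDone) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    let end;
    while ((end = buffer.indexOf('\n\n')) !== -1) {
      const event = buffer.slice(0, end);
      buffer = buffer.slice(end + 2);
      // Join all data lines of the event, dropping the "data:" prefix.
      const data = event
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice(5).trimStart())
        .join('\n');
      if (data === '') continue;
      if (data === '[DONE]') {
        onDone();
        continue;
      }
      onDelta(JSON.parse(data).delta);
    }
  };
}
```

Feed it whatever the transport delivers; partial events stay in the buffer until their terminating blank line arrives.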

Plugin manifest and packaging

Your plugin package describes identity, capabilities, UI surfaces, and assets. Keep it small and declarative.

Example manifest.json

{
  "name": "SummarizeThis",
  "version": "0.1.0",
  "publisher": "acme.dev",
  "description": "Summarize web pages using local models",
  "capabilities": ["model.infer","tab.read"],
  "ui": {
    "toolbarButton": {"icon": "icon.png","title": "Summarize"},
    "contextMenu": {"title": "Summarize selection"}
  },
  "entry": "index.js",
  "signature": ""
}
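Before installing a bundle, the host would typically validate a manifest like this one. The sketch below is one way to do it; the validateManifest name, the MAJOR.MINOR.PATCH version rule, and the expansion of clipboard.read/write into two capability names are assumptions for illustration.

```javascript
// Capability names taken from the capability matrix in this guide.
const KNOWN_CAPABILITIES = new Set([
  'model.infer', 'clipboard.read', 'clipboard.write',
  'tab.read', 'tab.modify', 'microphone', 'file.storage',
]);

// Returns a list of human-readable problems; an empty list means the manifest passed.
function validateManifest(manifest) {
  const errors = [];
  for (const field of ['name', 'version', 'publisher', 'entry']) {
    if (typeof manifest[field] !== 'string' || manifest[field].length === 0) {
      errors.push(`missing or empty field: ${field}`);
    }
  }
  if (typeof manifest.version === 'string' && !/^\d+\.\d+\.\d+$/.test(manifest.version)) {
    errors.push('version must look like MAJOR.MINOR.PATCH');
  }
  if (!Array.isArray(manifest.capabilities)) {
    errors.push('capabilities must be an array');
  } else {
    for (const cap of manifest.capabilities) {
      if (!KNOWN_CAPABILITIES.has(cap)) errors.push(`unknown capability: ${cap}`);
    }
  }
  // Reject entry paths that could escape the bundle directory.
  if (typeof manifest.entry === 'string' &&
      (manifest.entry.startsWith('/') || manifest.entry.split('/').includes('..'))) {
    errors.push('entry must be a relative path inside the bundle');
  }
  return errors;
}
```

Rejecting unknown capabilities outright, rather than ignoring them, keeps the permission prompt honest: the user never approves something the host does not understand.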

Packaging format options:

  • Zip bundle with manifest, JS/WASM, icons — simple and cross-platform.
  • Signed bundle — host verifies developer signature at install time.
  • Platform-specific wrappers — Android AAB dynamic features or iOS app extensions for tighter integration.

Loading plugins on Android and iOS — platform rules and practical patterns

Platform constraints are the most common stumbling block:

iOS: No downloadable native code — use JS/WASM or pre-approved extensions

Apple’s App Store policies still disallow downloading and executing arbitrary native code in 2026. To stay compliant while enabling plugins:

  • Allow plugin JavaScript + WASM: load JS and WASM at runtime; host executes them in a sandboxed JS engine (WKWebView or JavaScriptCore).
  • Pre-bundle optional native features: for high-trust plugins, include native frameworks inside the app and map to plugin IDs (review risk and App Store policy).
  • Use app extensions for OS-level integration where necessary, but they must be bundled in the app.

iOS example: communicating to the model runtime (Swift)

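import WebKit

// Receives messages that plugin JS posts through
// window.webkit.messageHandlers.<handlerName>.postMessage({ route: ..., ... })
final class PluginBridge: NSObject, WKScriptMessageHandler {
  func userContentController(_ uc: WKUserContentController, didReceive message: WKScriptMessage) {
    // Ignore anything that is not a dictionary carrying a string route.
    guard let body = message.body as? [String: Any],
          let route = body["route"] as? String else { return }
    // Forward `route` and the rest of `body` to the local HTTP model API running inside the app
    _ = route
  }
}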

Android: more flexible, but secure bridging is key

Android allows more flexibility: you can ship dynamic feature modules or download plugin bundles, but you must still enforce sandboxing and signature checks.

Android example: addJavascriptInterface (Kotlin)

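import android.content.Context
import android.webkit.JavascriptInterface

class PluginBridge(private val context: Context) {
  // Only methods annotated with @JavascriptInterface are callable from page JS.
  @JavascriptInterface
  fun callModel(jsonRequest: String): String {
    // Validate jsonRequest, then forward it to the local model runtime over loopback HTTP
    return "{\"status\":\"queued\"}"
  }
}

// In the Activity that owns the WebView (JavaScript must be enabled in its settings):
webView.addJavascriptInterface(PluginBridge(this), "PluginBridge")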

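Be aware that addJavascriptInterface has security pitfalls: every frame loaded in the WebView, including iframes, can call the exposed object. Only expose tightly scoped methods, validate all inputs, and load nothing but plugin content in that WebView. Prefer message-passing APIs that keep binary data off the JS bridge; androidx.webkit's WebViewCompat.addWebMessageListener, for example, restricts the bridge to an allow-list of origins.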

UI surface patterns for mobile browser plugins

Design consistent, low-friction UI surfaces so plugins feel native and predictable. Prioritize quick, context-aware helpers over full-screen apps.

Common UX surfaces

  • Toolbar button — one-tap entry to plugin actions. Ideal for single-purpose plugins (summarize, translate).
  • Selection context menu — operate directly on highlighted text or media.
  • Side sheet / overlay — ephemeral panel for richer interactions; should be dismissible and not cover essential content.
  • Inline injection — small, contextual annotations inserted into the DOM (use sparingly and opt-in).
  • Command palette / omnibox — quick keyboard-driven access to plugin commands (power users).

State and lifecycle guidelines

  • Keep plugin state isolated and persistent only in plugin-scoped storage.
  • Stop long-running work when the user navigates away.
  • Show progress and cancel affordances for model inference (timeouts, abort controllers).
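The cancel and timeout guidance can be sketched with the standard AbortController. Here runCancellable is a hypothetical helper, and startInference stands in for whatever function the SDK exposes to start a request; it is assumed to return a promise and to stop work when the signal aborts.

```javascript
// Run an inference call with a deadline and a handle the UI can use to cancel it.
// `startInference(signal)` is a hypothetical function that returns a promise and
// rejects (or stops streaming) when `signal` aborts.
function runCancellable(startInference, timeoutMs) {
  const controller = new AbortController();
  const timer = setTimeout(
    () => controller.abort(new Error('inference timed out')),
    timeoutMs,
  );
  // Always clear the timer so a finished request cannot abort later.
  const result = startInference(controller.signal).finally(() => clearTimeout(timer));
  return {
    result,
    cancel: () => controller.abort(new Error('cancelled by user')),
  };
}
```

A plugin would wire cancel to its "Stop" button and also call it from the navigation handler, which covers the "stop long-running work when the user navigates away" rule with the same mechanism.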

Security hardening and privacy best practices

Security is the core differentiator of local-AI browsers. Implement layered defenses:

  1. Signature verification: Require developer signatures for plugin bundles and verify at install.
  2. Capability tokens: Issue short-lived, scoped tokens for model and platform APIs.
  3. Audit logs: Record plugin API calls locally; allow users to inspect plugin activities.
  4. Network controls: By default, disallow outbound network from plugins that have model.infer; require explicit capability for network access (and disclose it).
  5. Resource quotas: Enforce memory, inference time and concurrency limits; throttle or sandbox heavy models.
  6. Privacy disclosure: Show a concise explanation of what data is used locally, what (if anything) is sent off-device, and how to opt out.
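For the signature-verification step, a minimal sketch using Node's built-in crypto module might look like the following. Ed25519 is chosen here only as an example scheme; the guide does not prescribe one. How the signature travels (the manifest's signature field versus a detached file) and how the host learns a publisher's public key are left open, and verifyBundle is a made-up name.

```javascript
const crypto = require('node:crypto');

// Verify a detached Ed25519 signature over the raw bundle bytes.
// `publisherPublicKey` is a KeyObject the host already trusts for this publisher.
function verifyBundle(bundleBytes, signatureBase64, publisherPublicKey) {
  const signature = Buffer.from(signatureBase64, 'base64');
  // For Ed25519 the digest algorithm argument must be null.
  return crypto.verify(null, bundleBytes, publisherPublicKey, signature);
}
```

The host should refuse to unpack or execute anything from the bundle until this returns true, and should run the check again whenever a bundle is updated.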

Packaging & distribution strategies

There are three main distribution models to consider:

  • In-app marketplace: The safest and most controlled. Host signed plugin bundles in your own marketplace and vet developers. Works well for iOS App Store compliance.
  • Direct install via bundle: Allow users to install zip bundles from disk or a URL. Must still verify signatures and show permission prompts.
  • Enterprise or developer mode: For third-party developers and power users, support a developer mode with explicit warnings and additional logging.

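Android-specific options: ship first-party plugins as Dynamic Feature Modules (DFM) for tighter integration, keeping in mind that they are built and signed with the app and delivered through Google Play, or keep plugins as JS/WASM zip bundles stored in the app's private storage.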

Model packaging and updates

Models are large and user expectations for privacy and offline usage are high. Design a model lifecycle:

  1. Ship a small default model inside the app for instant use.
  2. Allow optional model downloads (user consent, show disk cost).
  3. Support model variant selection (tiny/fast vs high-quality) with on-device conversion pipelines.
  4. Use delta updates and chunked downloads to minimize bandwidth.
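Chunked, resumable downloads usually rest on HTTP Range requests. The sketch below only plans the Range header values; planRanges is a hypothetical helper, and it assumes the server supports byte ranges and that the host verifies a checksum of the assembled file afterwards. Delta updates are not covered.

```javascript
// Plan HTTP Range requests for a resumable, chunked model download.
// `bytesOnDisk` is how much of the file a previous attempt already saved.
function planRanges(totalBytes, chunkBytes, bytesOnDisk = 0) {
  const ranges = [];
  for (let start = bytesOnDisk; start < totalBytes; start += chunkBytes) {
    const end = Math.min(start + chunkBytes, totalBytes) - 1; // Range ends are inclusive
    ranges.push(`bytes=${start}-${end}`);
  }
  return ranges;
}
```

For example, planRanges(10, 4) yields bytes=0-3, bytes=4-7, and bytes=8-9, and resuming with 8 bytes already on disk yields only bytes=8-9.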

Model formats to support in 2026:

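  • GGUF (the successor to GGML) with Q* quantization: extremely common for local LLMs.
  • Core ML (.mlmodelc): best for Apple devices with the Neural Engine.
  • TFLite (renamed LiteRT): cross-platform mobile inference; NNAPI acceleration is Android-only, and NNAPI is deprecated as of Android 15.
  • WASM runtimes: not a model format but a portable way to run inference code, which this guide treats as compatible with iOS code-execution rules because it runs inside the app's sandboxed engine.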

Developer experience: SDKs, docs, and testing

To attract builders, provide:

  • Lightweight JS SDK that abstracts message passing and model API calls.
  • Native helper libraries for Android (Kotlin) and iOS (Swift) that wrap permission and capability flows.
  • Local test harness that runs the host and a mock model runtime so plugins can be tested without a physical device.
  • Policy guide explaining allowed behaviors, privacy rules, and UI templates for permission prompts.
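A mock model runtime for the test harness can be very small. This sketch uses Node's built-in http module to answer POST /v1/complete with a canned streamed reply; the names toSSEFrames and createMockRuntime, the canned text, and the word-by-word chunking are all assumptions for illustration. It skips token checks entirely, so it belongs in tests only.

```javascript
const http = require('node:http');

// Split a canned reply into SSE frames, ending with the [DONE] sentinel.
function toSSEFrames(text) {
  const frames = text
    .split(' ')
    .map((word, i) => `data: ${JSON.stringify({ delta: i === 0 ? word : ' ' + word })}\n\n`);
  frames.push('data: [DONE]\n\n');
  return frames;
}

// Mock runtime: answers POST /v1/complete with a fixed streamed reply.
// The returned server is not listening yet; the caller picks the port.
function createMockRuntime(reply = 'This is a mock summary') {
  return http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/v1/complete') {
      res.writeHead(404).end();
      return;
    }
    req.resume(); // the mock ignores the request body
    res.writeHead(200, { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' });
    for (const frame of toSSEFrames(reply)) res.write(frame);
    res.end();
  });
}
```

To use it, call createMockRuntime().listen(port, '127.0.0.1') with a port of your choosing and point the SDK under test at that address. Binding to loopback keeps the mock unreachable from other machines.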

Example JS SDK usage

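// index.js inside plugin bundle
const host = window.BrowserHost; // injected host SDK

// Append each streamed chunk to the side sheet.
// Assumes chunks carry a `delta` string and the plugin UI has an element with id "summary".
function updateUI(chunk) {
  document.getElementById('summary').textContent += chunk.delta;
}

async function summarizeSelection() {
  const selection = await host.tab.readSelection(); // needs tab.read
  const resp = await host.model.complete({ model: 'puma-7b-q4', input: selection, stream: true }); // needs model.infer
  resp.on('data', chunk => updateUI(chunk));
}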
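Under the hood, an SDK call such as host.tab.readSelection() has to turn a one-way message channel into a promise. One way to do that is to tag each request with an ID and match replies against it, as in this sketch. createBridge, postToHost, and the message shape are hypothetical; on iOS, postToHost could wrap window.webkit.messageHandlers.<name>.postMessage.

```javascript
// Request/response correlation over a one-way message channel.
// `postToHost` is a hypothetical function that delivers a message to the native host.
function createBridge(postToHost) {
  let nextId = 1;
  const pending = new Map(); // id -> { resolve, reject }

  return {
    // Send a request and get a promise for the matching reply.
    call(route, payload) {
      const id = nextId++;
      return new Promise((resolve, reject) => {
        pending.set(id, { resolve, reject });
        postToHost({ id, route, payload });
      });
    },
    // The host calls this with { id, result } or { id, error }.
    onHostMessage(message) {
      const entry = pending.get(message.id);
      if (!entry) return; // unknown or duplicate reply: ignore
      pending.delete(message.id);
      if (message.error) entry.reject(new Error(message.error));
      else entry.resolve(message.result);
    },
  };
}
```

Keeping the bridge to plain JSON messages like these also makes it easy for the host to validate every request against the plugin's capability token before acting on it.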

Operational recommendations and monitoring

Monitor the ecosystem without compromising privacy:

  • Collect only anonymized metrics (crash rates, plugin install counts, inference latency).
  • Allow opt-in telemetry for developers to debug issues.
  • Maintain an automated vulnerability scanning pipeline for uploaded plugin bundles (WASM/JS static analysis).

Plan for these 2026+ trends:

  • Model shards & streaming load: on-device models will be streamed or sharded to balance storage and latency.
  • Federated personalization: plugins may request localized personalization weights — support safe, opt-in federated update APIs.
  • Hardware abstractions: expect more uniform NPU APIs across vendors; design runtime abstraction layers.
  • WASM + WebGPU: Running inference via WASM accelerated by WebGPU becomes a viable cross-platform option.

Quick checklist to get started (actionable)

  1. Define your plugin manifest and capability model.
  2. Implement the local model runtime with a stable HTTP JSON API and streaming support.
  3. Build a JS SDK to expose host APIs to plugins and hide platform differences.
  4. Implement signature verification and per-plugin capability tokens in the host.
  5. Create a default set of curated models and a safe model-download flow with quotas.
  6. Design permission UX: contextual prompts, settings page, and audit logs.
  7. Test on device: Android (DFM, WebView) and iOS (WKWebView, JavaScriptCore/WASM). Validate App Store compliance for iOS workflows.

Rule of thumb: prefer JavaScript/WASM plugins for maximum portability and App Store compliance; allow native features only when pre-bundled and reviewed.

Case study: SummarizeThis (minimal viable plugin)

Implementation notes for a quick demonstration:

  • Packaging: zip with manifest.json, index.js, icon.png.
  • Capabilities: tab.read, model.infer.
  • UI: toolbar button + side sheet that streams summaries using SSE.
  • Security: signed bundle + host issues a one-hour capability token scoped to model.infer.

Developer flow: plugin calls host.tab.readSelection(); host returns selection; plugin calls host.model.complete(); host forwards to the runtime and streams results back. All operations logged locally.
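The local logging in this flow could be as simple as a bounded in-memory list that the host appends to on every plugin API call. The AuditLog class below is a hypothetical sketch; it records sizes and parameters rather than page content, and a real host would persist entries in app-private storage.

```javascript
// Bounded, local-only audit log of plugin API calls (oldest entries are dropped first).
class AuditLog {
  constructor(maxEntries = 1000) {
    this.maxEntries = maxEntries;
    this.entries = [];
  }

  // Record which plugin called which host API; store sizes, not page content.
  record(pluginId, api, detail = {}, at = Date.now()) {
    this.entries.push({ at, pluginId, api, detail });
    if (this.entries.length > this.maxEntries) this.entries.shift();
  }

  // Entries for one plugin, newest first, for the "inspect activity" screen.
  forPlugin(pluginId) {
    return this.entries.filter((e) => e.pluginId === pluginId).reverse();
  }
}
```

For SummarizeThis, the host would record one entry for tab.readSelection (with the character count) and one for model.complete (with max_tokens) per summary.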

Conclusion and next steps

Building a plugin ecosystem for a Puma-style local-AI browser is achievable in 2026 if you design for platform constraints, privacy, and developer ergonomics from day one. The architecture outlined here—separate model runtime, capability tokens, signed bundles, JS/WASM-first plugins, and a clear permission UX—gives you a practical roadmap to ship quickly while minimizing risk.

Actionable next step: prototype a minimal host + runtime that serves /v1/complete on loopback, then build a tiny JS plugin that uses only tab.read and model.infer. Test on an Android device and an iPhone to validate bridging patterns and UX flows.

Want a starter repo, manifest templates, or SDK snippets tuned for Puma-like browsers? Reach out on thecode.website or check the developer docs to download the starter kit and join our developer preview program.

Call to action: Start building — scaffold a plugin in under an hour using the checklist above. Share your plugin idea and we’ll help map it to a secure capability set.
