Ship a plugin ecosystem for a Puma-style mobile browser with local AI — without breaking user privacy or App Store rules
Pain point: you want to extend a mobile browser that runs local AI models (Puma-style) with third-party plugins, but you’re blocked by platform sandboxing, model APIs, UX expectations, and permissions. This guide gives you a practical blueprint — APIs, UI surfaces, permission models, packaging patterns for Android and iOS, and security practices you can implement in 2026.
Why this matters in 2026
On-device AI matured fast in late 2024–2025: 4-bit and 8-bit quantization, mobile GPUs and NPUs, portable formats (GGML and its successor GGUF, Core ML, TFLite), and browser WebGPU support enabled low-latency, private inference on phones. Browser-first local-AI projects like Puma showed that there is real demand for private, offline assistants embedded in the browser. That opened a new market for plugins that run locally, enhance pages, and integrate with personal models, but building a safe plugin ecosystem is hard.
High-level architecture: plugin host, plugin sandbox, and model API
Design your architecture with clear separation of concerns:
- Host (browser): loads plugins, enforces permissions, exposes validated APIs, manages plugin lifecycle, and hosts the local model runtime.
- Plugin sandbox: runs plugin UIs and business logic in a constrained environment (JS sandbox, WASM, or isolated process) with capability-limited APIs.
- Model runtime / Local AI process: a native process or library that serves model inference APIs (HTTP/gRPC/WASM) and enforces resource budgets.
Why separate the model runtime
Separating the runtime lets you control GPU/NPU access, memory quotas, and model selection centrally. Plugins request inference via capability-limited APIs, not by loading models directly — this prevents model exfiltration and keeps resource usage predictable.
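One way the host could keep resource usage predictable is to route every inference request through a small per-plugin gate in front of the runtime. The sketch below is illustrative only: `InferenceGate`, its limits, and the `task` callback are hypothetical names, and a real host would likely implement this in native code rather than JavaScript.

```javascript
// Hypothetical host-side gate: every inference request passes through here,
// so plugins never talk to the model runtime directly.
class InferenceGate {
  constructor(maxConcurrentPerPlugin = 1, maxQueuedPerPlugin = 2) {
    this.maxConcurrent = maxConcurrentPerPlugin;
    this.maxQueued = maxQueuedPerPlugin;
    this.state = new Map(); // pluginId -> { running, queue }
  }

  // Runs `task` (a function returning a Promise) once the plugin has a free slot.
  // Rejects immediately when the plugin's queue is already full.
  run(pluginId, task) {
    if (!this.state.has(pluginId)) {
      this.state.set(pluginId, { running: 0, queue: [] });
    }
    const s = this.state.get(pluginId);
    return new Promise((resolve, reject) => {
      const start = () => {
        s.running += 1;
        Promise.resolve()
          .then(task)
          .then(resolve, reject)
          .finally(() => {
            // Free the slot and start the next queued request, if any.
            s.running -= 1;
            const next = s.queue.shift();
            if (next) next();
          });
      };
      if (s.running < this.maxConcurrent) start();
      else if (s.queue.length < this.maxQueued) s.queue.push(start);
      else reject(new Error("quota exceeded: too many pending requests"));
    });
  }
}
```

With a gate like this, a misbehaving plugin can at most fill its own queue; other plugins keep their own slots.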
Plugin capability and permission model
Apply the principle of least privilege and make permissions explicit to users. Use capability tokens that the host issues to a plugin only after the user has consented (at install time or on first use). Each capability should map to one or more runtime or platform permissions.
Example capability matrix
- model.infer: access to local model inference API (quota-limited, non-network)
- clipboard.read/write: clipboard access
- tab.read: read-only page content access
- tab.modify: inject overlays or edit DOM
- microphone: access to real-time audio capture
- file.storage: read/write to plugin-scoped storage
Grant capabilities by user consent and show a clear, contextual prompt when the plugin first requests it. Persist consents and provide a single settings screen to revoke capabilities.
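The grant, persist, and revoke flow described above can be sketched as a small deny-by-default store. This is a minimal illustration, not a definitive design: `ConsentStore` and its methods are hypothetical, and splitting `clipboard.read/write` into two separate capabilities is an assumption.

```javascript
// Capability names follow the example matrix; the clipboard split is an assumption.
const KNOWN_CAPABILITIES = new Set([
  "model.infer", "clipboard.read", "clipboard.write",
  "tab.read", "tab.modify", "microphone", "file.storage",
]);

class ConsentStore {
  constructor() {
    this.grants = new Map(); // pluginId -> Set of granted capability names
  }

  // Call only after the user has approved the contextual prompt.
  grant(pluginId, capability) {
    if (!KNOWN_CAPABILITIES.has(capability)) {
      throw new Error(`unknown capability: ${capability}`);
    }
    if (!this.grants.has(pluginId)) this.grants.set(pluginId, new Set());
    this.grants.get(pluginId).add(capability);
  }

  // Called from the settings screen when the user revokes a capability.
  revoke(pluginId, capability) {
    const caps = this.grants.get(pluginId);
    if (caps) caps.delete(capability);
  }

  // Deny by default: anything not explicitly granted is refused.
  isAllowed(pluginId, capability) {
    const caps = this.grants.get(pluginId);
    return Boolean(caps && caps.has(capability));
  }

  // Plain-object snapshot so consents can be persisted across restarts.
  toJSON() {
    return Object.fromEntries(
      [...this.grants].map(([id, caps]) => [id, [...caps]])
    );
  }
}
```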
Model API: design for privacy, latency, and streaming
Provide a simple JSON-over-HTTP API that plugin code calls (gRPC is an alternative if you want typed stubs). Keep the API consistent between Android and iOS by exposing the same language-agnostic endpoints over local HTTP and wrapping them in platform-native bridges.
Minimal model API (HTTP JSON)
{
  "POST": "/v1/complete",
  "headers": { "Authorization": "Bearer <capability-token>" },
  "body": {
    "model": "puma-7b-q4",
    "input": "Summarize this page",
    "max_tokens": 256,
    "stream": true
  }
}
Key design points:
- Capability token — Map tokens to the plugin ID and capabilities.
- Streaming — Support SSE or chunked responses for low-latency UX.
- Quota/limits — Per-plugin memory and compute budgets; deny or queue heavy requests.
- Model selection — Allow host to expose a curated model list with metadata (size, quantization, privacy level).
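To make the capability-token point concrete, here is a minimal sketch of how a host might issue and check scoped, expiring tokens. The token format (base64url payload plus HMAC-SHA256), the function names, and the idea that the host holds a secret key are all assumptions for illustration; the sketch runs under Node.js.

```javascript
const crypto = require("node:crypto");

// Hypothetical token format: base64url(JSON claims) + "." + base64url(HMAC-SHA256).
function issueToken(secret, pluginId, capabilities, ttlSeconds, nowMs = Date.now()) {
  const payload = Buffer.from(
    JSON.stringify({ pluginId, capabilities, exp: nowMs + ttlSeconds * 1000 })
  ).toString("base64url");
  const mac = crypto.createHmac("sha256", secret).update(payload).digest("base64url");
  return `${payload}.${mac}`;
}

// Returns the claims when the token is authentic, unexpired, and covers
// `requiredCapability`; returns null otherwise.
function verifyToken(secret, token, requiredCapability, nowMs = Date.now()) {
  const [payload, mac] = String(token).split(".");
  if (!payload || !mac) return null;
  const expected = crypto.createHmac("sha256", secret).update(payload).digest();
  const given = Buffer.from(mac, "base64url");
  // Constant-time comparison; lengths must match before timingSafeEqual.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return null;
  }
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  if (claims.exp <= nowMs) return null;
  if (!claims.capabilities.includes(requiredCapability)) return null;
  return claims;
}
```

The runtime would call something like `verifyToken` on the `Authorization` header of each request before doing any work, so a plugin can never use a capability it was not granted.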
Example SSE streaming response
data: { "delta": "This is" }

data: { "delta": " an incremental" }

data: [DONE]
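On the consuming side, a plugin SDK needs to turn that stream back into deltas even when network chunks split a line in the middle. The parser below is a simplified sketch: `createSseParser` is a hypothetical helper, and it treats each `data:` line as one complete event rather than implementing the full SSE specification (multi-line data fields, `event:` and `id:` fields).

```javascript
// Incremental parser for the stream format shown above. Feed it raw text
// chunks as they arrive; it calls onDelta for each delta and onDone at [DONE].
function createSseParser(onDelta, onDone) {
  let buffer = "";
  return function feed(text) {
    buffer += text;
    const lines = buffer.split(/\r?\n/);
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (!line.startsWith("data:")) continue; // skip blank lines and other fields
      const data = line.slice(5).trimStart();
      if (data === "[DONE]") {
        onDone();
        continue;
      }
      onDelta(JSON.parse(data).delta);
    }
  };
}

// Usage: chunks may split anywhere, including inside the "data:" prefix.
const pieces = [];
const feed = createSseParser((delta) => pieces.push(delta), () => pieces.push("<end>"));
feed('data: { "delta": "This is" }\n\nda');
feed('ta: { "delta": " an incremental" }\n\ndata: [DONE]\n\n');
console.log(pieces.join("")); // prints "This is an incremental<end>"
```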
Plugin manifest and packaging
Your plugin package describes identity, capabilities, UI surfaces, and assets. Keep it small and declarative.
Example manifest.json
{
  "name": "SummarizeThis",
  "version": "0.1.0",
  "publisher": "acme.dev",
  "description": "Summarize web pages using local models",
  "capabilities": ["model.infer", "tab.read"],
  "ui": {
    "toolbarButton": { "icon": "icon.png", "title": "Summarize" },
    "contextMenu": { "title": "Summarize selection" }
  },
  "entry": "index.js",
  "signature": ""
}
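Because the manifest is declarative, the host can reject malformed bundles before running any plugin code. The validator below is a sketch under assumptions: the required fields, the three-part version check, and the path rule for `entry` are illustrative policy choices, not rules stated by any platform.

```javascript
// Capability names from the example matrix (clipboard split is an assumption).
const ALLOWED_CAPABILITIES = new Set([
  "model.infer", "clipboard.read", "clipboard.write",
  "tab.read", "tab.modify", "microphone", "file.storage",
]);

// Returns a list of human-readable problems; an empty list means the manifest passed.
function validateManifest(manifest) {
  const errors = [];
  for (const field of ["name", "version", "publisher", "entry"]) {
    if (typeof manifest[field] !== "string" || manifest[field].length === 0) {
      errors.push(`missing or empty "${field}"`);
    }
  }
  if (typeof manifest.version === "string" && !/^\d+\.\d+\.\d+$/.test(manifest.version)) {
    errors.push("version must look like MAJOR.MINOR.PATCH");
  }
  if (!Array.isArray(manifest.capabilities)) {
    errors.push('"capabilities" must be an array');
  } else {
    for (const cap of manifest.capabilities) {
      if (!ALLOWED_CAPABILITIES.has(cap)) errors.push(`unknown capability: ${cap}`);
    }
  }
  // The entry script must stay inside the bundle.
  if (
    typeof manifest.entry === "string" &&
    (manifest.entry.startsWith("/") || manifest.entry.split("/").includes(".."))
  ) {
    errors.push("entry must be a relative path inside the bundle");
  }
  return errors;
}
```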
Packaging format options:
- Zip bundle with manifest, JS/WASM, icons — simple and cross-platform.
- Signed bundle — host verifies developer signature at install time.
- Platform-specific wrappers — Android AAB dynamic features or iOS app extensions for tighter integration.
Loading plugins on Android and iOS — platform rules and practical patterns
Platform constraints are the most common stumbling block:
iOS: No downloadable native code — use JS/WASM or pre-approved extensions
Apple’s App Store policies still disallow downloading and executing arbitrary native code in 2026. To stay compliant while enabling plugins:
- Allow plugin JavaScript + WASM: load JS and WASM at runtime; host executes them in a sandboxed JS engine (WKWebView or JavaScriptCore).
- Pre-bundle optional native features: for high-trust plugins, include native frameworks inside the app and map to plugin IDs (review risk and App Store policy).
- Use app extensions for OS-level integration where necessary, but they must be bundled in the app.
iOS example: communicating to the model runtime (Swift)
import WebKit

final class PluginBridge: NSObject, WKScriptMessageHandler {
    func userContentController(_ uc: WKUserContentController, didReceive message: WKScriptMessage) {
        guard let body = message.body as? [String: Any],
              let route = body["route"] as? String else { return }
        forward(route: route, payload: body)
    }

    private func forward(route: String, payload: [String: Any]) {
        // Placeholder: forward to the local HTTP model API running inside the app
    }
}
Android: more flexible, but secure bridging is key
Android allows more flexibility: you can ship dynamic feature modules or download plugin bundles, but you must still enforce sandboxing and signature checks.
Android example: addJavascriptInterface (Kotlin)
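import android.content.Context
import android.webkit.JavascriptInterface

class PluginBridge(private val context: Context) {
    // Invoked from plugin JS as PluginBridge.callModel(json), off the UI thread
    @JavascriptInterface
    fun callModel(jsonRequest: String): String {
        // Validate jsonRequest, then forward it to the local model runtime over loopback HTTP
        return """{"status":"queued"}"""
    }
}

// In the Activity that owns the WebView:
webView.addJavascriptInterface(PluginBridge(this), "PluginBridge")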
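Be aware: addJavascriptInterface has security pitfalls, because the injected object is visible to every frame loaded in the WebView, including third-party iframes. Only expose tightly scoped methods and validate inputs. Prefer message-passing APIs (for example WebViewCompat.addWebMessageListener from androidx.webkit, which takes an origin allowlist), and keep binary data off the JS bridge.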
UI surface patterns for mobile browser plugins
Design consistent, low-friction UI surfaces so plugins feel native and predictable. Prioritize quick, context-aware helpers over full-screen apps.
Common UX surfaces
- Toolbar button — one-tap entry to plugin actions. Ideal for single-purpose plugins (summarize, translate).
- Selection context menu — operate directly on highlighted text or media.
- Side sheet / overlay — ephemeral panel for richer interactions; should be dismissible and not cover essential content.
- Inline injection — small, contextual annotations inserted into the DOM (use sparingly and opt-in).
- Command palette / omnibox — quick keyboard-driven access to plugin commands (power users).
State and lifecycle guidelines
- Keep plugin state isolated and persistent only in plugin-scoped storage.
- Stop long-running work when the user navigates away.
- Show progress and cancel affordances for model inference (timeouts, abort controllers).
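The timeout and cancel guidance can be sketched with the standard `AbortController`. `runWithTimeout` and `fakeInference` are hypothetical helpers; the only assumption about a real SDK is that its inference call accepts an `AbortSignal` and stops work when it fires.

```javascript
// Runs an inference call with a timeout and returns a cancel handle for the UI.
// `startInference(signal)` must return a Promise and honor the AbortSignal.
function runWithTimeout(startInference, timeoutMs) {
  const controller = new AbortController();
  const timer = setTimeout(
    () => controller.abort(new Error("inference timed out")),
    timeoutMs
  );
  const result = startInference(controller.signal).finally(() => clearTimeout(timer));
  return {
    result,
    cancel: () => controller.abort(new Error("cancelled by user")),
  };
}

// Stand-in for a real model call: resolves after delayMs unless aborted first.
function fakeInference(delayMs) {
  return (signal) =>
    new Promise((resolve, reject) => {
      const t = setTimeout(() => resolve("summary text"), delayMs);
      signal.addEventListener(
        "abort",
        () => {
          clearTimeout(t);
          reject(signal.reason);
        },
        { once: true }
      );
    });
}
```

A plugin would wire `cancel` to its dismiss button and to the navigation event, so work stops as soon as the user moves on.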
Security hardening and privacy best practices
Security is the core differentiator of local-AI browsers. Implement layered defenses:
- Signature verification: Require developer signatures for plugin bundles and verify at install.
- Capability tokens: Issue short-lived, scoped tokens for model and platform APIs.
- Audit logs: Record plugin API calls locally; allow users to inspect plugin activities.
- Network controls: By default, disallow outbound network from plugins that have model.infer; require explicit capability for network access (and disclose it).
- Resource quotas: Enforce memory, inference time and concurrency limits; throttle or sandbox heavy models.
- Privacy disclosure: Show a concise explanation of what data is used locally, what (if anything) is sent off-device, and how to opt out.
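As an illustration of the signature-verification layer, the sketch below checks an Ed25519 signature with Node's built-in `crypto` module. The choice of Ed25519, the base64 encoding of the signature, and the function name are assumptions; the guide does not prescribe a scheme. It also assumes the signed bytes exclude the signature field itself.

```javascript
const crypto = require("node:crypto");

// Returns true only when `signatureB64` is a valid Ed25519 signature over
// `signedBytes` for the publisher's public key. Any malformed input yields false.
function verifyBundle(signedBytes, signatureB64, publisherPublicKey) {
  try {
    const signature = Buffer.from(signatureB64, "base64");
    // Ed25519 takes no separate digest algorithm, hence the null first argument.
    return crypto.verify(null, signedBytes, publisherPublicKey, signature);
  } catch {
    return false;
  }
}

// Usage with a throwaway key pair (a real host would pin the publisher's key).
const demoKeys = crypto.generateKeyPairSync("ed25519");
const demoBundle = Buffer.from("manifest.json + index.js + icon.png bytes");
const demoSignature = crypto.sign(null, demoBundle, demoKeys.privateKey).toString("base64");
console.log(verifyBundle(demoBundle, demoSignature, demoKeys.publicKey)); // prints true
```

Install should fail closed: if verification returns false for any reason, the bundle is discarded without running any of its code.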
Packaging & distribution strategies
There are three main distribution models to consider:
- In-app marketplace: The safest and most controlled. Host signed plugin bundles in your own marketplace and vet developers. Works well for iOS App Store compliance.
- Direct install via bundle: Allow users to install zip bundles from disk or a URL. Must still verify signatures and show permission prompts.
- Enterprise or developer mode: For third-party developers and power users, support a developer mode with explicit warnings and additional logging.
Android-specific options: ship plugins as Dynamic Feature Modules (DFM) for tighter integration, or keep them as JS/WASM zip bundles stored in the app's private storage.
Model packaging and updates
Models are large and user expectations for privacy and offline usage are high. Design a model lifecycle:
- Ship a small default model inside the app for instant use.
- Allow optional model downloads (user consent, show disk cost).
- Support model variant selection (tiny/fast vs high-quality) with on-device conversion pipelines.
- Use delta updates and chunked downloads to minimize bandwidth.
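For the delta-update item, one simple approach is to hash fixed-size chunks of the model file and download only the chunks whose hashes changed. This is a sketch under assumptions: the function names are hypothetical, and fixed-offset chunking only saves bandwidth when an update changes bytes in place rather than shifting them.

```javascript
const crypto = require("node:crypto");

// Split a model file into fixed-size chunks and return one SHA-256 hash per chunk.
function chunkHashes(bytes, chunkSize) {
  const hashes = [];
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    const chunk = bytes.subarray(offset, offset + chunkSize);
    hashes.push(crypto.createHash("sha256").update(chunk).digest("hex"));
  }
  return hashes;
}

// Compare the installed version's chunk list with the new version's list and
// return the indexes of the chunks that must be downloaded.
function chunksToDownload(installedHashes, newHashes) {
  const needed = [];
  newHashes.forEach((hash, index) => {
    if (installedHashes[index] !== hash) needed.push(index);
  });
  return needed;
}
```

The server would publish the new version's hash list alongside the model, and the same hashes double as an integrity check after each chunk arrives.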
Model formats to support in 2026:
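- GGML / GGUF (Q* quantized binaries): extremely common for local LLMs; GGUF superseded the original GGML file format.
- Core ML (.mlmodelc): best for Apple devices, where it can use the Neural Engine.
- TFLite: cross-platform mobile inference. NNAPI acceleration is Android-only and deprecated since Android 15, so plan for GPU or vendor delegates.
- WASM runtimes: an execution option rather than a model format. Running WASM inside the system JS engine is generally treated like interpreted code on iOS, but confirm against the current App Store Review Guidelines.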
Developer experience: SDKs, docs, and testing
To attract builders, provide:
- Lightweight JS SDK that abstracts message passing and model API calls.
- Native helper libraries for Android (Kotlin) and iOS (Swift) that wrap permission and capability flows.
- Local test harness that runs the host and a mock model runtime so plugins can be tested without a physical device.
- Policy guide explaining allowed behaviors, privacy rules, and UI templates for permission prompts.
Example JS SDK usage
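// index.js inside plugin bundle
const host = window.BrowserHost; // injected host SDK

// Assumes chunks are parsed { delta } objects and the plugin UI has a #summary element
function updateUI(chunk) {
  document.getElementById('summary').textContent += chunk.delta;
}

async function summarizeSelection() {
  const selection = await host.tab.readSelection();
  const resp = await host.model.complete({ model: 'puma-7b-q4', input: selection, stream: true });
  resp.on('data', updateUI);
}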
Operational recommendations and monitoring
Monitor the ecosystem without compromising privacy:
- Collect only anonymized metrics (crash rates, plugin install counts, inference latency).
- Allow opt-in telemetry for developers to debug issues.
- Maintain an automated vulnerability scanning pipeline for uploaded plugin bundles (WASM/JS static analysis).
Future-proofing and trends to watch
Plan for these 2026+ trends:
- Model shards & streaming load: on-device models will be streamed or sharded to balance storage and latency.
- Federated personalization: plugins may request localized personalization weights — support safe, opt-in federated update APIs.
- Hardware abstractions: expect more uniform NPU APIs across vendors; design runtime abstraction layers.
- WASM + WebGPU: Running inference via WASM accelerated by WebGPU becomes a viable cross-platform option.
Quick checklist to get started (actionable)
- Define your plugin manifest and capability model.
- Implement the local model runtime with a stable HTTP JSON API and streaming support.
- Build a JS SDK to expose host APIs to plugins and hide platform differences.
- Implement signature verification and per-plugin capability tokens in the host.
- Create a default set of curated models and a safe model-download flow with quotas.
- Design permission UX: contextual prompts, settings page, and audit logs.
- Test on device: Android (DFM, WebView) and iOS (WKWebView, JavaScriptCore/WASM). Validate App Store compliance for iOS workflows.
Rule of thumb: prefer JavaScript/WASM plugins for maximum portability and App Store compliance; allow native features only when pre-bundled and reviewed.
Case study: SummarizeThis (minimal viable plugin)
Implementation notes for a quick demonstration:
- Packaging: zip with manifest.json, index.js, icon.png.
- Capabilities: tab.read, model.infer.
- UI: toolbar button + side sheet that streams summaries using SSE.
- Security: signed bundle + host issues a one-hour capability token scoped to model.infer.
Developer flow: the plugin calls host.tab.readSelection(); the host returns the selection; the plugin calls host.model.complete(); the host forwards the request to the runtime and streams results back. All operations are logged locally.
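The flow can be exercised end to end against an in-memory mock before any native bridge exists. Everything here is a hypothetical test double: `createMockHost` fakes the host, the "model" just echoes the input word by word, and the stream is exposed as an async iterator rather than the event-style `resp.on('data')` used in the SDK example.

```javascript
// In-memory stand-in for the host: checks capabilities, records an audit log,
// and streams a fake completion.
function createMockHost(pluginId, grantedCapabilities, selectionText) {
  const auditLog = [];
  const check = (capability) => {
    const allowed = grantedCapabilities.includes(capability);
    auditLog.push({ pluginId, capability, allowed });
    if (!allowed) throw new Error(`capability not granted: ${capability}`);
  };
  return {
    auditLog,
    tab: {
      async readSelection() {
        check("tab.read");
        return selectionText;
      },
    },
    model: {
      // Async generator standing in for the streamed runtime response.
      async *complete({ input }) {
        check("model.infer");
        for (const word of `Summary of: ${input}`.split(" ")) {
          yield { delta: word + " " };
        }
      },
    },
  };
}

// Plugin-side logic: read the selection, then stream the summary into the UI callback.
async function summarizeSelection(host, onDelta) {
  const selection = await host.tab.readSelection();
  const stream = host.model.complete({ model: "puma-7b-q4", input: selection, stream: true });
  for await (const chunk of stream) {
    onDelta(chunk.delta);
  }
}
```

Running the same plugin code against a mock with `model.infer` withheld is a quick way to confirm that denied capabilities fail loudly and still show up in the audit log.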
Conclusion and next steps
Building a plugin ecosystem for a Puma-style local-AI browser is achievable in 2026 if you design for platform constraints, privacy, and developer ergonomics from day one. The architecture outlined here—separate model runtime, capability tokens, signed bundles, JS/WASM-first plugins, and a clear permission UX—gives you a practical roadmap to ship quickly while minimizing risk.
Actionable next step: prototype a minimal host + runtime that serves /v1/complete on loopback, then build a tiny JS plugin that uses only tab.read and model.infer. Test on an Android device and an iPhone to validate bridging patterns and UX flows.
Want a starter repo, manifest templates, or SDK snippets tuned for Puma-like browsers? Reach out on thecode.website or check the developer docs to download the starter kit and join our developer preview program.