
Two external apps per solution; the agent is internal-only

Status: Accepted · Date: 2026-06-30 (Q4 resolved), revised 2026-07-01 (same-origin proxy draft reverted) and 2026-07-03 (agent made internal-only) · Area: Deployment

Context

A deployed solution stack has three user-facing surfaces: the read-only reader webapp, the writer agent (an HTTP service), and the Mission Control console for the agent. Two questions had to be settled. The first was which apps are externally reachable. The second was how the apps find each other when the images are shared across all solutions (ADR-002). Because Next.js inlines NEXT_PUBLIC_* variables at build time, a per-solution URL cannot be baked into a shared image. An initial draft made the agent internal-only behind a same-origin proxy; an even earlier draft had considered internal ingress for the agent.

Decision (as of 2026-06-30, revised 2026-07-01)

  • All three apps are external: orbit-<h> (reader), agent-<h> (agent), mc-<h> (Mission Control, stateless, no share mount). The agent was public by design; the internal-ingress draft was judged wrong, and the same-origin proxy revision was reverted on 2026-07-01.
  • Each gets its own Cloudflare subdomain, provisioned per solution: <id>.<zone>, <id>-agent.<zone>, <id>-mc.<zone>. Each is a proxied CNAME with Full (strict) SSL and an ACA managed-cert custom-domain binding, set up via shared lib.sh helpers used by both the singleton and per-solution paths.
  • Mission Control reaches the agent via a runtime env var. mc-<h> receives the agent’s public URL as AGENT_URL at deploy time; the force-dynamic root layout injects window.__AGENT_URL__ for the browser and the server reads process.env.AGENT_URL. Why not bake: per-solution images would break ADR-002. Why not proxy (the 2026-07-01 reasoning): hiding the agent behind the webapp was held to be incompatible with later securing the agent’s own subdomain with WorkOS.
  • The reader’s link to Mission Control is also wired at runtime, but differently, because it is a plain link rather than browser API calls: the reader gets AGENT_WEBAPP_URL, and a single force-dynamic route handler (/mission-control) issues a 302 at runtime, which avoids making the whole reader force-dynamic.
  • The apps are public by default, with an optional edge-secret lockdown behind edge.lock_solutions=true. Its enforcement primitives (Next src/proxy.ts in the two webapps, a Hono edgeGuard middleware in the agent, /health exempt) stay dormant until they are activated by EDGE_SHARED_SECRET plus one per-solution Cloudflare Transform Rule over the external hosts.
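The lockdown in the last bullet has two halves: a Cloudflare Transform Rule that adds a secret header at the edge, and a guard at each origin that rejects requests arriving without it. The following is a minimal, framework-agnostic sketch of both halves, not the real src/proxy.ts or Hono edgeGuard code. The header name x-edge-secret, the function names, and the plain-object results are hypothetical; only the dormant-without-secret behavior, the /health exemption, and the 403 come from this ADR.

```typescript
// Hypothetical header name: the ADR does not say which header the Transform Rule sets.
const EDGE_HEADER = "x-edge-secret";

// Cloudflare Rules-language filter for one solution's Transform Rule, so the
// header is added only on requests to that solution's external hosts.
function transformRuleExpression(hosts: string[]): string {
  if (hosts.length === 0) throw new Error("a Transform Rule needs at least one host");
  const quoted = hosts.map((h) => JSON.stringify(h)).join(" ");
  return `http.host in {${quoted}}`;
}

// Compares without stopping at the first mismatch (only the length can leak).
function safeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

type GuardResult = { allow: true } | { allow: false; status: 403 };

// Origin-side check, run before any route. Header keys are assumed lowercase.
// With no secret configured the guard is dormant, and /health is always exempt.
function edgeGuard(
  path: string,
  headers: Record<string, string | undefined>,
  sharedSecret: string | undefined,
): GuardResult {
  if (!sharedSecret) return { allow: true };
  if (path === "/health") return { allow: true };
  const presented = headers[EDGE_HEADER];
  if (presented !== undefined && safeEqual(presented, sharedSecret)) {
    return { allow: true };
  }
  return { allow: false, status: 403 };
}
```

With hosts such as the hypothetical s1.example.com and s1-mc.example.com, transformRuleExpression yields a filter of the form http.host in {"s1.example.com" "s1-mc.example.com"}. Requests that reach a raw ACA FQDN never pass through that rule, so they lack the header and are refused.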

Revised 2026-07-03: the agent is internal-only

The 2026-07-01 position, “the agent is public by design”, is superseded. Nothing outside the ACA environment ever needed the agent: its only callers are the reader’s server side and Mission Control. Keeping a public subdomain meant keeping an entire exposed surface, plus the follow-on plan of putting WorkOS in front of it, with no consumer to justify either. The revised decision:

  • The agent’s ACA ingress is internal (external: false). Requests from outside the environment are rejected by the environment proxy (404). The agent has no Cloudflare subdomain: <id>-agent.<zone> no longer exists for new stacks (teardown still cleans up legacy DNS records).
  • In-env callers address the agent as http://agent-<h>: the app name resolves via the environment’s internal DNS and hits the ingress on :80. The .internal.<domain> FQDN form does not resolve in this environment; this is the same pattern as for the internal registry services (svc-*).
  • The reader needed no code change: it already talked to the agent server-side only (ORBIT_AGENT_URL); that env var now carries the in-env address.
  • Mission Control’s browser can no longer call the agent directly, so the same-origin proxy returns, this time as the accepted mechanism. Mission Control gains a streaming proxy route (src/app/agent/[...path]/route.ts). Whenever AGENT_URL is set, the root layout now injects the literal prefix /agent as window.__AGENT_URL__, and the proxy forwards fetch requests and SSE streams to the agent’s in-env address (AGENT_URL=http://agent-<h>). Local dev is unchanged: the browser hits the agent directly via NEXT_PUBLIC_AGENT_URL.
  • The scale-to-zero keep-alive follows the address change: AGENT_SELF_URL now long-polls http://agent-<h>. The request still passes through the (internal) ingress, so ACA’s HTTP scale rule still sees the app as busy (ADR-004).
  • Edge lockdown shrinks to the two public apps: the per-solution Transform Rule covers 2 hosts, the agent needs no edge secret, and the reader→agent bypass header (x-orbit-bypass / EDGE_BYPASS_SECRET) is obsolete. The config keys edge.bypass_header_name and edge.bypass_secret were removed.
  • Aurora’s registry deployment record no longer stores an agentUrl, and the portfolio UI no longer renders an “agent” link (legacy records may still carry the field).
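The proxy route in the list above has two jobs. It maps a browser path under /agent onto the agent’s in-env address, and it forwards the request without hop-by-hop headers so that fetch and SSE responses can be streamed back unchanged. Below is a minimal sketch of that mapping logic, assuming the prefix /agent and a base such as http://agent-<h> read from AGENT_URL. The function names and the exact header list are illustrative and are not the contents of src/app/agent/[...path]/route.ts.

```typescript
const PROXY_PREFIX = "/agent";

// Hop-by-hop headers describe a single connection and are not forwarded;
// "host" is also dropped so the upstream request targets the agent's own host.
const HOP_BY_HOP = new Set<string>([
  "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
  "te", "trailer", "transfer-encoding", "upgrade", "host",
]);

// Maps "/agent/runs/42?x=1" onto "<agentBase>/runs/42?x=1".
// Returns null for paths outside the prefix (for example "/agentx").
function proxyTarget(pathAndQuery: string, agentBase: string): string | null {
  if (!pathAndQuery.startsWith(PROXY_PREFIX)) return null;
  const rest = pathAndQuery.slice(PROXY_PREFIX.length);
  if (rest !== "" && !rest.startsWith("/") && !rest.startsWith("?")) return null;
  const base = agentBase.replace(/\/+$/, "");
  return rest.startsWith("/") ? base + rest : `${base}/${rest}`;
}

// Copies request headers except hop-by-hop ones, including any extra names
// listed in the Connection header. Header keys are assumed lowercase.
function forwardableHeaders(headers: Record<string, string>): Record<string, string> {
  const drop = new Set<string>(HOP_BY_HOP);
  for (const token of (headers["connection"] ?? "").split(",")) {
    const name = token.trim().toLowerCase();
    if (name) drop.add(name);
  }
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!drop.has(name.toLowerCase())) out[name] = value;
  }
  return out;
}
```

In a real route handler, the upstream response body would be returned as a stream rather than read to completion first; that is what lets SSE events reach the browser as they arrive instead of after the agent closes the connection.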

There is a conscious irony here: the same-origin proxy draft, rejected and reverted on 2026-07-01, is the mechanism now. What changed is the constraint, not the argument. The proxy was rejected because it conflicted with WorkOS-protecting the agent’s own subdomain, and the agent no longer has a subdomain to protect. The runtime-URL injection machinery (ADR-002’s no-baking rule) survives unchanged; only the injected value differs (/agent instead of a public URL).
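The surviving runtime-injection path can be sketched as two decisions. The first is made by the force-dynamic root layout on the server, which chooses what to inject. The second is made by the browser client, which chooses its base URL. The function names and env shape below are hypothetical. Only the precedence comes from this ADR: the injected /agent applies when AGENT_URL is set, and otherwise the build-time NEXT_PUBLIC_AGENT_URL is used in local dev.

```typescript
type ServerEnv = { AGENT_URL?: string };

// Server side (root layout): the browser is always given the same-origin
// prefix, never the agent's in-env address, which it could not reach.
function injectedAgentUrl(env: ServerEnv): string | undefined {
  return env.AGENT_URL ? "/agent" : undefined;
}

// Inline <script> body for the layout. JSON.stringify quotes the value, and
// escaping "<" keeps a value from closing the script tag early.
function injectionScript(value: string): string {
  return `window.__AGENT_URL__=${JSON.stringify(value).replace(/</g, "\\u003c")};`;
}

// Browser side: the injected value wins; local dev falls back to the
// build-time NEXT_PUBLIC_AGENT_URL and hits the agent directly.
function resolveAgentBase(injected: string | undefined, buildTime: string | undefined): string {
  const base = injected ?? buildTime;
  if (!base) throw new Error("no agent URL: neither injected nor NEXT_PUBLIC_AGENT_URL is set");
  return base;
}
```

Because the layout is force-dynamic, injectedAgentUrl runs per request against the container’s own environment, so the shared image carries no per-solution value.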

Consequences

  • Origin isolation per public app and per solution, matching the platform’s URL convention (flat one-level subdomains under maxqlabs-orbit.com, which a single wildcard cert can cover). It also leaves a clean path to putting WorkOS user auth in front of each public subdomain; that recorded follow-on now concerns only the reader and Mission Control.
  • The agent’s entire HTTP surface is unreachable from outside the ACA environment. One whole exposed surface is removed, and it needs neither an edge secret nor a WorkOS front.
  • Edge lockdown blocks direct-origin access to the two public apps’ raw ACA FQDNs, but it is not user authentication. Until WorkOS lands, a locked solution is still reachable by anyone through its Cloudflare hostnames, and an unlocked solution is fully public.
  • Turning lockdown on requires cloudflare.zone_name: without the Transform Rule injecting the header, every guarded app would return 403, so the provisioner refuses to enable lockdown otherwise. With the flag off, empty secrets are pruned, so behavior is exactly as it was before lockdown existed.
  • One naming scar is load-bearing: env-storage link names put the role word first (cust-rw-<h>, never <h>-cust-rw) because Azure requires an alphabetic first character and the hash often starts with a digit.
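The two provisioner rules in the last bullets can be sketched together. The first is the lockdown precondition with secret pruning. The second is role-first naming for env-storage links. The real provisioner is provision-solution.sh; this TypeScript rendering, its config shape, and its function names are hypothetical. It encodes only the leading-letter rule stated here, not Azure’s full naming rules.

```typescript
type LockdownConfig = {
  lockSolutions: boolean; // edge.lock_solutions
  zoneName?: string; // cloudflare.zone_name
  secrets: Record<string, string>; // e.g. EDGE_SHARED_SECRET
};

// Returns the secrets to set on the apps. Refuses lockdown without a zone,
// since there would be no Transform Rule to add the header and every guarded
// app would answer 403. Empty secrets are pruned, so with lockdown off the
// guards see no secret and stay dormant.
function planLockdown(cfg: LockdownConfig): Record<string, string> {
  if (cfg.lockSolutions && !cfg.zoneName) {
    throw new Error("edge.lock_solutions=true requires cloudflare.zone_name");
  }
  const env: Record<string, string> = {};
  for (const [name, value] of Object.entries(cfg.secrets)) {
    if (value !== "") env[name] = value;
  }
  return env;
}

// Role word first: Azure requires a leading letter, and the solution hash
// may start with a digit (a hash such as the hypothetical "3fa9c1").
function storageLinkName(role: string, hash: string): string {
  const name = `${role}-${hash}`;
  if (!/^[a-z]/i.test(name)) {
    throw new Error(`storage link name "${name}" must start with a letter`);
  }
  return name;
}
```

Putting the check next to the name construction means a future role word that itself starts with a digit fails at provisioning time rather than at the Azure API.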

Evidence

  • infrastructure/azure/provision-solution.sh + app-orbit.yaml.tmpl, app-agent.yaml.tmpl (internal ingress), app-agent-webapp.yaml.tmpl
  • infrastructure/azure/lib.sh — shared Cloudflare/ACA custom-domain helpers and reconcile_transform_rule_hosts (2 hosts per solution)
  • implementation/maxq-orbit-agent-webapp: src/app/layout.tsx (injects /agent as window.__AGENT_URL__), src/app/agent/[...path]/route.ts (the same-origin streaming proxy), src/lib/agent-client.ts
  • implementation/orbit-webapp/codebase/src/app/mission-control/route.ts — the runtime redirect
  • memory/orbit-auto-deploy.md (the three-apps decision and its revisions), memory/orbit-platform-landscape.md (URL convention)