Aurora

Aurora (implementation/aurora-webapp) is the management application above the Trajectory solutions: the portfolio / control-plane layer that manages, monitors, and orchestrates the whole fleet of Orbit solutions across customers and tenants. Where each per-solution Orbit stack (Orbit Webapp, Orbit Agent, Mission Control) is scoped to one solution at a time, Aurora operates across many:

a live GitHub data layer that discovers every solution in the fleet,
the portfolio registry BFF over the Customer → Tenant → Solution registry services,
solution creation (forking the two template repos and seeding solution.yaml),
the Azure auto-deploy trigger that stands up a per-solution Orbit stack on Azure Container Apps, and
an in-product agent chat panel built on the Claude Agent SDK.

A key structural difference from the Orbit apps: Aurora has no solution repo mount at all. Orbit loads one solution tree from disk; Aurora reads everything live — GitHub for solution definitions, the registry services for portfolio records, and Azure APIs for deployment.

Tech stack

Concern	Choice
Framework	Next.js 16 (App Router), React 19, TypeScript 5
Styling	Tailwind CSS 4 + Orbit’s `globals.css` design system (copied verbatim, with a marked `/* === Aurora additions === */` block)
GitHub access	`@octokit/rest` + `@octokit/auth-app` (GitHub App JWT → installation token)
Azure access	`@azure/arm-appcontainers`, `@azure/arm-storage`, `@azure/identity` (`DefaultAzureCredential` — managed identity, no service-principal secret)
Agent chat	`@anthropic-ai/claude-agent-sdk`
Validation / parsing	`zod`, `yaml`

Dependencies are deliberately trimmed compared to orbit-webapp — no mermaid, d3, Monaco, xyflow, or MDX pipeline. The shell mirrors Orbit’s (aurora-app.tsx, rail.tsx, header.tsx, a trimmed icon set) so the control plane and the per-solution viewers feel like one product family.


src/
  app/            layout, / (portfolio or tenants home), /customers, /solutions,
                  /tenants/[id], plus /api/* route handlers
  components/
    aurora/       shell, rail, header, portfolio-view, solution-card,
                  tenants-home, tenant-detail, customers-view, solutions-ops,
                  chat/ (agent-dock, use-agent-chat)
  lib/
    portfolio/    github.ts (App auth), loader.ts (two-org discovery),
                  azure.ts (Azure config + credential)
    registry/     client.ts, http.ts, join.ts, write-back.ts, types.ts
    solutions/    naming.ts, create-solution.ts, deploy-azure.ts (adapter over
                  the @maxq/orbit-deploy engine)
    agent/        config.ts, run.ts, tools.ts, registry-tools.ts, deploy-tools.ts

The live GitHub data layer

Aurora discovers the fleet by querying two GitHub organisations through two GitHub Apps — one App per org — in src/lib/portfolio/. Each App authenticates with @octokit/auth-app (App JWT exchanged for a cached installation token) and needs only Contents: read and Metadata: read, with no webhook.

Org	Role	What Aurora reads
`maxq-labs-orbit-solutions` (customer)	Source of truth	Lists `solution-*` repos (excluding `solution-template`) and reads each repo’s `workspace/solution-definition/solution.yaml` via the Contents API — one portfolio entry per solution
`maxq-labs-orbit-platform` (internal)	Correlation only	Lists `solution-*-internal` repos (excluding `solution-template-internal`) to mark whether each solution’s internal counterpart exists (“internal repo linked” indicator). The internal repo has no `solution.yaml` of its own

The customer repo’s solution.yaml is the authoritative metadata for each solution (id, name, owners, description, per-stage status). loader.ts applies a 60-second module-level TTL cache, and the pages are force-dynamic, so a burst of renders triggers a single GitHub fetch and data is never more than a minute stale.
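The caching behaviour described above can be sketched as a small wrapper. This is a minimal illustration, not the actual loader.ts: `cachedLoader`, its parameters, and the in-flight de-duplication are assumptions made for the example.

```typescript
// Hypothetical helper: wraps an async loader in a module-level TTL cache.
// The injectable clock exists only to make the sketch easy to test.
function cachedLoader<T>(
  load: () => Promise<T>,
  ttlMs: number = 60_000,
  now: () => number = Date.now,
): () => Promise<T> {
  let cached: { data: T; fetchedAt: number } | null = null;
  let inFlight: Promise<T> | null = null;

  return async () => {
    // Serve from the cache while the entry is younger than the TTL.
    if (cached && now() - cached.fetchedAt < ttlMs) return cached.data;
    // Concurrent callers during a fetch share the same promise.
    if (inFlight) return inFlight;
    inFlight = load()
      .then((data) => {
        cached = { data, fetchedAt: now() };
        return data;
      })
      .finally(() => {
        inFlight = null;
      });
    return inFlight;
  };
}
```

A burst of page renders that all call the returned function within the TTL would then share one upstream fetch, which is the property the paragraph above relies on.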

Configuration

Each org is configured through environment variables; values live in a git-ignored .env, and .env.example documents the setup. For each role (CUSTOMER or INTERNAL) the variables are GH_<ROLE>_ORG, GH_<ROLE>_APP_ID, GH_<ROLE>_INSTALLATION_ID (optional; auto-discovered via the App’s org installation), and the private key, supplied either inline (GH_<ROLE>_PRIVATE_KEY) or as a path (GH_<ROLE>_PRIVATE_KEY_PATH) to a .pem file under the git-ignored secrets/ directory. readOrgConfig() returns null when an org is unconfigured, in which case the app renders a “connect Aurora to GitHub” empty state instead of crashing.
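As a rough sketch of that contract, the function below reads the per-role variables and returns null when the org is not fully configured. The name `readOrgConfigSketch`, the `OrgConfig` shape, and the choice to prefer the inline key over the path when both are set are assumptions; the real readOrgConfig() may differ.

```typescript
type Role = "CUSTOMER" | "INTERNAL";
type Env = Record<string, string | undefined>;

// Hypothetical result shape for illustration only.
interface OrgConfig {
  org: string;
  appId: string;
  installationId?: string; // optional: may be auto-discovered later
  privateKey: { inline: string } | { path: string };
}

function readOrgConfigSketch(role: Role, env: Env): OrgConfig | null {
  const org = env[`GH_${role}_ORG`];
  const appId = env[`GH_${role}_APP_ID`];
  const inline = env[`GH_${role}_PRIVATE_KEY`];
  const path = env[`GH_${role}_PRIVATE_KEY_PATH`];

  // Unconfigured org: signal with null so the caller can render an empty state.
  if (!org || !appId) return null;

  // Assumption: the inline key wins when both forms are present.
  let privateKey: OrgConfig["privateKey"];
  if (inline) privateKey = { inline };
  else if (path) privateKey = { path };
  else return null;

  return {
    org,
    appId,
    installationId: env[`GH_${role}_INSTALLATION_ID`] || undefined,
    privateKey,
  };
}
```

Passing the environment in as a parameter keeps the sketch testable; in the app the argument would presumably be process.env.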

Private keys are never stored in any .env or committed anywhere — they are .pem files mounted read-only (locally at /secrets) or injected inline through the environment / ACA secrets.

The registry BFF

Aurora is the backend-for-frontend of the portfolio registry: browsers and downstream consumers never talk to the three registry services directly. src/lib/registry/client.ts is the only module that calls them (service URLs from CUSTOMER_SERVICE_URL / TENANT_SERVICE_URL / SOLUTION_SERVICE_URL), forwarding an x-actor header, an optional x-service-secret, and If-Match versions for optimistic concurrency.

The feature gate is registryEnabled(): AURORA_TENANTS_ENABLED === "true" and all three service URLs configured. With the flag off, Aurora behaves exactly like the original portfolio app.
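The gate and the forwarded headers might look roughly like the following. The gate condition and header names come from the text above; the function names, the options shape, and passing the version through to If-Match unchanged are assumptions for illustration.

```typescript
type Env = Record<string, string | undefined>;

// Mirrors the described gate: flag exactly "true" AND all three service URLs set.
function registryEnabledSketch(env: Env): boolean {
  return (
    env.AURORA_TENANTS_ENABLED === "true" &&
    Boolean(env.CUSTOMER_SERVICE_URL) &&
    Boolean(env.TENANT_SERVICE_URL) &&
    Boolean(env.SOLUTION_SERVICE_URL)
  );
}

// Hypothetical options for one registry call.
interface RegistryCallOptions {
  actor: string;          // forwarded as x-actor
  serviceSecret?: string; // forwarded as x-service-secret when configured
  ifMatch?: string;       // record version for optimistic concurrency
}

function registryHeaders(opts: RegistryCallOptions): Record<string, string> {
  const headers: Record<string, string> = {
    "content-type": "application/json",
    "x-actor": opts.actor,
  };
  if (opts.serviceSecret) headers["x-service-secret"] = opts.serviceSecret;
  // Assumption: the version string is sent as-is; the real client may quote it as an ETag.
  if (opts.ifMatch) headers["If-Match"] = opts.ifMatch;
  return headers;
}
```

A write that carries a stale If-Match version would be expected to fail at the service, letting the caller re-read and retry rather than overwrite a concurrent change.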

The join and the sweep

Registry records and GitHub repos are correlated in src/lib/registry/join.ts:

Join key = the customer-repo name (not the solution id). Each registry solution record carries repos.customer.name; the join looks that name up in the portfolio org-scan snapshot. This tolerates legacy drift — a registry id may differ from what an older solution.yaml says its id is. Authority is split per field: the registry record owns portfolio concerns (tenant, deployment); solution.yaml owns the methodology definition.
The reconciliation sweep compares the GitHub org scan with the registry and reports two healable drift states: unregistered repo pairs (on GitHub, no record) and orphaned records (record exists, repo gone). Nothing silently disappears — both lists surface in the operator UI and via GET /api/solutions?sweep=true.
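The join and the sweep can be illustrated with a single pass over both sides. This is a simplified sketch: the record and snapshot types are hypothetical, and it keys only on customer-repo names, whereas the real sweep reports repo pairs.

```typescript
// Hypothetical minimal shapes; the real types live in src/lib/registry/types.ts.
interface RegistryRecord {
  id: string;
  repos: { customer: { name: string } };
}
interface ScannedRepo {
  name: string;            // customer-repo name from the org scan
  solutionYamlId?: string; // may differ from the registry id (legacy drift)
}

function joinAndSweep(records: RegistryRecord[], scan: ScannedRepo[]) {
  const byName = new Map<string, ScannedRepo>(
    scan.map((repo): [string, ScannedRepo] => [repo.name, repo]),
  );
  const claimed = new Set<string>();
  const joined: { record: RegistryRecord; repo: ScannedRepo }[] = [];
  const orphaned: RegistryRecord[] = [];

  for (const record of records) {
    // Join key is the customer-repo name, not the solution id.
    const repo = byName.get(record.repos.customer.name);
    if (repo) {
      joined.push({ record, repo });
      claimed.add(repo.name);
    } else {
      orphaned.push(record); // record exists, repo gone
    }
  }
  // On GitHub but with no registry record.
  const unregistered = scan.filter((repo) => !claimed.has(repo.name));
  return { joined, orphaned, unregistered };
}
```

Because the lookup ignores solutionYamlId, a record whose id has drifted from the id in an older solution.yaml still joins, which is the tolerance described above.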

API routes

All handlers run on the nodejs runtime, are force-dynamic, and return 503 with a pointer to the flag when the registry is disabled:

Route	Methods	Purpose
`/api/customers`	GET, POST	List (optional `?status=`) / create customers
`/api/customers/[id]`	GET, PATCH, DELETE	Read / update / delete one customer
`/api/tenants`	GET, POST	List (`?customer=`, `?status=`) / create tenants
`/api/tenants/[id]`	GET, PATCH, DELETE	Read / update / delete one tenant
`/api/tenants/[id]/solutions`	GET, POST	The tenant’s joined solution list (the contract the tenant-scoped Orbit reader consumes); POST `{ solutionId }` attaches a solution — refused with 409 if it is already attached to another tenant
`/api/tenants/[id]/solutions/[solutionId]`	DELETE	Detach (clears the record’s `tenant` pointer; the record becomes unassigned)
`/api/solutions`	GET, POST	Joined records with filters (`?tenant=`, `?customer=`, `?status=`, `?unassigned=`); `?sweep=true` adds the unregistered/orphaned drift sections; POST registers a record (creation flow and backfill)
`/api/solutions/[id]`	GET, PATCH	Read / correct one record (deployment corrections go through PATCH)
`/api/agent`	POST	The agent chat SSE endpoint (see below)

UI surface

Aurora renders in one of two modes, selected by the registry gate.

Flag off — the portfolio view

The original single-page app: a hero with roll-up mini-cards and a .sol-grid of solution cards, one per customer repo. Each card shows a status chip, a per-stage As-Is / R2B / To-Be strip, links to both repos, and the internal-repo-linked indicator.

Flag on — the tenants repivot (`AURORA_TENANTS_ENABLED`)

With the registry enabled, the UI repivots from a flat solution grid to the Customer → Tenant → Solution model, using real routes instead of a client-side view switch. The shared chrome (AuroraShell) keeps the same rail, header, and agent dock, with rail links to Overview (/), Customers (/customers), and Solutions (/solutions):

Route	View
`/`	Tenants home — customers, tenants, joined solutions, and the drift sweep in one overview; falls back to a “registry unreachable” surface if the services do not respond
`/customers`	Customer list with per-customer tenant counts
`/tenants/[id]`	Tenant detail — the tenant’s joined solutions plus attach candidates (only unassigned solutions of the same customer are offered, enforcing the cross-customer invariant)
`/solutions`	The demoted all-solutions operator view: the old portfolio grid extended with registry columns (tenant, deployment state) plus the sweep’s attach/register surfaces

Solution creation

Aurora can stand up a brand-new Trajectory solution. A solution is two GitHub repos, forked from the base templates (both templates live in the platform org):

Repo	Forked from	Into	Name
Customer repo	`solution-template`	`maxq-labs-orbit-solutions`	`solution-<org>-<sol>` (private)
Internal repo	`solution-template-internal`	`maxq-labs-orbit-platform`	`solution-<org>-<sol>-internal` (private)

Key mechanics (in src/lib/solutions/create-solution.ts and naming.ts):

Native octokit, not subprocess. The original create-solution skill’s bash worker was ported to a server-side service — the app never shells out to gh, git, or python. Seeding solution.yaml uses the Contents API (get → line-replace the four identity keys, preserving comments → update), not clone/commit/push, so the container needs no CLI tooling.
True fork with upstream link. Repos are real GitHub forks (renamed on fork), deliberately keeping the “forked from” link so each solution can pull future template and methodology updates via GitHub’s Sync fork.
Naming rules. Inputs are an organization slug and a solution slug. Each may contain letters and hyphens only, is at most 20 characters long, has no leading or trailing hyphen, and is auto-lowercased. solution.yaml is seeded with id=<org>-<sol>, name=<sol>, owners=["<org>"], and the optional description.
Preview before create. previewSolution validates the names and checks GitHub that neither target repo exists — a dry run that makes no changes.
Cross-org fork credential. GitHub App installation tokens are per-org and cannot fork a private repo across orgs (proven empirically with a spike script, scripts/spike-cross-org-fork.mjs, including the App-installed-on-both-orgs variant). The customer fork therefore uses a classic PAT from an account that is a member of both orgs, supplied as GH_PROVISION_TOKEN and used only for that one fork; the App tokens still handle uniqueness checks, the same-org internal fork, and seeding. Fine-grained PATs are single-org, so the token must be classic. previewSolution reports provisioningReady and the agent warns when the token is unset.
Registry integration. With the registry enabled, creation requires a tenant (validated against the tenant service before any fork is created; a tenant belonging to a different customer is refused), and the solution record is registered after seeding. Registration failure is non-fatal — the result carries a registerError and the register_solution tool can heal it later.
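The naming rules and the line-replace seeding described in the mechanics above can be sketched as pure functions. Everything here is illustrative rather than the actual naming.ts or create-solution.ts: the function names are hypothetical, trimming whitespace is an assumption, and the seeding assumes the four identity keys (taken here to be id, name, owners, description) are unindented top-level lines.

```typescript
// Letters and hyphens only, no leading or trailing hyphen.
const SLUG = /^[a-z](?:[a-z-]*[a-z])?$/;

function normalizeSlug(input: string): string {
  const slug = input.trim().toLowerCase(); // auto-lowercase (trim is an assumption)
  if (slug.length === 0 || slug.length > 20) throw new Error(`"${input}": must be 1-20 characters`);
  if (!SLUG.test(slug)) throw new Error(`"${input}": letters and hyphens only, no leading/trailing hyphen`);
  return slug;
}

interface SeedValues { id: string; name: string; owners: string[]; description?: string }

function solutionNames(orgInput: string, solInput: string, description?: string) {
  const org = normalizeSlug(orgInput);
  const sol = normalizeSlug(solInput);
  const customerRepo = `solution-${org}-${sol}`;
  const seed: SeedValues = { id: `${org}-${sol}`, name: sol, owners: [org], description };
  return { customerRepo, internalRepo: `${customerRepo}-internal`, seed };
}

// Replace only the identity lines; comment lines and all other keys pass through untouched.
// Note: this simple version drops any trailing comment on a replaced line.
function seedSolutionYaml(text: string, seed: SeedValues): string {
  const values: Record<string, string> = {
    id: seed.id,
    name: seed.name,
    owners: JSON.stringify(seed.owners), // JSON array is a valid YAML flow sequence
  };
  if (seed.description !== undefined) values.description = JSON.stringify(seed.description);
  return text
    .split("\n")
    .map((line) => {
      const key = /^(id|name|owners|description):/.exec(line)?.[1];
      return key !== undefined && key in values ? `${key}: ${values[key]}` : line;
    })
    .join("\n");
}
```

In the flow described above, the seeded text would be written back with the Contents API update call rather than through a clone, commit, and push.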

Azure auto-deploy (summary)

When ORBIT_DEPLOY_ENABLED=true, createSolution triggers deployOrbitStack() after seeding. Since 2026-07-05 that function is a thin adapter (src/lib/solutions/deploy-azure.ts) over the shared @maxq/orbit-deploy engine (implementation/shared/orbit-deploy; see ADR-015): it builds the engine config from Aurora’s environment, runs the deploy scenario with DefaultAzureCredential (Aurora’s managed identity on ACA), and persists the structured run log onto the registry record.

The scenario provisions, per solution:

an ACR image preflight (pinned tags must exist; building images stays a human release step),
two Azure Files shares (customer + internal repo clones),
the env-storage links,
a conditional seed job (skipped when the shares already hold clones),
three Container Apps from shared ACR images: the reader orbit-<h>, the writer agent agent-<h> (internal ingress only since 2026-07-03), and Mission Control mc-<h>, where h is a 12-hex-character hash of the solution id (naming lives in the engine, with a parity gate against the bash azure_names), and
the Cloudflare front door + optional edge lockdown for the two public apps (previously bash-only; the in-product path produced front-door-less stacks).

Deploys are idempotent and reconciling, safe to retry, and available standalone via the confirm-gated deploy_solution agent tool. On success, the deployment record is written back to the registry (non-fatal on failure): state, the two verified public URLs (no more zone string-guessing; no agentUrl since the agent went internal), deployedAt, and the per-step lastRun log.

The full provisioning model — steps and scenarios, naming scheme, shares and mounts, Cloudflare subdomains, edge lockdown, scale rules — is covered on the Azure deployment page.

The agent chat panel

Aurora embeds a chat panel (the “agent dock”, mounted in the shell’s reserved agent row) that talks to the Claude Agent SDK and can operate the portfolio conversationally — including creating and deploying solutions.

Transport

POST /api/agent is a Node-runtime, force-dynamic route returning an SSE stream. The design is stateless: the client sends the full conversation history each turn, and prior turns are folded into the system prompt — no server-side session storage, so it works across multiple replicas.
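One way to picture the stateless design is a function that folds prior turns into the system prompt on every request. The turn shape, the transcript format, and the name `foldHistory` are assumptions for this sketch; the actual prompt layout in run.ts is not documented here.

```typescript
// Hypothetical wire shape for one conversation turn sent by the client.
interface ChatTurn {
  role: "user" | "assistant";
  text: string;
}

function foldHistory(
  basePrompt: string,
  turns: ChatTurn[],
): { systemPrompt: string; prompt: string } {
  const last = turns[turns.length - 1];
  if (!last || last.role !== "user") {
    throw new Error("the final turn must be the user's new message");
  }
  const prior = turns.slice(0, -1);
  if (prior.length === 0) return { systemPrompt: basePrompt, prompt: last.text };

  // Earlier turns become plain text appended to the system prompt,
  // so no session needs to be stored on the server between requests.
  const transcript = prior
    .map((t) => `${t.role === "user" ? "User" : "Assistant"}: ${t.text}`)
    .join("\n");
  return {
    systemPrompt: `${basePrompt}\n\nConversation so far:\n${transcript}`,
    prompt: last.text,
  };
}
```

Since each request carries everything needed to answer it, any replica can serve any turn.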

Locked-down toolset

The agent’s capabilities are defined in src/lib/agent/tools.ts, registry-tools.ts, and deploy-tools.ts as an in-process MCP server (createSdkMcpServer, server name aurora) — it never shells out:

Tool	Effect	Gate
`preview_solution`	Dry-run name validation + uniqueness check	read-only
`create_solution`	Fork both repos, seed `solution.yaml`, register, auto-deploy	confirm-gated
`solution_status`	Live Azure snapshot of a stack (app states, image drift vs the pinned tags, front-door DNS, effective URLs) cross-checked against the registry record	read-only
`verify_solution`	Run the full step list in verify mode — read-only assertions with remediation hints	read-only
`plan_deploy`	Dry-run any scenario: what exists, what apply would change	read-only
`deployment_progress`	Read the persisted per-step run log (`deployment.lastRun`) from the registry	read-only
`deploy_diagnostics`	The real failed ARM operations from the Activity Log (ingestion lags ~2–5 min)	read-only
`deploy_solution`	The full `deploy` scenario: preflight → storage → seed → apps → front door → lockdown	confirm-gated
`redeploy_solution_apps`	Roll just the apps to the pinned images (optional per-component filter)	confirm-gated
`reconcile_front_door`	Just the Cloudflare front door + edge lockdown	confirm-gated
`reconcile_storage`	Just shares + links (+ optional reseed)	confirm-gated
`teardown_solution`	Destructive stack teardown	confirm + typed-back id (`confirmId`)
`list_customers`, `list_tenants`, `list_solutions`	Registry reads (`list_solutions` includes the drift sweep)	read-only
`create_customer`, `create_tenant`, `attach_solution`, `detach_solution`, `register_solution`	Registry mutations	confirm-gated
`Read`, `Grep`, `Glob`	Read-only workspace context (cwd from `AURORA_AGENT_CWD`)	read-only

The deploy tools resolve a solution by its registry record first (the repos are stored facts) and fall back to explicit org + name when the registry is off. Every deploy tool returns structured step results (status / detail / evidence / remediation as JSON) rather than prose — the recorded failure mode of prose-out tools is the model paraphrasing errors wrongly.
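A structured step result along those lines could be modelled as below. The four fields (status, detail, evidence, remediation) are taken from the text; the `step` field, the status vocabulary, and the `toToolResult` helper are assumptions made for the example.

```typescript
// Assumed status vocabulary; the source names the fields but not the allowed values.
type StepStatus = "ok" | "changed" | "skipped" | "failed";

interface StepResult {
  step: string; // hypothetical step id, e.g. "preflight"
  status: StepStatus;
  detail: string;
  evidence?: Record<string, unknown>;
  remediation?: string;
}

// MCP tool results carry content blocks; here the step list is returned
// as one JSON text block so the model sees exact values, not a paraphrase.
function toToolResult(steps: StepResult[]) {
  const failed = steps.filter((s) => s.status === "failed").map((s) => s.step);
  return {
    content: [
      {
        type: "text" as const,
        text: JSON.stringify({ ok: failed.length === 0, failed, steps }),
      },
    ],
    isError: failed.length > 0,
  };
}
```

Keeping remediation as its own field means a hint can be shown to the operator verbatim instead of being reworded by the model.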

disallowedTools blocks Bash, Write, Edit, NotebookEdit, WebFetch, and WebSearch; Skill is deliberately not allowed and setting sources are omitted so the gh-based create-solution skill can never load inside the panel. maxTurns is capped at 16.

The confirmation gate

Approval is conversational, not an interactive button: a mid-stream approve/deny UI cannot be resolved statelessly across replicas. The flow is: the agent previews, shows the resolved plan, asks the user, and calls the tool with confirm: true only after an explicit “confirm”. Two layers enforce this:

Each side-effecting tool rejects any call where confirm !== true.
The SDK’s canUseTool callback is a hard guard (defense in depth): it denies every confirm-gated tool — creation, all deploy scenarios, and all registry mutations — unless the input carries confirm: true.

Teardown is double-gated: teardown_solution additionally requires a confirmId parameter that must exactly equal the solution id — the user has to literally type the id back in chat (the conversational analogue of deploy.sh delete’s retype prompt) — and canUseTool enforces that match too.
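The hard guard can be sketched as a pure decision function of the kind a canUseTool callback might delegate to. The gated tool names come from the toolset table; the local `PermissionDecision` type only mirrors the allow/deny result shape, and the `mcp__aurora__` name prefix and the `solutionId` input field are assumptions.

```typescript
type ToolInput = Record<string, unknown>;
type PermissionDecision =
  | { behavior: "allow"; updatedInput: ToolInput }
  | { behavior: "deny"; message: string };

const CONFIRM_GATED = new Set([
  "create_solution", "deploy_solution", "redeploy_solution_apps",
  "reconcile_front_door", "reconcile_storage", "teardown_solution",
  "create_customer", "create_tenant", "attach_solution",
  "detach_solution", "register_solution",
]);

function confirmGate(toolName: string, input: ToolInput): PermissionDecision {
  // Assumption: in-process MCP tools arrive prefixed with the server name.
  const name = toolName.replace(/^mcp__aurora__/, "");
  if (!CONFIRM_GATED.has(name)) return { behavior: "allow", updatedInput: input };

  if (input.confirm !== true) {
    return { behavior: "deny", message: `${name} requires confirm: true after explicit user approval` };
  }
  // Teardown is double-gated: the typed-back id must equal the target id exactly.
  // `solutionId` is a hypothetical field name for the target.
  if (
    name === "teardown_solution" &&
    (typeof input.confirmId !== "string" || input.confirmId !== input.solutionId)
  ) {
    return { behavior: "deny", message: "confirmId must exactly match the solution id" };
  }
  return { behavior: "allow", updatedInput: input };
}
```

Each tool would still perform its own confirm check, so this guard is a second layer rather than the only one.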

Enablement and credentials

The panel is off unless AURORA_AGENT_ENABLED=true and a Claude credential is present. Two credential modes are accepted: a billing API key (ANTHROPIC_API_KEY) or a Claude subscription token (CLAUDE_CODE_OAUTH_TOKEN, from claude setup-token) — the two are distinct env vars because the SDK rejects a subscription token passed as an API key. An optional AURORA_AGENT_MODEL overrides the model.
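The enablement rule can be sketched as a small resolver. The environment variable names are from the text; the function name, the return shape, and preferring the API key when both credentials are set are assumptions.

```typescript
type Env = Record<string, string | undefined>;

// The two modes stay distinct so a subscription token is never passed as an API key.
type ClaudeCredential =
  | { mode: "api-key"; envVar: "ANTHROPIC_API_KEY"; value: string }
  | { mode: "subscription"; envVar: "CLAUDE_CODE_OAUTH_TOKEN"; value: string };

interface AgentConfig {
  credential: ClaudeCredential;
  model?: string; // optional AURORA_AGENT_MODEL override
}

function resolveAgentConfig(env: Env): AgentConfig | null {
  // Panel is off unless the flag is exactly "true".
  if (env.AURORA_AGENT_ENABLED !== "true") return null;

  let credential: ClaudeCredential;
  if (env.ANTHROPIC_API_KEY) {
    // Assumption: the billing API key wins when both are present.
    credential = { mode: "api-key", envVar: "ANTHROPIC_API_KEY", value: env.ANTHROPIC_API_KEY };
  } else if (env.CLAUDE_CODE_OAUTH_TOKEN) {
    credential = { mode: "subscription", envVar: "CLAUDE_CODE_OAUTH_TOKEN", value: env.CLAUDE_CODE_OAUTH_TOKEN };
  } else {
    return null; // enabled but no credential: panel stays off
  }
  return { credential, model: env.AURORA_AGENT_MODEL || undefined };
}
```

A null result would let the shell hide the agent dock entirely instead of rendering a panel that cannot answer.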

Integration map

Configuration reference

Variable	Purpose
`GH_CUSTOMER_ORG` / `GH_INTERNAL_ORG`	The two GitHub org names
`GH_<ROLE>_APP_ID`, `GH_<ROLE>_INSTALLATION_ID`	GitHub App identity per org (installation id optional — auto-discovered)
`GH_<ROLE>_PRIVATE_KEY` or `GH_<ROLE>_PRIVATE_KEY_PATH`	App private key, inline or as a `.pem` path
`GH_PROVISION_TOKEN`	Classic PAT for the cross-org customer fork (creation only)
`AURORA_TENANTS_ENABLED`	Enables the registry BFF + tenants UI repivot
`CUSTOMER_SERVICE_URL`, `TENANT_SERVICE_URL`, `SOLUTION_SERVICE_URL`	Registry service endpoints
`SERVICE_SHARED_SECRET`	Optional shared secret forwarded to the registry services
`ORBIT_DEPLOY_ENABLED`	Enables the Azure auto-deploy path
`AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, `AZURE_ACR_NAME`, `AZURE_ACA_ENV`, `AZURE_STORAGE_ACCOUNT`	Azure coordinates for the deploy path
`CLOUDFLARE_API_TOKEN`, `CLOUDFLARE_ZONE_NAME`, `CLOUDFLARE_PROXIED`, `CLOUDFLARE_SSL_MODE`	Lets the engine drive the per-solution front door in-product (both token and zone required; otherwise stacks serve on raw FQDNs)
`EDGE_LOCK_SOLUTIONS` (+ `EDGE_HEADER_NAME`, `EDGE_SHARED_SECRET`)	Extends the Cloudflare-only lockdown to each solution’s two public apps
`AGENT_GIT_TOKEN`, `AGENT_INTERNAL_GIT_TOKEN`	Per-solution agent push tokens (fall back to `GH_PROVISION_TOKEN`)
`AGENT_ANTHROPIC_API_KEY` or `AGENT_CLAUDE_CODE_OAUTH_TOKEN`	Per-solution agent Claude credential (falls back to Aurora’s own)
`AGENT_DEFAULT_MODEL`	Model override for provisioned solution agents (never rendered blank)
`ORBIT_WEBAPP_IMAGE_TAG`, `ORBIT_AGENT_IMAGE_TAG`, `ORBIT_AGENT_WEBAPP_IMAGE_TAG`	The image tags per-solution stacks pin (validated against ACR by the preflight step)
`AURORA_AGENT_ENABLED`	Enables the agent chat panel
`ANTHROPIC_API_KEY` or `CLAUDE_CODE_OAUTH_TOKEN`	Claude credential (either mode)
`AURORA_AGENT_CWD`	Working directory for the agent’s read-only Read/Grep/Glob tools
`AURORA_AGENT_MODEL`	Optional model override for the chat agent

Locally, Aurora runs in the shared docker-compose stack (see Local development); in production it is the singleton aurora Container App behind a Cloudflare front door (see Azure deployment).