Aurora

Aurora (implementation/aurora-webapp) is the management application above the Trajectory solutions: the portfolio / control-plane layer that manages, monitors, and orchestrates the whole fleet of Orbit solutions across customers and tenants. Where each per-solution Orbit stack (Orbit Webapp, Orbit Agent, Mission Control) is scoped to one solution at a time, Aurora operates across many:

  • a live GitHub data layer that discovers every solution in the fleet,
  • the portfolio registry BFF over the Customer → Tenant → Solution registry services,
  • solution creation (forking the two template repos and seeding solution.yaml),
  • the Azure auto-deploy trigger that stands up a per-solution Orbit stack on Azure Container Apps, and
  • an in-product agent chat panel built on the Claude Agent SDK.

A key structural difference from the Orbit apps: Aurora has no solution repo mount at all. Orbit loads one solution tree from disk; Aurora reads everything live — GitHub for solution definitions, the registry services for portfolio records, and Azure APIs for deployment.

Tech stack

| Concern | Choice |
| --- | --- |
| Framework | Next.js 16 (App Router), React 19, TypeScript 5 |
| Styling | Tailwind CSS 4 + Orbit’s globals.css design system (copied verbatim, with a marked /* === Aurora additions === */ block) |
| GitHub access | @octokit/rest + @octokit/auth-app (GitHub App JWT → installation token) |
| Azure access | @azure/arm-appcontainers, @azure/arm-storage, @azure/identity (DefaultAzureCredential: managed identity, no service-principal secret) |
| Agent chat | @anthropic-ai/claude-agent-sdk |
| Validation / parsing | zod, yaml |

Dependencies are deliberately trimmed compared to orbit-webapp — no mermaid, d3, Monaco, xyflow, or MDX pipeline. The shell mirrors Orbit’s (aurora-app.tsx, rail.tsx, header.tsx, a trimmed icon set) so the control plane and the per-solution viewers feel like one product family.

src/
  app/            layout, / (portfolio or tenants home), /customers, /solutions, /tenants/[id], plus /api/* route handlers
  components/
    aurora/       shell, rail, header, portfolio-view, solution-card, tenants-home, tenant-detail, customers-view, solutions-ops, chat/ (agent-dock, use-agent-chat)
  lib/
    portfolio/    github.ts (App auth), loader.ts (two-org discovery), azure.ts (Azure config + credential)
    registry/     client.ts, http.ts, join.ts, write-back.ts, types.ts
    solutions/    naming.ts, create-solution.ts, deploy-azure.ts (adapter over the @maxq/orbit-deploy engine)
    agent/        config.ts, run.ts, tools.ts, registry-tools.ts, deploy-tools.ts

The live GitHub data layer

Aurora discovers the fleet by querying two GitHub organisations through two GitHub Apps — one App per org — in src/lib/portfolio/. Each App authenticates with @octokit/auth-app (App JWT exchanged for a cached installation token) and needs only Contents: read and Metadata: read, with no webhook.

| Org | Role | What Aurora reads |
| --- | --- | --- |
| maxq-labs-orbit-solutions (customer) | Source of truth | Lists solution-* repos (excluding solution-template) and reads each repo’s workspace/solution-definition/solution.yaml via the Contents API; one portfolio entry per solution |
| maxq-labs-orbit-platform (internal) | Correlation only | Lists solution-*-internal repos (excluding solution-template-internal) to mark whether each solution’s internal counterpart exists (“internal repo linked” indicator). The internal repo has no solution.yaml of its own |
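The two-org correlation can be pictured as a pure function over the two repo-name lists. The sketch below is illustrative only: the `PortfolioEntry` shape and the `correlateRepos` name are hypothetical, and it assumes the internal counterpart is always named `<customer-repo>-internal`, as the naming table in the solution-creation section suggests.

```typescript
// Hypothetical shape; the real loader.ts types are not shown in this document.
interface PortfolioEntry {
  customerRepo: string;
  internalRepo: string;
  internalLinked: boolean; // drives the "internal repo linked" indicator
}

const CUSTOMER_TEMPLATE = "solution-template";
const INTERNAL_TEMPLATE = "solution-template-internal";

/** Build one entry per customer repo and mark whether its internal counterpart exists. */
function correlateRepos(customerRepos: string[], internalRepos: string[]): PortfolioEntry[] {
  // Internal org: solution-*-internal, minus the template.
  const internal = new Set(
    internalRepos.filter(
      (name) =>
        name.startsWith("solution-") && name.endsWith("-internal") && name !== INTERNAL_TEMPLATE,
    ),
  );
  // Customer org: solution-*, minus the template. This list is the source of truth.
  return customerRepos
    .filter((name) => name.startsWith("solution-") && name !== CUSTOMER_TEMPLATE)
    .map((name) => {
      const internalRepo = `${name}-internal`; // assumed naming convention
      return { customerRepo: name, internalRepo, internalLinked: internal.has(internalRepo) };
    });
}
```

In this sketch a customer repo with no internal counterpart still appears in the portfolio, only with the indicator off, which matches the "correlation only" role of the internal org.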

The customer repo’s solution.yaml is the authoritative metadata for each solution (id, name, owners, description, per-stage status). loader.ts applies a 60-second module-level TTL cache, and the pages are force-dynamic, so a burst of renders triggers a single GitHub fetch and data is never more than a minute stale.
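A module-level TTL cache of this kind is commonly built by caching the in-flight promise, so that concurrent renders share one fetch. The following is a minimal sketch of that general technique, not the code in loader.ts; `createTtlCache` and its injectable clock are hypothetical names chosen for illustration.

```typescript
/** Cache a loader's promise for ttlMs; callers inside the window share one fetch. */
function createTtlCache<T>(
  loader: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now, // injectable clock, handy for tests
): { get: () => Promise<T> } {
  let cached: { at: number; value: Promise<T> } | null = null;
  return {
    get(): Promise<T> {
      const t = now();
      if (cached && t - cached.at < ttlMs) return cached.value;
      const value = loader();
      const entry = { at: t, value };
      cached = entry;
      // Assumption: a failed fetch should not be served for the whole TTL.
      value.catch(() => {
        if (cached === entry) cached = null;
      });
      return value;
    },
  };
}

// Usage sketch: a 60-second cache around a hypothetical org scan.
const portfolioCache = createTtlCache(async () => ["solution-acme-billing"], 60_000);
```

Because the promise itself is cached, a burst of requests arriving before the first fetch resolves still results in a single upstream call.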

Configuration

Each org is configured through environment variables; values live in a git-ignored .env, and .env.example documents the setup. For each role (CUSTOMER or INTERNAL) the variables are GH_<ROLE>_ORG, GH_<ROLE>_APP_ID, GH_<ROLE>_INSTALLATION_ID (optional; auto-discovered via the App’s org installation), and the private key, supplied either inline (GH_<ROLE>_PRIVATE_KEY) or as a path (GH_<ROLE>_PRIVATE_KEY_PATH) to a .pem file under the git-ignored secrets/ directory. readOrgConfig() returns null when an org is unconfigured, in which case the app renders a “connect Aurora to GitHub” empty state instead of crashing.
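The null-on-unconfigured behaviour can be sketched as follows. This is an illustration under assumptions rather than the actual readOrgConfig(): the `OrgConfig` shape, the explicit `env` parameter, and the choice to prefer the inline key over the path when both are set are all hypothetical.

```typescript
type Role = "CUSTOMER" | "INTERNAL";
type Env = Record<string, string | undefined>;

// Hypothetical result shape.
interface OrgConfig {
  org: string;
  appId: string;
  installationId?: string; // optional: auto-discovered when absent
  privateKey: { kind: "inline"; pem: string } | { kind: "path"; path: string };
}

/** Return the org's configuration, or null when any required variable is missing. */
function readOrgConfig(role: Role, env: Env): OrgConfig | null {
  const read = (suffix: string): string | undefined => {
    const value = env[`GH_${role}_${suffix}`]?.trim();
    return value ? value : undefined;
  };
  const org = read("ORG");
  const appId = read("APP_ID");
  const inline = read("PRIVATE_KEY");
  const path = read("PRIVATE_KEY_PATH");
  // Assumption: the inline key wins when both forms are supplied.
  const privateKey: OrgConfig["privateKey"] | null = inline
    ? { kind: "inline", pem: inline }
    : path
      ? { kind: "path", path }
      : null;
  if (!org || !appId || !privateKey) return null;
  const installationId = read("INSTALLATION_ID");
  return { org, appId, privateKey, ...(installationId ? { installationId } : {}) };
}
```

A caller can then branch on the null result to render the empty state rather than attempting a GitHub call with partial credentials.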

Private keys are never stored in any .env or committed anywhere — they are .pem files mounted read-only (locally at /secrets) or injected inline through the environment / ACA secrets.

The registry BFF

Aurora is the backend-for-frontend of the portfolio registry: browsers and downstream consumers never talk to the three registry services directly. src/lib/registry/client.ts is the only module that calls them (service URLs from CUSTOMER_SERVICE_URL / TENANT_SERVICE_URL / SOLUTION_SERVICE_URL), forwarding an x-actor header, an optional x-service-secret, and If-Match versions for optimistic concurrency.
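To make the forwarding concrete, here is a small sketch of how the outgoing headers might be assembled. `registryHeaders` and `RegistryCallContext` are hypothetical names; only the three header names come from the description above.

```typescript
// Hypothetical per-call context.
interface RegistryCallContext {
  actor: string; // who is performing the action
  serviceSecret?: string; // SERVICE_SHARED_SECRET, when configured
  ifMatch?: string; // version of the record being updated
}

/** Build the headers forwarded to a registry service for one call. */
function registryHeaders(ctx: RegistryCallContext): Record<string, string> {
  const headers: Record<string, string> = {
    "content-type": "application/json",
    "x-actor": ctx.actor,
  };
  // The shared secret is optional, so it is only sent when present.
  if (ctx.serviceSecret) headers["x-service-secret"] = ctx.serviceSecret;
  // If-Match carries the version the caller last read (optimistic concurrency).
  if (ctx.ifMatch !== undefined) headers["if-match"] = ctx.ifMatch;
  return headers;
}
```

In standard HTTP semantics a failed If-Match precondition is answered with 412 Precondition Failed; whether the registry services use exactly that status is not stated here.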

The feature gate is registryEnabled(): AURORA_TENANTS_ENABLED === "true" and all three service URLs configured. With the flag off, Aurora behaves exactly like the original portfolio app.
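Expressed as code, the gate is a conjunction of the flag and the three URLs. The sketch below takes the environment as a parameter for testability, which is an assumption; the real registryEnabled() presumably reads the process environment directly.

```typescript
/** True only when the flag is exactly "true" and all three service URLs are set. */
function registryEnabled(env: Record<string, string | undefined>): boolean {
  const urls = [
    env["CUSTOMER_SERVICE_URL"],
    env["TENANT_SERVICE_URL"],
    env["SOLUTION_SERVICE_URL"],
  ];
  return (
    env["AURORA_TENANTS_ENABLED"] === "true" &&
    urls.every((url) => url !== undefined && url.trim() !== "")
  );
}
```

Note that under this reading a partially configured registry (flag on, one URL missing) behaves like the flag being off.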

The join and the sweep

Registry records and GitHub repos are correlated in src/lib/registry/join.ts:

  • Join key = the customer-repo name (not the solution id). Each registry solution record carries repos.customer.name; the join looks that name up in the portfolio org-scan snapshot. This tolerates legacy drift — a registry id may differ from what an older solution.yaml says its id is. Authority is split per field: the registry record owns portfolio concerns (tenant, deployment); solution.yaml owns the methodology definition.
  • The reconciliation sweep compares the GitHub org scan with the registry and reports two healable drift states: unregistered repo pairs (on GitHub, no record) and orphaned records (record exists, repo gone). Nothing silently disappears — both lists surface in the operator UI and via GET /api/solutions?sweep=true.
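The join and the sweep above can be sketched as a single pass over the registry records and the org-scan snapshot. The types and the `joinAndSweep` name are hypothetical; the only behaviour taken from the description is that the customer-repo name is the join key and that both drift lists are reported.

```typescript
// Hypothetical shapes for illustration.
interface RegistrySolution {
  id: string;
  tenantId: string | null;
  repos: { customer: { name: string } };
}
interface ScannedRepo {
  name: string; // customer-repo name from the org scan
  solutionYamlId?: string; // id declared in solution.yaml; may differ from the registry id
}
interface JoinResult {
  joined: Array<{ record: RegistrySolution; repo: ScannedRepo }>;
  unregistered: ScannedRepo[]; // on GitHub, no record
  orphaned: RegistrySolution[]; // record exists, repo gone
}

/** Join on the customer-repo name and collect both drift states. */
function joinAndSweep(records: RegistrySolution[], scan: ScannedRepo[]): JoinResult {
  const byName = new Map(scan.map((repo) => [repo.name, repo] as const));
  const claimed = new Set<string>();
  const joined: JoinResult["joined"] = [];
  const orphaned: RegistrySolution[] = [];
  for (const record of records) {
    const repo = byName.get(record.repos.customer.name);
    if (repo) {
      joined.push({ record, repo });
      claimed.add(repo.name);
    } else {
      orphaned.push(record);
    }
  }
  const unregistered = scan.filter((repo) => !claimed.has(repo.name));
  return { joined, unregistered, orphaned };
}
```

Because the lookup never consults `solutionYamlId`, a record whose id has drifted from the id in an older solution.yaml still joins cleanly.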

API routes

All handlers are nodejs runtime, force-dynamic, and return 503 with a pointer to the flag when the registry is disabled:

| Route | Methods | Purpose |
| --- | --- | --- |
| /api/customers | GET, POST | List (optional ?status=) / create customers |
| /api/customers/[id] | GET, PATCH, DELETE | Read / update / delete one customer |
| /api/tenants | GET, POST | List (?customer=, ?status=) / create tenants |
| /api/tenants/[id] | GET, PATCH, DELETE | Read / update / delete one tenant |
| /api/tenants/[id]/solutions | GET, POST | The tenant’s joined solution list (the contract the tenant-scoped Orbit reader consumes); POST { solutionId } attaches a solution and is refused with 409 if it is already attached to another tenant |
| /api/tenants/[id]/solutions/[solutionId] | DELETE | Detach (clears the record’s tenant pointer; the record becomes unassigned) |
| /api/solutions | GET, POST | Joined records with filters (?tenant=, ?customer=, ?status=, ?unassigned=); ?sweep=true adds the unregistered/orphaned drift sections; POST registers a record (creation flow and backfill) |
| /api/solutions/[id] | GET, PATCH | Read / correct one record (deployment corrections go through PATCH) |
| /api/agent | POST | The agent chat SSE endpoint (see below) |

UI surface

Aurora renders in one of two modes, selected by the registry gate.

Flag off — the portfolio view

The original single-page app: a hero with roll-up mini-cards and a .sol-grid of solution cards, one per customer repo. Each card shows a status chip, a per-stage As-Is / R2B / To-Be strip, links to both repos, and the internal-repo-linked indicator.

Flag on — the tenants repivot (AURORA_TENANTS_ENABLED)

With the registry enabled, the UI repivots from a flat solution grid to the Customer → Tenant → Solution model, using real routes instead of a client-side view switch. The shared chrome (AuroraShell) keeps the same rail, header, and agent dock, with rail links to Overview (/), Customers (/customers), and Solutions (/solutions):

| Route | View |
| --- | --- |
| / | Tenants home: customers, tenants, joined solutions, and the drift sweep in one overview; falls back to a “registry unreachable” surface if the services do not respond |
| /customers | Customer list with per-customer tenant counts |
| /tenants/[id] | Tenant detail: the tenant’s joined solutions plus attach candidates (only unassigned solutions of the same customer are offered, enforcing the cross-customer invariant) |
| /solutions | The demoted all-solutions operator view: the old portfolio grid extended with registry columns (tenant, deployment state) plus the sweep’s attach/register surfaces |
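The attach-candidate rule on the tenant detail page amounts to a two-condition filter. The sketch below assumes that a solution record carries its customer even while unassigned (the `customerId` field is hypothetical); the document does not show the record schema.

```typescript
// Hypothetical record shapes.
interface SolutionRecord {
  id: string;
  customerId: string; // assumed to be known even when the record is unassigned
  tenantId: string | null;
}
interface TenantRecord {
  id: string;
  customerId: string;
}

/** Offer only unassigned solutions that belong to the tenant's own customer. */
function attachCandidates(tenant: TenantRecord, solutions: SolutionRecord[]): SolutionRecord[] {
  return solutions.filter(
    (solution) => solution.tenantId === null && solution.customerId === tenant.customerId,
  );
}
```

Filtering in the UI keeps cross-customer attaches from being offered at all; the 409 on the attach route remains the server-side backstop for solutions that are already attached elsewhere.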

Solution creation

Aurora can stand up a brand-new Trajectory solution. A solution is two GitHub repos, forked from the base templates (both templates live in the platform org):

| Repo | Forked from | Into | Name |
| --- | --- | --- | --- |
| Customer repo | solution-template | maxq-labs-orbit-solutions | solution-<org>-<sol> (private) |
| Internal repo | solution-template-internal | maxq-labs-orbit-platform | solution-<org>-<sol>-internal (private) |

Key mechanics (in src/lib/solutions/create-solution.ts and naming.ts):

  • Native octokit, not subprocess. The original create-solution skill’s bash worker was ported to a server-side service — the app never shells out to gh, git, or python. Seeding solution.yaml uses the Contents API (get → line-replace the four identity keys, preserving comments → update), not clone/commit/push, so the container needs no CLI tooling.
  • True fork with upstream link. Repos are real GitHub forks (renamed on fork), deliberately keeping the “forked from” link so each solution can pull future template and methodology updates via GitHub’s Sync fork.
  • Naming rules. Inputs are an organization slug and a solution slug: each letters and hyphens only, at most 20 characters, no leading/trailing hyphen, auto-lowercased. solution.yaml is seeded with id=<org>-<sol>, name=<sol>, owners=["<org>"], and the optional description.
  • Preview before create. previewSolution validates the names and checks GitHub that neither target repo exists — a dry run that makes no changes.
  • Cross-org fork credential. GitHub App installation tokens are per-org and cannot fork a private repo across orgs (proven empirically with a spike script, scripts/spike-cross-org-fork.mjs, including the App-installed-on-both-orgs variant). The customer fork therefore uses a classic PAT from an account that is a member of both orgs, supplied as GH_PROVISION_TOKEN and used only for that one fork; the App tokens still handle uniqueness checks, the same-org internal fork, and seeding. Fine-grained PATs are single-org, so the token must be classic. previewSolution reports provisioningReady and the agent warns when the token is unset.
  • Registry integration. With the registry enabled, creation requires a tenant (validated against the tenant service before any fork is created; a tenant belonging to a different customer is refused), and the solution record is registered after seeding. Registration failure is non-fatal — the result carries a registerError and the register_solution tool can heal it later.
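The naming rules and the comment-preserving seed step from the list above can be sketched together. This is not the code in naming.ts or create-solution.ts: the function names are hypothetical, the regular expressions are one reading of the stated rules (consecutive inner hyphens are allowed because the rules do not forbid them), and the seeding helper only handles simple top-level `key: value` lines.

```typescript
// Letters and hyphens only, no leading or trailing hyphen.
const SLUG_PATTERN = /^[a-z](?:[a-z-]*[a-z])?$/;

/** Lowercase and validate an organization or solution slug. */
function normalizeSlug(input: string): string {
  const slug = input.trim().toLowerCase();
  if (slug.length > 20 || !SLUG_PATTERN.test(slug)) {
    throw new Error(`invalid slug "${input}": letters and hyphens only, at most 20 characters`);
  }
  return slug;
}

/** Derive the repo names and the seeded identity values from the two slugs. */
function solutionNames(orgInput: string, solInput: string) {
  const org = normalizeSlug(orgInput);
  const sol = normalizeSlug(solInput);
  const customerRepo = `solution-${org}-${sol}`;
  return {
    id: `${org}-${sol}`,
    name: sol,
    owners: [org],
    customerRepo,
    internalRepo: `${customerRepo}-internal`,
  };
}

/** Replace top-level `key:` lines in place; comments and all other lines are untouched. */
function seedIdentity(yamlText: string, values: ReadonlyMap<string, string>): string {
  return yamlText
    .split("\n")
    .map((line) => {
      // Group 1: key, group 3: an optional trailing "# comment" to preserve.
      const match = /^([A-Za-z_]+):(.*?)(\s+#.*)?$/.exec(line);
      const key = match?.[1];
      const value = key === undefined ? undefined : values.get(key);
      if (key === undefined || value === undefined) return line;
      return `${key}: ${value}${match?.[3] ?? ""}`;
    })
    .join("\n");
}
```

For example, `solutionNames("Acme", "Billing")` yields the repo pair `solution-acme-billing` and `solution-acme-billing-internal`. The seeded text would then be written back through the Contents API; that call is omitted here.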

Azure auto-deploy (summary)

When ORBIT_DEPLOY_ENABLED=true, createSolution triggers deployOrbitStack() after seeding. Since 2026-07-05 that function is a thin adapter (src/lib/solutions/deploy-azure.ts) over the shared @maxq/orbit-deploy engine (implementation/shared/orbit-deploy; see ADR-015): it builds the engine config from Aurora’s environment, runs the deploy scenario with DefaultAzureCredential (Aurora’s managed identity on ACA), and persists the structured run log onto the registry record.

The scenario provisions, per solution:

  • an ACR image preflight (pinned tags must exist; building images stays a human release step),
  • two Azure Files shares (customer + internal repo clones),
  • the env-storage links,
  • a conditional seed job (skipped when the shares already hold clones),
  • three Container Apps from shared ACR images: the reader orbit-<h>, the writer agent agent-<h> (internal ingress only since 2026-07-03), and Mission Control mc-<h>. Here h is a 12-hex-character hash of the solution id; naming lives in the engine, with a parity gate against the bash azure_names, and
  • the Cloudflare front door plus optional edge lockdown for the two public apps (previously bash-only; the in-product path produced front-door-less stacks).

Deploys are idempotent and reconciling, safe to retry, and available standalone via the confirm-gated deploy_solution agent tool. On success, the deployment record (state, the two verified public URLs, deployedAt, and the per-step lastRun log) is written back to the registry; a write-back failure is non-fatal. The URLs are verified rather than guessed from the zone string, and there is no agentUrl since the agent went internal.
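The "idempotent and reconciling" property is easiest to see in a toy step runner: each step first checks whether its desired state already holds and only applies when it does not. The sketch below illustrates that general pattern and is not the @maxq/orbit-deploy engine; the `Step` interface, the status names, and the synchronous signatures are all hypothetical (real steps would be asynchronous Azure and Cloudflare calls).

```typescript
type StepStatus = "changed" | "skipped" | "failed";

interface StepResult {
  step: string;
  status: StepStatus;
  detail: string;
}

interface Step {
  name: string;
  /** True when the desired state already holds, so apply can be skipped. */
  satisfied: () => boolean;
  apply: () => void;
}

/** Run steps in order, skipping satisfied ones and stopping at the first failure. */
function runScenario(steps: Step[]): StepResult[] {
  const results: StepResult[] = [];
  for (const step of steps) {
    try {
      if (step.satisfied()) {
        results.push({ step: step.name, status: "skipped", detail: "already in desired state" });
        continue;
      }
      step.apply();
      results.push({ step: step.name, status: "changed", detail: "applied" });
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      results.push({ step: step.name, status: "failed", detail });
      break; // assumption: later steps depend on earlier ones
    }
  }
  return results;
}
```

Running the same scenario twice against unchanged infrastructure reports every step as skipped on the second pass, which is what makes a retry after a partial failure safe. The returned array is also the kind of per-step log that could be persisted as a run record.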

The full provisioning model — steps and scenarios, naming scheme, shares and mounts, Cloudflare subdomains, edge lockdown, scale rules — is covered on the Azure deployment page.

The agent chat panel

Aurora embeds a chat panel (the “agent dock”, mounted in the shell’s reserved agent row) that talks to the Claude Agent SDK and can operate the portfolio conversationally — including creating and deploying solutions.

Transport

POST /api/agent is a Node-runtime, force-dynamic route returning an SSE stream. The design is stateless: the client sends the full conversation history each turn, and prior turns are folded into the system prompt — no server-side session storage, so it works across multiple replicas.
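The stateless design can be illustrated with two small helpers: one folds the prior turns into the system prompt, the other encodes an event in the text/event-stream wire format. Both are sketches under assumptions; the function names, the transcript layout, and the event names are hypothetical and not taken from run.ts.

```typescript
interface ChatTurn {
  role: "user" | "assistant";
  text: string;
}

/** Fold all turns except the last into the system prompt; the last user turn becomes the prompt. */
function foldHistory(
  basePrompt: string,
  history: ChatTurn[],
): { systemPrompt: string; prompt: string } {
  const last = history[history.length - 1];
  if (!last || last.role !== "user") throw new Error("history must end with a user turn");
  const prior = history.slice(0, -1);
  if (prior.length === 0) return { systemPrompt: basePrompt, prompt: last.text };
  const transcript = prior
    .map((turn) => `${turn.role === "user" ? "User" : "Assistant"}: ${turn.text}`)
    .join("\n");
  return {
    systemPrompt: `${basePrompt}\n\nConversation so far:\n${transcript}`,
    prompt: last.text,
  };
}

/** Encode one server-sent event; JSON keeps the data field on a single line. */
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

Since every request carries its own history, any replica can serve any turn; the trade-off is that the request body grows with the conversation.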

Locked-down toolset

The agent’s capabilities are defined in src/lib/agent/tools.ts, registry-tools.ts, and deploy-tools.ts as an in-process MCP server (createSdkMcpServer, server name aurora) — it never shells out:

| Tool | Effect | Gate |
| --- | --- | --- |
| preview_solution | Dry-run name validation + uniqueness check | read-only |
| create_solution | Fork both repos, seed solution.yaml, register, auto-deploy | confirm-gated |
| solution_status | Live Azure snapshot of a stack (app states, image drift vs the pinned tags, front-door DNS, effective URLs) cross-checked against the registry record | read-only |
| verify_solution | Run the full step list in verify mode: read-only assertions with remediation hints | read-only |
| plan_deploy | Dry-run any scenario: what exists, what apply would change | read-only |
| deployment_progress | Read the persisted per-step run log (deployment.lastRun) from the registry | read-only |
| deploy_diagnostics | The real failed ARM operations from the Activity Log (ingestion lags ~2–5 min) | read-only |
| deploy_solution | The full deploy scenario: preflight → storage → seed → apps → front door → lockdown | confirm-gated |
| redeploy_solution_apps | Roll just the apps to the pinned images (optional per-component filter) | confirm-gated |
| reconcile_front_door | Just the Cloudflare front door + edge lockdown | confirm-gated |
| reconcile_storage | Just shares + links (+ optional reseed) | confirm-gated |
| teardown_solution | Destructive stack teardown | confirm + typed-back id (confirmId) |
| list_customers, list_tenants, list_solutions | Registry reads (list_solutions includes the drift sweep) | read-only |
| create_customer, create_tenant, attach_solution, detach_solution, register_solution | Registry mutations | confirm-gated |
| Read, Grep, Glob | Read-only workspace context (cwd from AURORA_AGENT_CWD) | read-only |

The deploy tools resolve a solution by its registry record first (the repos are stored facts) and fall back to explicit org + name when the registry is off. Every deploy tool returns structured step results (status / detail / evidence / remediation as JSON) rather than prose — the recorded failure mode of prose-out tools is the model paraphrasing errors wrongly.

disallowedTools blocks Bash, Write, Edit, NotebookEdit, WebFetch, and WebSearch; Skill is deliberately not allowed and setting sources are omitted so the gh-based create-solution skill can never load inside the panel. maxTurns is capped at 16.

The confirmation gate

Approval is conversational, not an interactive button: a mid-stream approve/deny UI cannot be resolved statelessly across replicas. The flow is — the agent previews, shows the resolved plan, asks the user, and only after an explicit “confirm” calls the tool with confirm: true. Two layers enforce this:

  1. Each side-effecting tool rejects any call where confirm !== true.
  2. The SDK’s canUseTool callback is a hard guard (defense in depth): it denies every confirm-gated tool — creation, all deploy scenarios, and all registry mutations — unless the input carries confirm: true.

Teardown is double-gated: teardown_solution additionally requires a confirmId parameter that must exactly equal the solution id — the user has to literally type the id back in chat (the conversational analogue of deploy.sh delete’s retype prompt) — and canUseTool enforces that match too.
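The hard guard can be sketched as a pure function over the tool name and input. Several details here are assumptions: the local `PermissionResult` type mirrors the allow/deny shape that the SDK's canUseTool callback is expected to return and should be checked against the SDK's own types, the `mcp__aurora__` prefix follows the SDK's naming for tools of an MCP server called aurora, and `solutionId` is a hypothetical name for the teardown tool's id parameter.

```typescript
// Local mirror of the permission result shape (an assumption; verify against the SDK types).
type PermissionResult =
  | { behavior: "allow"; updatedInput: Record<string, unknown> }
  | { behavior: "deny"; message: string };

const MCP_PREFIX = "mcp__aurora__";

const CONFIRM_GATED = new Set([
  "create_solution", "deploy_solution", "redeploy_solution_apps",
  "reconcile_front_door", "reconcile_storage", "teardown_solution",
  "create_customer", "create_tenant", "attach_solution",
  "detach_solution", "register_solution",
]);

/** Deny confirm-gated tools unless confirm is literally true; teardown also needs the id typed back. */
function guardTool(toolName: string, input: Record<string, unknown>): PermissionResult {
  const name = toolName.startsWith(MCP_PREFIX) ? toolName.slice(MCP_PREFIX.length) : toolName;
  if (!CONFIRM_GATED.has(name)) return { behavior: "allow", updatedInput: input };
  if (input["confirm"] !== true) {
    return { behavior: "deny", message: `${name} requires explicit user confirmation` };
  }
  if (name === "teardown_solution") {
    const confirmId = input["confirmId"];
    // "solutionId" is a hypothetical parameter name for the target solution.
    if (typeof confirmId !== "string" || confirmId !== input["solutionId"]) {
      return { behavior: "deny", message: "confirmId must exactly equal the solution id" };
    }
  }
  return { behavior: "allow", updatedInput: input };
}
```

The strict `!== true` comparison matters: a string such as "true" or any other truthy value is rejected, so the model cannot satisfy the gate by approximation.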

Enablement and credentials

The panel is off unless AURORA_AGENT_ENABLED=true and a Claude credential is present. Two credential modes are accepted: a billing API key (ANTHROPIC_API_KEY) or a Claude subscription token (CLAUDE_CODE_OAUTH_TOKEN, from claude setup-token) — the two are distinct env vars because the SDK rejects a subscription token passed as an API key. An optional AURORA_AGENT_MODEL overrides the model.
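The enablement rule can be sketched as a resolver that returns either a credential mode or null. The `AgentCredential` shape and the function name are hypothetical, and the preference for the API key when both variables are set is an assumption; the document does not say which one wins.

```typescript
// Hypothetical result shape: each mode keeps its value under its own env var name.
type AgentCredential =
  | { mode: "api-key"; env: { ANTHROPIC_API_KEY: string } }
  | { mode: "subscription"; env: { CLAUDE_CODE_OAUTH_TOKEN: string } };

/** Return the credential to hand to the SDK, or null when the panel must stay off. */
function resolveAgentCredential(env: Record<string, string | undefined>): AgentCredential | null {
  if (env["AURORA_AGENT_ENABLED"] !== "true") return null;
  const apiKey = env["ANTHROPIC_API_KEY"]?.trim();
  // Assumption: the billing API key takes precedence when both are present.
  if (apiKey) return { mode: "api-key", env: { ANTHROPIC_API_KEY: apiKey } };
  const token = env["CLAUDE_CODE_OAUTH_TOKEN"]?.trim();
  if (token) return { mode: "subscription", env: { CLAUDE_CODE_OAUTH_TOKEN: token } };
  return null;
}
```

Keeping the two modes as distinct variants means a subscription token is never placed in the API-key slot, which is the failure the separate env vars exist to prevent.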

Configuration reference

| Variable | Purpose |
| --- | --- |
| GH_CUSTOMER_ORG / GH_INTERNAL_ORG | The two GitHub org names |
| GH_<ROLE>_APP_ID, GH_<ROLE>_INSTALLATION_ID | GitHub App identity per org (installation id optional; auto-discovered) |
| GH_<ROLE>_PRIVATE_KEY or GH_<ROLE>_PRIVATE_KEY_PATH | App private key, inline or as a .pem path |
| GH_PROVISION_TOKEN | Classic PAT for the cross-org customer fork (creation only) |
| AURORA_TENANTS_ENABLED | Enables the registry BFF + tenants UI repivot |
| CUSTOMER_SERVICE_URL, TENANT_SERVICE_URL, SOLUTION_SERVICE_URL | Registry service endpoints |
| SERVICE_SHARED_SECRET | Optional shared secret forwarded to the registry services |
| ORBIT_DEPLOY_ENABLED | Enables the Azure auto-deploy path |
| AZURE_SUBSCRIPTION_ID, AZURE_RESOURCE_GROUP, AZURE_ACR_NAME, AZURE_ACA_ENV, AZURE_STORAGE_ACCOUNT | Azure coordinates for the deploy path |
| CLOUDFLARE_API_TOKEN, CLOUDFLARE_ZONE_NAME, CLOUDFLARE_PROXIED, CLOUDFLARE_SSL_MODE | Lets the engine drive the per-solution front door in-product (both token and zone required; otherwise stacks serve on raw FQDNs) |
| EDGE_LOCK_SOLUTIONS (+ EDGE_HEADER_NAME, EDGE_SHARED_SECRET) | Extends the Cloudflare-only lockdown to each solution’s two public apps |
| AGENT_GIT_TOKEN, AGENT_INTERNAL_GIT_TOKEN | Per-solution agent push tokens (fall back to GH_PROVISION_TOKEN) |
| AGENT_ANTHROPIC_API_KEY or AGENT_CLAUDE_CODE_OAUTH_TOKEN | Per-solution agent Claude credential (falls back to Aurora’s own) |
| AGENT_DEFAULT_MODEL | Model override for provisioned solution agents (never rendered blank) |
| ORBIT_WEBAPP_IMAGE_TAG, ORBIT_AGENT_IMAGE_TAG, ORBIT_AGENT_WEBAPP_IMAGE_TAG | The image tags per-solution stacks pin (validated against ACR by the preflight step) |
| AURORA_AGENT_ENABLED | Enables the agent chat panel |
| ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN | Claude credential (either mode) |
| AURORA_AGENT_CWD | Working directory for the agent’s read-only Read/Grep/Glob tools |
| AURORA_AGENT_MODEL | Optional model override for the chat agent |

Locally, Aurora runs in the shared docker-compose stack (see Local development); in production it is the singleton aurora Container App behind a Cloudflare front door (see Azure deployment).