Aurora
Aurora (implementation/aurora-webapp) is the management application above
the Trajectory solutions: the portfolio / control-plane layer that manages,
monitors, and orchestrates the whole fleet of Orbit solutions across customers
and tenants. Where each per-solution Orbit stack (Orbit Webapp,
Orbit Agent, Mission Control)
is scoped to one solution at a time, Aurora operates across many:
- a live GitHub data layer that discovers every solution in the fleet,
- the portfolio registry BFF over the Customer → Tenant → Solution registry services,
- solution creation (forking the two template repos and seeding
solution.yaml),
- the Azure auto-deploy trigger that stands up a per-solution Orbit stack on Azure Container Apps, and
- an in-product agent chat panel built on the Claude Agent SDK.
A key structural difference from the Orbit apps: Aurora has no solution repo mount at all. Orbit loads one solution tree from disk; Aurora reads everything live — GitHub for solution definitions, the registry services for portfolio records, and Azure APIs for deployment.
Tech stack
| Concern | Choice |
|---|---|
| Framework | Next.js 16 (App Router), React 19, TypeScript 5 |
| Styling | Tailwind CSS 4 + Orbit’s globals.css design system (copied verbatim, with a marked /* === Aurora additions === */ block) |
| GitHub access | @octokit/rest + @octokit/auth-app (GitHub App JWT → installation token) |
| Azure access | @azure/arm-appcontainers, @azure/arm-storage, @azure/identity (DefaultAzureCredential — managed identity, no service-principal secret) |
| Agent chat | @anthropic-ai/claude-agent-sdk |
| Validation / parsing | zod, yaml |
Dependencies are deliberately trimmed compared to orbit-webapp — no mermaid,
d3, Monaco, xyflow, or MDX pipeline. The shell mirrors Orbit’s (aurora-app.tsx,
rail.tsx, header.tsx, a trimmed icon set) so the control plane and the
per-solution viewers feel like one product family.
    src/
      app/          layout, / (portfolio or tenants home), /customers, /solutions,
                    /tenants/[id], plus /api/* route handlers
      components/
        aurora/     shell, rail, header, portfolio-view, solution-card,
                    tenants-home, tenant-detail, customers-view, solutions-ops,
                    chat/ (agent-dock, use-agent-chat)
      lib/
        portfolio/  github.ts (App auth), loader.ts (two-org discovery),
                    azure.ts (Azure config + credential)
        registry/   client.ts, http.ts, join.ts, write-back.ts, types.ts
        solutions/  naming.ts, create-solution.ts, deploy-azure.ts (adapter over
                    the @maxq/orbit-deploy engine)
        agent/      config.ts, run.ts, tools.ts, registry-tools.ts, deploy-tools.ts

The live GitHub data layer
Aurora discovers the fleet by querying two GitHub organisations through
two GitHub Apps — one App per org — in src/lib/portfolio/. Each App
authenticates with @octokit/auth-app (App JWT exchanged for a cached
installation token) and needs only Contents: read and Metadata: read,
with no webhook.
| Org | Role | What Aurora reads |
|---|---|---|
| maxq-labs-orbit-solutions (customer) | Source of truth | Lists solution-* repos (excluding solution-template) and reads each repo’s workspace/solution-definition/solution.yaml via the Contents API; one portfolio entry per solution |
| maxq-labs-orbit-platform (internal) | Correlation only | Lists solution-*-internal repos (excluding solution-template-internal) to mark whether each solution’s internal counterpart exists (“internal repo linked” indicator). The internal repo has no solution.yaml of its own |
The customer repo’s solution.yaml is the authoritative metadata for each
solution (id, name, owners, description, per-stage status). loader.ts applies
a 60-second module-level TTL cache, and the pages are force-dynamic, so a
burst of renders triggers a single GitHub fetch and data is never more than a
minute stale.
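The caching behaviour described above can be sketched as follows. This is a minimal illustration, not the code in loader.ts: the name createTtlCache, the injectable clock, and the choice to cache the in-flight promise (so that concurrent renders share one fetch) are all assumptions.

```typescript
// Hypothetical sketch of a module-level TTL cache. Names are illustrative
// and not taken from loader.ts.
type Fetcher<T> = () => Promise<T>;

function createTtlCache<T>(
  fetcher: Fetcher<T>,
  ttlMs: number = 60_000, // 60-second TTL, as described in the text
  now: () => number = Date.now, // injectable clock, only to make testing easy
): () => Promise<T> {
  let cached: { value: Promise<T>; expiresAt: number } | null = null;

  return () => {
    const t = now();
    if (cached !== null && t < cached.expiresAt) {
      // Fresh entry, or a fetch that is still in flight: share it.
      return cached.value;
    }
    const value = fetcher();
    cached = { value, expiresAt: t + ttlMs };
    // Assumption: a failed fetch should not stay cached, so the next
    // caller retries instead of seeing the same rejection for a minute.
    value.catch(() => {
      if (cached !== null && cached.value === value) cached = null;
    });
    return value;
  };
}
```

Caching the promise rather than the resolved value is one way to get the "burst of renders triggers a single fetch" property: callers that arrive while the first request is pending receive the same promise.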
Configuration
Each org is configured through environment variables (values live in a
git-ignored .env; .env.example documents the setup): per role
(CUSTOMER or INTERNAL) — GH_<ROLE>_ORG, GH_<ROLE>_APP_ID,
GH_<ROLE>_INSTALLATION_ID (optional; auto-discovered via the App’s org
installation), and the private key either inline (GH_<ROLE>_PRIVATE_KEY) or
as a path (GH_<ROLE>_PRIVATE_KEY_PATH) to a .pem file under the git-ignored
secrets/ directory. readOrgConfig() returns null when an org is
unconfigured, in which case the app renders a “connect Aurora to GitHub” empty
state instead of crashing.
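As a rough illustration of the per-role lookup, the sketch below reads the variables named above from an environment map and returns null when the org is not fully configured. The OrgConfig shape, the function name readOrgConfigSketch, and the rule that an inline key wins over a path when both are set are assumptions, not the actual readOrgConfig() implementation.

```typescript
type Role = "CUSTOMER" | "INTERNAL";
type Env = Record<string, string | undefined>;

// Hypothetical result shape; the real type may differ.
interface OrgConfig {
  org: string;
  appId: string;
  installationId: string | undefined; // optional: auto-discovered when absent
  privateKey: { inline: string } | { path: string };
}

function readOrgConfigSketch(role: Role, env: Env): OrgConfig | null {
  const org = env[`GH_${role}_ORG`];
  const appId = env[`GH_${role}_APP_ID`];
  const inline = env[`GH_${role}_PRIVATE_KEY`];
  const path = env[`GH_${role}_PRIVATE_KEY_PATH`];

  // Unconfigured org: the caller renders the empty state instead of throwing.
  if (!org || !appId) return null;

  // Assumption: when both key variables are set, the inline key wins.
  let privateKey: OrgConfig["privateKey"];
  if (inline) privateKey = { inline };
  else if (path) privateKey = { path };
  else return null;

  return {
    org,
    appId,
    installationId: env[`GH_${role}_INSTALLATION_ID`] || undefined,
    privateKey,
  };
}
```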
Private keys are never stored in any .env or committed anywhere — they are
.pem files mounted read-only (locally at /secrets) or injected inline
through the environment / ACA secrets.
The registry BFF
Aurora is the backend-for-frontend of the portfolio registry: browsers and
downstream consumers never talk to the three registry services directly.
src/lib/registry/client.ts is the only module that calls them (service URLs
from CUSTOMER_SERVICE_URL / TENANT_SERVICE_URL / SOLUTION_SERVICE_URL),
forwarding an x-actor header, an optional x-service-secret, and If-Match
versions for optimistic concurrency.
The feature gate is registryEnabled():
AURORA_TENANTS_ENABLED === "true" and all three service URLs configured.
With the flag off, Aurora behaves exactly like the original portfolio app.
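A minimal sketch of the gate and of the headers the client forwards, assuming the environment is passed in as a plain map. The function names, the use of SERVICE_SHARED_SECRET as the source of x-service-secret, and sending the record version verbatim in If-Match are assumptions for illustration.

```typescript
type Env = Record<string, string | undefined>;

// The flag must be the literal string "true" and all three URLs must be set.
function registryEnabledSketch(env: Env): boolean {
  return (
    env["AURORA_TENANTS_ENABLED"] === "true" &&
    Boolean(env["CUSTOMER_SERVICE_URL"]) &&
    Boolean(env["TENANT_SERVICE_URL"]) &&
    Boolean(env["SOLUTION_SERVICE_URL"])
  );
}

// Hypothetical header builder for calls to the registry services.
function registryHeadersSketch(
  actor: string,
  env: Env,
  ifMatchVersion?: string,
): Record<string, string> {
  const headers: Record<string, string> = { "x-actor": actor };
  const secret = env["SERVICE_SHARED_SECRET"];
  if (secret) headers["x-service-secret"] = secret; // optional
  // Optimistic concurrency: the write only succeeds if the stored version
  // still matches. The exact version format is an assumption here.
  if (ifMatchVersion !== undefined) headers["If-Match"] = ifMatchVersion;
  return headers;
}
```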
The join and the sweep
Registry records and GitHub repos are correlated in src/lib/registry/join.ts:
- Join key = the customer-repo name (not the solution id). Each registry
solution record carries repos.customer.name; the join looks that name up in the portfolio org-scan snapshot. This tolerates legacy drift: a registry id may differ from what an older solution.yaml says its id is. Authority is split per field: the registry record owns portfolio concerns (tenant, deployment); solution.yaml owns the methodology definition.
- The reconciliation sweep compares the GitHub org scan with the registry
and reports two healable drift states: unregistered repo pairs (on
GitHub, no record) and orphaned records (record exists, repo gone).
Nothing silently disappears: both lists surface in the operator UI and via
GET /api/solutions?sweep=true.
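The join and the sweep can be pictured as a single pass over the registry records against a name-indexed snapshot of the org scan. The sketch below is an assumption-laden simplification of join.ts: the type and function names are hypothetical, and it tracks only the customer repo rather than the customer/internal pair.

```typescript
// Hypothetical, simplified shapes; the real records carry more fields.
interface RegistryRecord {
  id: string;
  repos: { customer: { name: string } };
}
interface RepoSnapshot {
  name: string;
  yamlId?: string; // the id found in solution.yaml, which may have drifted
}

interface JoinResult {
  joined: Array<{ record: RegistryRecord; repo: RepoSnapshot }>;
  unregistered: RepoSnapshot[]; // on GitHub, no record
  orphaned: RegistryRecord[]; // record exists, repo gone
}

function joinAndSweep(records: RegistryRecord[], repos: RepoSnapshot[]): JoinResult {
  const byName = new Map(repos.map((r) => [r.name, r] as const));
  const matched = new Set<string>();
  const joined: JoinResult["joined"] = [];
  const orphaned: RegistryRecord[] = [];

  for (const record of records) {
    // Join on the customer-repo name, never on the solution id.
    const repo = byName.get(record.repos.customer.name);
    if (repo) {
      joined.push({ record, repo });
      matched.add(repo.name);
    } else {
      orphaned.push(record);
    }
  }

  const unregistered = repos.filter((r) => !matched.has(r.name));
  return { joined, unregistered, orphaned };
}
```

Because the lookup key is the repo name, a record whose id differs from the yamlId in the repo still joins cleanly, which is the legacy-drift tolerance described above.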
API routes
All handlers are nodejs runtime, force-dynamic, and return 503 with a
pointer to the flag when the registry is disabled:
| Route | Methods | Purpose |
|---|---|---|
| /api/customers | GET, POST | List (optional ?status=) / create customers |
| /api/customers/[id] | GET, PATCH, DELETE | Read / update / delete one customer |
| /api/tenants | GET, POST | List (?customer=, ?status=) / create tenants |
| /api/tenants/[id] | GET, PATCH, DELETE | Read / update / delete one tenant |
| /api/tenants/[id]/solutions | GET, POST | The tenant’s joined solution list (the contract the tenant-scoped Orbit reader consumes); POST { solutionId } attaches a solution and is refused with 409 if it is already attached to another tenant |
| /api/tenants/[id]/solutions/[solutionId] | DELETE | Detach (clears the record’s tenant pointer; the record becomes unassigned) |
| /api/solutions | GET, POST | Joined records with filters (?tenant=, ?customer=, ?status=, ?unassigned=); ?sweep=true adds the unregistered/orphaned drift sections; POST registers a record (creation flow and backfill) |
| /api/solutions/[id] | GET, PATCH | Read / correct one record (deployment corrections go through PATCH) |
| /api/agent | POST | The agent chat SSE endpoint (see below) |
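The attach and detach semantics in the table can be sketched as two pure functions over a solution record. Field names (tenantId as a nullable pointer), the function names, and treating a repeat attach to the same tenant as a no-op are assumptions; only the 409 on a cross-tenant attach and the "unassigned after detach" outcome come from the table.

```typescript
// Hypothetical record shape: tenantId === null means "unassigned".
interface SolutionRecord {
  id: string;
  tenantId: string | null;
}

type AttachOutcome =
  | { ok: true; record: SolutionRecord }
  | { ok: false; status: 409; error: string };

function attachSolutionSketch(record: SolutionRecord, tenantId: string): AttachOutcome {
  if (record.tenantId !== null && record.tenantId !== tenantId) {
    // Already owned by another tenant: refuse rather than silently move it.
    return {
      ok: false,
      status: 409,
      error: `solution ${record.id} is already attached to tenant ${record.tenantId}`,
    };
  }
  return { ok: true, record: { ...record, tenantId } };
}

function detachSolutionSketch(record: SolutionRecord): SolutionRecord {
  // Detach only clears the pointer; the record itself is kept.
  return { ...record, tenantId: null };
}
```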
UI surface
Aurora renders in one of two modes, selected by the registry gate.
Flag off — the portfolio view
The original single-page app: a hero with roll-up mini-cards and a .sol-grid
of solution cards, one per customer repo. Each card shows a status chip, a
per-stage As-Is / R2B / To-Be strip, links to both repos, and the
internal-repo-linked indicator.
Flag on — the tenants repivot (AURORA_TENANTS_ENABLED)
With the registry enabled, the UI repivots from a flat solution grid to the
Customer → Tenant → Solution model, using real routes instead of a
client-side view switch. The shared chrome (AuroraShell) keeps the same rail,
header, and agent dock, with rail links to Overview (/), Customers
(/customers), and Solutions (/solutions):
| Route | View |
|---|---|
| / | Tenants home: customers, tenants, joined solutions, and the drift sweep in one overview; falls back to a “registry unreachable” surface if the services do not respond |
| /customers | Customer list with per-customer tenant counts |
| /tenants/[id] | Tenant detail: the tenant’s joined solutions plus attach candidates (only unassigned solutions of the same customer are offered, enforcing the cross-customer invariant) |
| /solutions | The demoted all-solutions operator view: the old portfolio grid extended with registry columns (tenant, deployment state) plus the sweep’s attach/register surfaces |
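The attach-candidate rule on the tenant detail page amounts to a two-condition filter. The sketch below assumes hypothetical field names (customerId, a nullable tenantId); it only illustrates the rule, not the component that renders it.

```typescript
// Hypothetical shapes for illustration only.
interface Tenant {
  id: string;
  customerId: string;
}
interface SolutionRecord {
  id: string;
  customerId: string;
  tenantId: string | null; // null = unassigned
}

// Only unassigned solutions of the tenant's own customer are offered,
// so the UI cannot propose an attach that crosses customers.
function attachCandidates(tenant: Tenant, solutions: SolutionRecord[]): SolutionRecord[] {
  return solutions.filter(
    (s) => s.tenantId === null && s.customerId === tenant.customerId,
  );
}
```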
Solution creation
Aurora can stand up a brand-new Trajectory solution. A solution is two GitHub repos, forked from the base templates (both templates live in the platform org):
| Repo | Forked from | Into | Name |
|---|---|---|---|
| Customer repo | solution-template | maxq-labs-orbit-solutions | solution-<org>-<sol> (private) |
| Internal repo | solution-template-internal | maxq-labs-orbit-platform | solution-<org>-<sol>-internal (private) |
Key mechanics (in src/lib/solutions/create-solution.ts and naming.ts):
- Native octokit, not subprocess. The original create-solution skill’s bash worker was ported to a server-side service: the app never shells out to gh, git, or python. Seeding solution.yaml uses the Contents API (get → line-replace the four identity keys, preserving comments → update), not clone/commit/push, so the container needs no CLI tooling.
- True fork with upstream link. Repos are real GitHub forks (renamed on fork), deliberately keeping the “forked from” link so each solution can pull future template and methodology updates via GitHub’s Sync fork.
- Naming rules. Inputs are an organization slug and a solution slug: each
letters and hyphens only, at most 20 characters, no leading/trailing hyphen,
auto-lowercased. solution.yaml is seeded with id=<org>-<sol>, name=<sol>, owners=["<org>"], and the optional description.
- Preview before create. previewSolution validates the names and checks GitHub that neither target repo exists: a dry run that makes no changes.
- Cross-org fork credential. GitHub App installation tokens are per-org and
cannot fork a private repo across orgs (proven empirically with a spike
script, scripts/spike-cross-org-fork.mjs, including the App-installed-on-both-orgs variant). The customer fork therefore uses a classic PAT from an account that is a member of both orgs, supplied as GH_PROVISION_TOKEN and used only for that one fork; the App tokens still handle uniqueness checks, the same-org internal fork, and seeding. Fine-grained PATs are single-org, so the token must be classic. previewSolution reports provisioningReady and the agent warns when the token is unset.
- Registry integration. With the registry enabled, creation requires a
tenant (validated against the tenant service before any fork is created;
a tenant belonging to a different customer is refused), and the solution
record is registered after seeding. Registration failure is non-fatal: the
result carries a registerError and the register_solution tool can heal it later.
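The naming rules above are mechanical enough to sketch. The code below is an illustration rather than the contents of naming.ts: normalizeSlug and planSolutionNames are hypothetical names, trimming surrounding whitespace is an assumption, and consecutive hyphens are allowed only because the stated rules do not forbid them.

```typescript
// Letters and hyphens only, no leading or trailing hyphen (after lowercasing).
const SLUG_PATTERN = /^[a-z](?:[a-z-]*[a-z])?$/;
const MAX_SLUG_LENGTH = 20;

function normalizeSlug(input: string, label: string): string {
  const slug = input.trim().toLowerCase(); // auto-lowercased
  if (slug.length === 0 || slug.length > MAX_SLUG_LENGTH) {
    throw new Error(`${label} must be 1-${MAX_SLUG_LENGTH} characters`);
  }
  if (!SLUG_PATTERN.test(slug)) {
    throw new Error(`${label} may contain only letters and hyphens, with no leading or trailing hyphen`);
  }
  return slug;
}

interface SolutionPlan {
  customerRepo: string;
  internalRepo: string;
  seed: { id: string; name: string; owners: string[]; description: string | undefined };
}

function planSolutionNames(orgInput: string, solInput: string, description?: string): SolutionPlan {
  const org = normalizeSlug(orgInput, "organization slug");
  const sol = normalizeSlug(solInput, "solution slug");
  return {
    customerRepo: `solution-${org}-${sol}`,
    internalRepo: `solution-${org}-${sol}-internal`,
    // The four identity keys seeded into solution.yaml.
    seed: { id: `${org}-${sol}`, name: sol, owners: [org], description },
  };
}
```

For example, planSolutionNames("Acme", "Billing-Portal") yields the repos solution-acme-billing-portal and solution-acme-billing-portal-internal with the seeded id acme-billing-portal.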
Azure auto-deploy (summary)
When ORBIT_DEPLOY_ENABLED=true, createSolution triggers
deployOrbitStack() after seeding. Since 2026-07-05 that function is a thin
adapter (src/lib/solutions/deploy-azure.ts) over the shared
@maxq/orbit-deploy engine (implementation/shared/orbit-deploy — see
ADR-015): it builds the engine config
from Aurora’s environment, runs the deploy scenario with
DefaultAzureCredential (Aurora’s managed identity on ACA), and persists the
structured run log onto the registry record. The scenario provisions, per
solution: an ACR image preflight (pinned tags must exist — building images
stays a human release step), two Azure Files shares (customer + internal repo
clones), the env-storage links, a conditional seed job (skipped when the
shares already hold clones), three Container Apps — the reader
orbit-<h>, the writer agent agent-<h> (internal ingress only since
2026-07-03), and Mission Control mc-<h> — from shared ACR images (h is a
12-hex-character hash of the solution id; naming lives in the engine, with a
parity gate against the bash azure_names), and the Cloudflare front door +
optional edge lockdown for the two public apps (previously bash-only — the
in-product path produced front-door-less stacks). Deploys are idempotent and
reconciling, safe to retry, and available standalone via the confirm-gated
deploy_solution agent tool. On success, the deployment record — state, the
two verified public URLs (no more zone string-guessing; no agentUrl
since the agent went internal), deployedAt, and the per-step lastRun log —
is written back to the registry (non-fatal on failure).
The full provisioning model — steps and scenarios, naming scheme, shares and mounts, Cloudflare subdomains, edge lockdown, scale rules — is covered on the Azure deployment page.
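To make the per-solution naming concrete, the sketch below derives the three Container App names from an already-computed hash h. How h is derived from the solution id lives in the engine and is deliberately not reproduced here; the function name, the lowercase-hex check, and the ingress labels are assumptions for illustration.

```typescript
type Ingress = "external" | "internal";

interface StackApp {
  role: "reader" | "agent" | "mission-control";
  name: string;
  ingress: Ingress;
}

// h is the 12-hex-character hash of the solution id, computed elsewhere.
function stackAppsFor(h: string): StackApp[] {
  if (!/^[0-9a-f]{12}$/.test(h)) {
    throw new Error("expected a 12-character lowercase hex hash");
  }
  return [
    { role: "reader", name: `orbit-${h}`, ingress: "external" },
    // The writer agent has internal ingress only.
    { role: "agent", name: `agent-${h}`, ingress: "internal" },
    { role: "mission-control", name: `mc-${h}`, ingress: "external" },
  ];
}
```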
The agent chat panel
Aurora embeds a chat panel (the “agent dock”, mounted in the shell’s reserved agent row) that talks to the Claude Agent SDK and can operate the portfolio conversationally — including creating and deploying solutions.
Transport
POST /api/agent is a Node-runtime, force-dynamic route returning an SSE
stream. The design is stateless: the client sends the full conversation
history each turn, and prior turns are folded into the system prompt — no
server-side session storage, so it works across multiple replicas.
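One way to picture the stateless design: each request carries the whole history, the server folds everything except the newest user message into the system prompt, and nothing is kept between requests. The sketch below is an assumption about the shape of that folding (the ChatTurn type, the transcript format, and the function name are all hypothetical), not the code in run.ts.

```typescript
interface ChatTurn {
  role: "user" | "assistant";
  text: string;
}

interface FoldedRequest {
  systemPrompt: string; // base prompt plus a transcript of the prior turns
  prompt: string; // the newest user message
}

function foldHistory(basePrompt: string, history: ChatTurn[]): FoldedRequest {
  const last = history[history.length - 1];
  if (last === undefined || last.role !== "user") {
    throw new Error("the final turn must be the new user message");
  }
  const prior = history.slice(0, -1);
  if (prior.length === 0) {
    return { systemPrompt: basePrompt, prompt: last.text };
  }
  const transcript = prior
    .map((t) => `${t.role === "user" ? "User" : "Assistant"}: ${t.text}`)
    .join("\n");
  return {
    systemPrompt: `${basePrompt}\n\nConversation so far:\n${transcript}`,
    prompt: last.text,
  };
}
```

Since the function depends only on its arguments, any replica can serve any turn of a conversation.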
Locked-down toolset
The agent’s capabilities are defined in src/lib/agent/tools.ts,
registry-tools.ts, and deploy-tools.ts as an in-process MCP server
(createSdkMcpServer, server name aurora) — it never shells out:
| Tool | Effect | Gate |
|---|---|---|
| preview_solution | Dry-run name validation + uniqueness check | read-only |
| create_solution | Fork both repos, seed solution.yaml, register, auto-deploy | confirm-gated |
| solution_status | Live Azure snapshot of a stack (app states, image drift vs the pinned tags, front-door DNS, effective URLs) cross-checked against the registry record | read-only |
| verify_solution | Run the full step list in verify mode: read-only assertions with remediation hints | read-only |
| plan_deploy | Dry-run any scenario: what exists, what apply would change | read-only |
| deployment_progress | Read the persisted per-step run log (deployment.lastRun) from the registry | read-only |
| deploy_diagnostics | The real failed ARM operations from the Activity Log (ingestion lags ~2–5 min) | read-only |
| deploy_solution | The full deploy scenario: preflight → storage → seed → apps → front door → lockdown | confirm-gated |
| redeploy_solution_apps | Roll just the apps to the pinned images (optional per-component filter) | confirm-gated |
| reconcile_front_door | Just the Cloudflare front door + edge lockdown | confirm-gated |
| reconcile_storage | Just shares + links (+ optional reseed) | confirm-gated |
| teardown_solution | Destructive stack teardown | confirm + typed-back id (confirmId) |
| list_customers, list_tenants, list_solutions | Registry reads (list_solutions includes the drift sweep) | read-only |
| create_customer, create_tenant, attach_solution, detach_solution, register_solution | Registry mutations | confirm-gated |
| Read, Grep, Glob | Read-only workspace context (cwd from AURORA_AGENT_CWD) | read-only |
The deploy tools resolve a solution by its registry record first (the repos are stored facts) and fall back to explicit org + name when the registry is off. Every deploy tool returns structured step results (status / detail / evidence / remediation as JSON) rather than prose — the recorded failure mode of prose-out tools is the model paraphrasing errors wrongly.
disallowedTools blocks Bash, Write, Edit, NotebookEdit, WebFetch,
and WebSearch; Skill is deliberately not allowed and setting sources are
omitted so the gh-based create-solution skill can never load inside the
panel. maxTurns is capped at 16.
The confirmation gate
Approval is conversational, not an interactive button: a mid-stream
approve/deny UI cannot be resolved statelessly across replicas. The flow is —
the agent previews, shows the resolved plan, asks the user, and only after an
explicit “confirm” calls the tool with confirm: true. Two layers enforce
this:
- Each side-effecting tool rejects any call where confirm !== true.
- The SDK’s canUseTool callback is a hard guard (defense in depth): it denies every confirm-gated tool (creation, all deploy scenarios, and all registry mutations) unless the input carries confirm: true.
Teardown is double-gated: teardown_solution additionally requires a
confirmId parameter that must exactly equal the solution id — the user has to
literally type the id back in chat (the conversational analogue of
deploy.sh delete’s retype prompt) — and canUseTool enforces that match too.
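The guard logic can be sketched as a pure function that a canUseTool callback might delegate to. Everything here is illustrative: the GuardDecision shape is not the SDK's return type, the solutionId input field is an assumed name, and stripping an mcp__aurora__ prefix assumes tool names arrive qualified by the MCP server name.

```typescript
// Tools listed as confirm-gated in the toolset table.
const CONFIRM_GATED: ReadonlySet<string> = new Set([
  "create_solution",
  "deploy_solution",
  "redeploy_solution_apps",
  "reconcile_front_door",
  "reconcile_storage",
  "teardown_solution",
  "create_customer",
  "create_tenant",
  "attach_solution",
  "detach_solution",
  "register_solution",
]);

const MCP_PREFIX = "mcp__aurora__"; // assumption: server-qualified tool names

type GuardDecision = { allow: true } | { allow: false; reason: string };

function guardToolCall(toolName: string, input: Record<string, unknown>): GuardDecision {
  const name = toolName.startsWith(MCP_PREFIX) ? toolName.slice(MCP_PREFIX.length) : toolName;
  if (!CONFIRM_GATED.has(name)) return { allow: true };

  // Strict check: the string "true" or a truthy value is not enough.
  if (input["confirm"] !== true) {
    return { allow: false, reason: `${name} requires confirm: true` };
  }

  if (name === "teardown_solution") {
    const id = input["solutionId"]; // hypothetical field name
    if (typeof id !== "string" || id.length === 0 || input["confirmId"] !== id) {
      return { allow: false, reason: "confirmId must exactly equal the solution id" };
    }
  }
  return { allow: true };
}
```

This sketch covers only the confirmation gate; blocking Bash, Write, and the other disallowed tools is a separate mechanism.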
Enablement and credentials
The panel is off unless AURORA_AGENT_ENABLED=true and a Claude credential
is present. Two credential modes are accepted: a billing API key
(ANTHROPIC_API_KEY) or a Claude subscription token
(CLAUDE_CODE_OAUTH_TOKEN, from claude setup-token) — the two are distinct
env vars because the SDK rejects a subscription token passed as an API key. An
optional AURORA_AGENT_MODEL overrides the model.
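A small sketch of the enablement check, keeping the two credential modes as distinct variants so a subscription token is never handed over as an API key. The type names, the function name, and preferring the API key when both variables are set are assumptions.

```typescript
type Env = Record<string, string | undefined>;

type ClaudeCredential =
  | { mode: "api-key"; apiKey: string }
  | { mode: "subscription"; oauthToken: string };

type AgentConfig =
  | { enabled: false; reason: string }
  | { enabled: true; credential: ClaudeCredential; model: string | undefined };

function resolveAgentConfig(env: Env): AgentConfig {
  if (env["AURORA_AGENT_ENABLED"] !== "true") {
    return { enabled: false, reason: "AURORA_AGENT_ENABLED is not set to true" };
  }
  const apiKey = env["ANTHROPIC_API_KEY"];
  const oauthToken = env["CLAUDE_CODE_OAUTH_TOKEN"];

  let credential: ClaudeCredential;
  if (apiKey) {
    credential = { mode: "api-key", apiKey }; // assumption: API key wins if both are set
  } else if (oauthToken) {
    credential = { mode: "subscription", oauthToken };
  } else {
    return { enabled: false, reason: "no Claude credential is present" };
  }

  return { enabled: true, credential, model: env["AURORA_AGENT_MODEL"] || undefined };
}
```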
Configuration reference
| Variable | Purpose |
|---|---|
| GH_CUSTOMER_ORG / GH_INTERNAL_ORG | The two GitHub org names |
| GH_<ROLE>_APP_ID, GH_<ROLE>_INSTALLATION_ID | GitHub App identity per org (installation id optional; auto-discovered) |
| GH_<ROLE>_PRIVATE_KEY or GH_<ROLE>_PRIVATE_KEY_PATH | App private key, inline or as a .pem path |
| GH_PROVISION_TOKEN | Classic PAT for the cross-org customer fork (creation only) |
| AURORA_TENANTS_ENABLED | Enables the registry BFF + tenants UI repivot |
| CUSTOMER_SERVICE_URL, TENANT_SERVICE_URL, SOLUTION_SERVICE_URL | Registry service endpoints |
| SERVICE_SHARED_SECRET | Optional shared secret forwarded to the registry services |
| ORBIT_DEPLOY_ENABLED | Enables the Azure auto-deploy path |
| AZURE_SUBSCRIPTION_ID, AZURE_RESOURCE_GROUP, AZURE_ACR_NAME, AZURE_ACA_ENV, AZURE_STORAGE_ACCOUNT | Azure coordinates for the deploy path |
| CLOUDFLARE_API_TOKEN, CLOUDFLARE_ZONE_NAME, CLOUDFLARE_PROXIED, CLOUDFLARE_SSL_MODE | Lets the engine drive the per-solution front door in-product (both token and zone required; otherwise stacks serve on raw FQDNs) |
| EDGE_LOCK_SOLUTIONS (+ EDGE_HEADER_NAME, EDGE_SHARED_SECRET) | Extends the Cloudflare-only lockdown to each solution’s two public apps |
| AGENT_GIT_TOKEN, AGENT_INTERNAL_GIT_TOKEN | Per-solution agent push tokens (fall back to GH_PROVISION_TOKEN) |
| AGENT_ANTHROPIC_API_KEY or AGENT_CLAUDE_CODE_OAUTH_TOKEN | Per-solution agent Claude credential (falls back to Aurora’s own) |
| AGENT_DEFAULT_MODEL | Model override for provisioned solution agents (never rendered blank) |
| ORBIT_WEBAPP_IMAGE_TAG, ORBIT_AGENT_IMAGE_TAG, ORBIT_AGENT_WEBAPP_IMAGE_TAG | The image tags per-solution stacks pin (validated against ACR by the preflight step) |
| AURORA_AGENT_ENABLED | Enables the agent chat panel |
| ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN | Claude credential (either mode) |
| AURORA_AGENT_CWD | Working directory for the agent’s read-only Read/Grep/Glob tools |
| AURORA_AGENT_MODEL | Optional model override for the chat agent |
Locally, Aurora runs in the shared docker-compose stack (see
Local development); in production it is
the singleton aurora Container App behind a Cloudflare front door (see
Azure deployment).